This commit makes the following changes:
- Partially reverts the changes of "feat: update workspace deadline when workspace ttl updated" (#2165), making the deadline of a running workspace build independent of its TTL once started.
- CLI: updating a workspace TTL no longer updates the deadline of the workspace.
- UI: updating a workspace TTL no longer updates the deadline of the workspace.
- Drive-by: API: When creating a workspace, default the TTL to min(12 hours, template max_ttl) if not instructed otherwise (see the sketch after this list).
- Drive-by: CLI: list: measure workspace extension (the +X in the last column) correctly, from the time the provisioner job completed.
- Drive-by: WorkspaceSchedule: show the schedule's timezone if it is set, falling back to the dayjs guess otherwise.
- Drive-by: WorkspaceScheduleForm: fixed an issue where deleting the "TTL" value in the form would show the text "Your workspace will shut down a few seconds after start".
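A minimal sketch of the drive-by default-TTL rule mentioned above, assuming a hypothetical `defaultTTL` helper rather than the actual coderd code:
```go
package schedule

import "time"

// defaultTTL illustrates the rule: use the requested TTL when one is given,
// otherwise fall back to min(12 hours, template max_ttl).
func defaultTTL(requested *time.Duration, templateMaxTTL time.Duration) time.Duration {
	if requested != nil {
		return *requested
	}
	ttl := 12 * time.Hour
	if templateMaxTTL > 0 && templateMaxTTL < ttl {
		ttl = templateMaxTTL
	}
	return ttl
}
```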
* feat: add support for template in workspace filter
* feat: Implement workspace search filter to support names
* Use new query param parser for pagination fields
* Remove excessive calls, use filters on a single query
Co-authored-by: Garrett <garrett@coder.com>
This commit adds the following changes to workspace scheduling behaviour:
* CLI: updating a workspace TTL updates the deadline of the workspace.
* If the TTL is being un-set, the workspace deadline is set to zero.
* If the TTL is being set, the workspace deadline is updated to the workspace build's last updated time plus the requested TTL (sketched after this list). Additionally, the user is prompted to confirm interactively (this can be bypassed with -y).
* UI: updating the workspace schedule behaves similarly to the CLI, showing a message to the user if the updated TTL/time to shutdown would change the lifetime of the running workspace.
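A rough sketch of the deadline behaviour described above, using a hypothetical `newDeadline` helper rather than the actual coderd code:
```go
package schedule

import "time"

// newDeadline shows how an updated TTL maps onto the running build's deadline.
func newDeadline(buildUpdatedAt time.Time, ttl *time.Duration) time.Time {
	if ttl == nil {
		// Un-setting the TTL clears the deadline entirely.
		return time.Time{}
	}
	// Setting a TTL re-anchors the deadline to the build's last updated time.
	return buildUpdatedAt.Add(*ttl)
}
```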
* fix: Update routing for workspace schedule
This was broken as part of #2101. It was a silly mistake,
but unfortunately our tests didn't catch it.
This kind of change is rare and unlikely to regress, so I won't
file an issue to add tests.
* Update site/src/pages/WorkspaceSchedulePage/WorkspaceSchedulePage.tsx
Co-authored-by: Presley Pizzo <1290996+presleyp@users.noreply.github.com>
This PR adds fields to templates that constrain values for workspaces derived from them.
- Autostop: Adds a field max_ttl on the template which limits the maximum value of ttl on all workspaces derived from that template. It defaults to 168 hours and is enforced on edits to workspace metadata. New workspaces default to the template's `max_ttl` if no TTL is specified.
- Autostart: Adds a field min_autostart_duration which enforces a minimum interval between successive autostarts, measured from a single reference time. It defaults to 1 hour and is enforced on edits to workspace metadata.
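A hypothetical sketch of how these constraints might be enforced on a workspace-metadata edit; the function and parameter names below are illustrative, not the real implementation:
```go
package templates

import (
	"fmt"
	"time"
)

// validateWorkspaceSchedule rejects edits that violate the template's
// max_ttl or min_autostart_duration constraints.
func validateWorkspaceSchedule(ttl, templateMaxTTL, autostartInterval, minAutostartDuration time.Duration) error {
	if templateMaxTTL > 0 && ttl > templateMaxTTL {
		return fmt.Errorf("ttl must be at or below the template max_ttl of %s", templateMaxTTL)
	}
	if autostartInterval < minAutostartDuration {
		return fmt.Errorf("autostart may run at most once every %s", minAutostartDuration)
	}
	return nil
}
```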
* fix: Remove unused workspace routes in favor of list with filter
This consolidates the workspace routes into a single place.
It allows users to fetch a workspace by owner username and
workspace name, which will be used by the frontend for routing.
* Fix RBAC
* Fix CLI usages
* Changes all public-facing codersdk types to use a plain int64 (milliseconds) instead of time.Duration.
* Makes autostart_schedule a *string as it may not be present.
* Adds a utils/ptr package with some useful methods.
* Adds a `bump` command to extend workspace build deadline
* Reduces WARN-level logging spam from autobuild executor
* Modifies `cli/ssh` notifications to read from workspace build deadline and to notify relative time instead (sidestepping the problem of figuring out a user's timezone across multiple OSes)
* Shows workspace extension time in `coder list` output e.g.
```
WORKSPACE        TEMPLATE  STATUS   LAST BUILT  OUTDATED  AUTOSTART        TTL
developer/test1  docker    Running  4m          false     0 9 * * MON-FRI  15m (+5m)
```
* Adds deadline column to workspace_builds, associated DB/API plumbing
* database: Upon inserting a row into workspace_builds, deadline will
initially be zero.
* autobuild: Executor now checks the Deadline field of the workspace_build
for the purpose of autostop logic.
* coderd: Adds a new route /api/v2/workspaces/:workspace/extend which allows
updating the deadline of the currently active workspace build. The new
deadline must be after the existing deadline, and not the zero time.
* provisionerd: updates workspace_build.deadline upon successful workspace
build completion (equal to now plus workspace TTL, if it exists).
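The two deadline rules above can be summarised in a rough sketch; `validateNewDeadline` and `initialDeadline` are illustrative names, not the actual handler or provisionerd code:
```go
package workspaces

import (
	"errors"
	"time"
)

// validateNewDeadline captures the rules for /api/v2/workspaces/:workspace/extend.
func validateNewDeadline(oldDeadline, newDeadline time.Time) error {
	if newDeadline.IsZero() {
		return errors.New("new deadline must not be the zero time")
	}
	if !newDeadline.After(oldDeadline) {
		return errors.New("new deadline must be after the existing deadline")
	}
	return nil
}

// initialDeadline captures the provisionerd behaviour: on successful build
// completion the deadline becomes now plus the workspace TTL, and stays zero
// ("never autostop") when no TTL is set.
func initialDeadline(completedAt time.Time, ttl *time.Duration) time.Time {
	if ttl == nil || *ttl == 0 {
		return time.Time{}
	}
	return completedAt.Add(*ttl)
}
```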
Abstracting coderd into an interface added needless indirection, because
the interface was never intended to be fulfilled outside of a single
implementation.
This lifts the abstraction and attaches all handlers to a root struct
named `*coderd.API`.
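A minimal sketch of the resulting pattern, with illustrative fields and handler names rather than the real `coderd.API` definition:
```go
package coderd

import "net/http"

// API is one concrete root struct; every handler is a method on it, so there
// is no interface to satisfy and no extra indirection when routing.
type API struct {
	// Database handle, pubsub, options, etc. would live here.
}

func (api *API) workspaces(rw http.ResponseWriter, r *http.Request) {
	rw.WriteHeader(http.StatusOK)
}
```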
* database: add autostart_schedule and ttl to InsertWorkspace; make gen
* coderd: workspaces: consume additional fields of CreateWorkspaceRequest
* cli: update: add support for TTL and autostart_schedule
* cli: create: add unit tests
* coder: import `time/tzdata` for embedded timezone database
* autobuild: fix unit test that only runs with a real db
This PR adds a lifecycle package and an Executor implementation that attempts to schedule builds of workspaces with autostart configured.
- lifecycle.Executor takes a chan time.Time in its constructor (e.g. time.Tick(time.Minute))
- Whenever a value is received from this channel, it executes one iteration of looping through the workspaces and triggering lifecycle operations.
- When the context passed to the executor is Done, it exits.
- Only workspaces that meet the following criteria will have a lifecycle operation applied to them:
- The workspace has a valid, non-empty autostart or autostop schedule (either one suffices)
- The workspace's last build was successful
- The following transitions will be applied depending on the current workspace state:
- If the workspace is currently running, it will be stopped.
- If the workspace is currently stopped, it will be started.
- Otherwise, nothing will be done.
- Workspace builds will be created with the same parameters and template version as the last successful build (a minimal sketch of the executor loop follows this list)
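A minimal sketch of the executor loop under the assumptions above; the type, constructor, and helper names are illustrative, not the actual lifecycle package:
```go
package lifecycle

import (
	"context"
	"time"
)

// Executor runs one reconciliation pass per received tick.
type Executor struct {
	tick <-chan time.Time
}

// New takes the tick channel, e.g. time.Tick(time.Minute).
func New(tick <-chan time.Time) *Executor {
	return &Executor{tick: tick}
}

// Run loops until the context is done.
func (e *Executor) Run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case t := <-e.tick:
			e.runOnce(ctx, t)
		}
	}
}

func (e *Executor) runOnce(ctx context.Context, t time.Time) {
	// Elided: list workspaces with a valid autostart/autostop schedule and a
	// successful last build, then queue a stop (if running) or start (if
	// stopped) build reusing the last successful build's parameters and
	// template version.
}
```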
It's possible for a workspace to end up in an invalid state.
This is something we'll detect for jobs and allow to be monitored.
These commands will allow admins to manually reconcile state.
This removes split ownership for workspaces. They are now
a resource of organizations and have a designated owner,
which is a user.
This enables simple administration for commands like:
- `coder stop ben/dev`
- `coder build logs colin/arch`
or if we decide to allow administrators to access workspaces,
they could even SSH using this syntax: `coder ssh colin/dev`.
Customer feedback indicated projects was a confusing name.
After asking the team internally, the consensus was unanimous
that it is indeed a confusing name.
Here's for a lil less confusion @ashmeer7 🥂
* chore: Move httpmw to /coderd directory
httpmw is specific to coderd and should be scoped under coderd
* chore: Move httpapi to /coderd directory
httpapi is specific to coderd and should be scoped under coderd
* chore: Move database to /coderd directory
database is specific to coderd and should be scoped under coderd
* chore: Update codecov & gitattributes for generated files
* chore: Update Makefile
* chore: Rename ProjectHistory to ProjectVersion
Version more accurately represents what is stored. This
diverges from the WorkspaceHistory name, but I think it's
easier to understand.
* Rename files
* Standardize tests a bit more
* Remove Server struct from coderdtest
* Improve test coverage for workspace history
* Fix linting errors
* Fix coderd test leak
* Fix coderd test leak
* Improve workspace history logs
* Standardize test structure for codersdk
* Fix linting errors
* Fix WebSocket compression
* Update coderd/workspaces.go
Co-authored-by: Bryan <bryan@coder.com>
* Add test for listing project parameters
* Cache npm dependencies with setup node
* Remove windows npm cache key
Co-authored-by: Bryan <bryan@coder.com>
* feat: Add history middleware parameters
These will be used for streaming logs, checking status,
and other operations related to workspace and project
history.
* refactor: Move all HTTP routes to top-level struct
Nesting all handlers behind their respective structs
is leaky, and promotes naming conflicts between handlers.
Our HTTP routes cannot have conflicts, so neither should
function naming.
* Add provisioner daemon routes
* Add periodic updates
* Skip pubsub if short
* Return jobs with WorkspaceHistory
* Add endpoints for extracting singular history
* The full end-to-end operation works
* fix: Disable compression for websocket dRPC transport (#145)
There is a race condition in the interop between the websocket and `dRPC`: https://github.com/coder/coder/runs/5038545709?check_suite_focus=true#step:7:117 - it seems both the websocket and `dRPC` feel like they own the `[]byte` being sent between them. This can lead to data races in which both `dRPC` and the websocket are writing.
This is just tracking some experimentation to fix that race condition.
## Run results: ##
- Run 1: peer test failure
- Run 2: peer test failure
- Run 3: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040858460?check_suite_focus=true#step:8:45
```
status code 412: The provided project history is running. Wait for it to complete importing!
```
- Run 4: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040957999?check_suite_focus=true#step:7:176
```
workspacehistory_test.go:122:
Error Trace: workspacehistory_test.go:122
Error: Condition never satisfied
Test: TestWorkspaceHistory/CreateHistory
```
- Run 5: peer failure
- Run 6: Pass ✅
- Run 7: Peer failure
## Open Questions: ##
### Is `dRPC` or `websocket` at fault for the data race?
It looks like this condition is specifically happening when `dRPC` decides to `SendError`. This constructs a new byte payload from [`MarshalError`](f6e369438f/drpcwire/error.go (L15)) - so `dRPC` has created this buffer and owns it.
From `dRPC`'s perspective, the callstack looks like this:
- [`sendPacket`](f6e369438f/drpcstream/stream.go (L253))
- [`writeFrame`](f6e369438f/drpcwire/writer.go (L65))
- [`AppendFrame`](f6e369438f/drpcwire/packet.go (L128))
- with the data race finally happening here:
```go
// AppendFrame appends a marshaled form of the frame to the provided buffer.
func AppendFrame(buf []byte, fr Frame) []byte {
	...
	out := buf
	out = append(out, control) // <---------
```
This should be fine, since `dRPC` created this buffer; it takes the byte buffer constructed from `MarshalError` and tacks a bunch of headers onto it to create a proper frame.
Once `dRPC` is done writing, it _hangs onto the buffer and resets it here_: f6e369438f/drpcwire/writer.go (L73)
However... the websocket implementation, once it gets the buffer, runs a `statelessDeflate` [here](8dee580a7f/write.go (L180)), which compresses the buffer on the fly. This functionality actually [mutates the buffer in place](a1a9cfc821/flate/stateless.go (L94)), which is where we get our race.
In the case where the `[]byte` isn't being manipulated anywhere else, this compress-in-place operation would be safe, and that's probably the case for most over-the-wire usages. In this case, though, where we're plumbing `dRPC` -> websocket, they are both manipulating it (`dRPC` is reusing the buffer for the next `write`, and `websocket` is compressing on the fly).
### Why does cloning on `Read` fail?
We get a bunch of errors like:
```
2022/02/02 19:26:10 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
```
# UPDATE:
We decided we could disable websocket compression, which would avoid the race because the in-place `deflate` operation would no longer be run (a sketch of the option follows the run results below). Trying that out now:
- Run 1: ✅
- Run 2: https://github.com/coder/coder/runs/5042645522?check_suite_focus=true#step:8:338
- Run 3: ✅
- Run 4: https://github.com/coder/coder/runs/5042988758?check_suite_focus=true#step:7:168
- Run 5: ✅
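For reference, with the nhooyr.io/websocket library this is roughly what disabling compression looks like; the exact wiring in coderd may differ:
```go
package example

import (
	"net/http"

	"nhooyr.io/websocket"
)

// acceptUncompressed accepts a websocket with compression disabled, so the
// stateless deflate never mutates the buffer handed over by dRPC.
func acceptUncompressed(rw http.ResponseWriter, r *http.Request) (*websocket.Conn, error) {
	return websocket.Accept(rw, r, &websocket.AcceptOptions{
		CompressionMode: websocket.CompressionDisabled,
	})
}
```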
* fix: Remove race condition with acquiredJobDone channel (#148)
Found another data race while running the tests: https://github.com/coder/coder/runs/5044320845?check_suite_focus=true#step:7:83
__Issue:__ There is a race in the `p.acquiredJobDone` chan - in particular, there can be a case where we're waiting on the channel to finish (in close) with `<-p.acquiredJobDone`, but in parallel, an `acquireJob` could've been started, which would create a new channel for `p.acquiredJobDone`. There is a similar race in `close(..)`ing the channel, which also came up in test runs.
__Fix:__ Instead of recreating the channel every time, we can use a `sync.WaitGroup` to accomplish the same functionality - a semaphore to make close wait for the current job to wrap up.
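A minimal sketch of the `sync.WaitGroup` approach, with names that are only illustrative of provisionerd's structure, not copied from it:
```go
package provisionerd

import "sync"

// provisioner tracks the in-flight job with a WaitGroup so Close can wait for
// it without racing on a recreated channel.
type provisioner struct {
	activeJob sync.WaitGroup
}

func (p *provisioner) acquireJob(run func()) {
	p.activeJob.Add(1)
	go func() {
		defer p.activeJob.Done()
		run()
	}()
}

func (p *provisioner) Close() {
	// Blocks until the current job (if any) has wrapped up.
	p.activeJob.Wait()
}
```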
* fix: Bump up workspace history timeout (#149)
This is an attempted fix for failures like: https://github.com/coder/coder/runs/5043435263?check_suite_focus=true#step:7:32
Looking at the timing of the test:
```
t.go:56: 2022-02-02 21:33:21.964 [DEBUG] (terraform-provisioner) <provision.go:139> ran apply
t.go:56: 2022-02-02 21:33:21.991 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.050 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.090 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.140 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.195 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.240 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
workspacehistory_test.go:122:
Error Trace: workspacehistory_test.go:122
Error: Condition never satisfied
Test: TestWorkspaceHistory/CreateHistory
```
It appears that the `terraform apply` job had just finished - with less than a second to spare until our `require.Eventually` timed out - but there was still work to be done (i.e., collecting the state files). So my suspicion is that terraform might, in some cases, exceed our 5s timeout.
Note that the setup for this test has a similar project-history wait that waits for 15s, so I borrowed that value here.
In the future, we can look at using a simple echo provider to exercise this in the unit test in a way that is more reliable in terms of timing. I'll log an issue to track that.
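For context, the change boils down to giving the polling assertion the same 15-second budget; a minimal testify sketch with a made-up helper name:
```go
package workspacehistory

import (
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

// waitForHistory polls the condition for up to 15 seconds instead of 5.
func waitForHistory(t *testing.T, condition func() bool) {
	require.Eventually(t, condition, 15*time.Second, 50*time.Millisecond)
}
```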
Co-authored-by: Bryan <bryan@coder.com>