Fixes a flake seen in
https://github.com/coder/coder/actions/runs/12346801529/job/34452940351
It's possible but exceedingly rare for the randomly generated username
to be exactly 32 characters.
Then, appending a `1` to that username causes the username to be invalid
and the test to fail. Instead of appending we'll just generate a new
username that is <=32 characters.
The `UpdateSelf` subtest has the same appending, but uses a fixed
username that is less than 32 characters, so it doesn't need to be
changed.
When Coder is ran in High Availability mode, each Coder instance has a
lifecycle executor. These lifecycle executors are all trying to do the
same work, and whilst transactions saves us from this causing an issue,
we are still doing extra work that could be prevented.
This PR adds a `TryAcquireLock` call for each attempted workspace
transition, meaning two Coder instances shouldn't duplicate effort.
When calculating the queue position in
`GetProvisionerJobsByIDsWithQueuePosition` we only counted jobs with
`started_at = NULL`. This is misleading, as it allows canceling or
canceled jobs to take up rows in the computed queue position, giving an
impression that the queue is larger than it really is.
This modifies the query to also exclude jobs with a null `canceled_at`,
`completed_at`, or `error` field for the purposes of calculating the
queue position, and also adds a test to validate this behaviour.
(Note: due to the behaviour of `dbgen.ProvisionerJob` with `dbmem` I had
to use other proxy methods to validate the corresponding dbmem
implementation.)
---------
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Relates to https://github.com/coder/coder/issues/15390
Currently when a user creates a workspace, their workspace's TTL is
determined by the template's default TTL. If the Coder instance is AGPL,
or if the template has disallowed the user from configuring autostop,
then it is not possible to change the workspace's TTL after creation.
Any changes to the template's default TTL only takes effect on _new_
workspaces.
This PR modifies the behaviour slightly so that on AGPL Coder, or on
enterprise when a template does not allow user's to configure their
workspace's TTL, updating the template's default TTL will also update
any workspace's TTL to match this value.
https://github.com/coder/coder/pull/15608 introduced a buggy behaviour
with dbcrypt enabled.
When clearing an oauth refresh token, we had been setting the value to
the empty string.
The database encryption package considers decrypting an empty string to
be an error, as an empty encrypted string value will still have a nonce
associated with it and thus not actually be empty when stored at rest.
Instead of 'deleting' the refresh token, 'update' it to be the empty
string.
This plays nicely with dbcrypt.
It also adds a 'utility test' in the dbcrypt package to help encrypt a
value. This was useful when manually fixing users affected by this bug
on our dogfood instance.
fixes#14881
Our handlers for streaming logs don't read from the websocket. We don't allow the client to send us any data, but the websocket library we use requires reading from the websocket to properly handle pings and closing. Not doing so can [can cause the websocket to hang on write](https://github.com/coder/websocket/issues/405), leaking go routines which were noticed in #14881.
This fixes the issue, and in process refactors our log streaming to a encoder/decoder package which provides generic types for sending JSON over websocket.
I'd also like for us to upgrade to the latest https://github.com/coder/websocket but we should also upgrade our tailscale fork before doing so to avoid including two copies of the websocket library.
Relates to https://github.com/coder/coder/issues/15082
Further to https://github.com/coder/coder/pull/15429, this reduces the
amount of false-positives returned by the 'is eligible for autostart'
part of the query. We achieve this by calculating the 'next start at'
time of the workspace, storing it in the database, and using it in our
`GetWorkspacesEligibleForTransition` query.
The prior implementation of the 'is eligible for autostart' query would
return _all_ workspaces that at some point in the future _might_ be
eligible for autostart. This now ensures we only return workspaces that
_should_ be eligible for autostart.
We also now pass `currentTick` instead of `t` to the
`GetWorkspacesEligibleForTransition` query as otherwise we'll have one
round of workspaces that are skipped by `isEligibleForTransition` due to
`currentTick` being a truncated version of `t`.
- Refactors `checkProvisioners` into `db2sdk.MatchedProvisioners`
- Adds a separate RBAC subject just for reading provisioner daemons
- Adds matched provisioners information to additional endpoints relating to
workspace builds and templates
-Updates existing unit tests for above endpoints
-Adds API endpoint for matched provisioners of template dry-run job
-Updates CLI to show warning when creating/starting/stopping/deleting
workspaces for which no provisoners are available
---------
Co-authored-by: Danny Kopping <danny@coder.com>
* Modifies `MatchedProvisioners` response of `codersdk.TemplateVersion`
to be a pointer
* CLI now checks for absence of `*MatchedProvisioners` before showing
warning regarding provisioners
* Extracts logic for warning about provisioners to a function
* Improves test coverage for CLI template push with
`coder_workspace_tags`.
Addresses https://github.com/coder/nexus/issues/99.
Changes:
- Save the id of the built-in example template used to create a template
version in the database
- Include the example id in telemetry
Aims to resolve#15605
There's currently one option valid for the `@Security` tag in
swaggerparser - which fails in the CI if we try to put any other value.
At least one of our endpoints does not accept `CoderSessionToken` as an
option for the authentication and so we need to add new possibilities in
order to keep the documentation up-to-date.
In this PR , I added `ProvisionerKey` which is the way our provisioner
daemon can authenticate to the backend - also modified a bit the code to
simplify other options later.
This one aims to resolve#15604
Created some table tests for the main cases -
also preferred to create two isolated cases for the most complicated
cases in order to keep table tests simple enough.
Give us full coverage on the middleware logic, for both optional and non
optional cases - PSK and ProvisionerKey.
Relates to https://github.com/coder/coder/issues/15087 and
https://github.com/coder/coder/issues/15427
- Extracts provisioner job tags from `coder_workspace_tags` on template
version creation using `provisioner/terraform/tfparse` added in
https://github.com/coder/coder/pull/15236
- Drops a WARN log in coderd if no matching provisioners found.
- Also drops a warning message in the CLI if no provisioners are found.
- To support both CLI and UI warnings, added a
`codersdk.MatchedProvisioners` struct to the `TemplateVersion` response
containing details of how many provisioners were around at the time of
the insert.
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Allows adding custom static CSP directives to Coder. Niche use case but
makes this easier then creating a reverse proxy that has to replace the
header. We want to preserve our directives, so having an append option
is preferred to a "replace" option via a reverse proxy.
Closes https://github.com/coder/coder/issues/15118
Once a token refresh fails, we remove the `oauth_refresh_token` from the
database. This will prevent the token from hitting the IDP for
subsequent refresh attempts.
Without this change, a bad script can cause a failing token to hit a
remote IDP repeatedly with each `git` operation. With this change, after
the first hit, subsequent hits will fail locally, and never contact the
IDP.
The solution in both cases is to authenticate the external auth link. So
the resolution is the same as before.
This PR is the first step aiming to resolve#15126 -
Creating a new endpoint to return the details associated to a
provisioner key.
This is an authenticated endpoints aiming to be used by the provisioner
daemons - using the provisioner key as authentication method.
This endpoint is not ment to be used with PSK or User Sessions.
Resolves https://github.com/coder/coder/issues/15513
Disables notifications when both `$CODER_NOTIFICATIONS_WEBHOOK_ENDPOINT` and `$CODER_EMAIL_SMARTHOST` are unset.
Breaking change: `$CODER_EMAIL_SMARTHOST` is no longer set by default as `localhost:587`, meaning any deployments that make use of this default value will need to add it back.
---------
Co-authored-by: Danny Kopping <danny@coder.com>
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Adds an api endpoint to grab all available sync field options for IDP
sync. This is for autocomplete on idp sync forms. This is required for
organization admins to have some insight into the claim fields available
when configuring group/role sync.
- We were instantiating a cryptokey cache with a vanilla reference to
the database instead of one wrapped by dbcrypt.
- Fixes an issue where failing to instantiate unrelated keycaches does
not fatally error out.
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests. This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.
Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.
Addresses https://github.com/coder/nexus/issues/35.
This PR:
- Adds a `workspace_modules` table to track modules used by the
Terraform provisioner in provisioner jobs.
- Adds a `module_path` column to the `workspace_resources` table,
allowing to identify which module a resource originates from.
- Starts pushing this new information into telemetry.
For the person reviewing this PR, do not fret about the 1,500 new lines
- ~1,000 of them are auto-generated.
Bumps the Tailnet and Agent API version 2.3, and creates some extra controls and machinery around these versions.
What happened is that we accidentally shipped two new API features without bumping the version. `ScriptCompleted` on the Agent API in Coder v2.16 and `RefreshResumeToken` on the Tailnet API in Coder v2.15.
Since we can't easily retroactively bump the versions, we'll roll these changes into API version 2.3 along with the new WorkspaceUpdates RPC, which hasn't been released yet. That means there is some ambiguity in Coder v2.15-v2.17 about exactly what methods are supported on the Tailnet and Agent APIs. This isn't great, but hasn't caused us major issues because
1. RefreshResumeToken is considered optional, and clients just log and move on if the RPC isn't supported.
2. Agents basically never get started talking to a Coderd that is older than they are, since the agent binary is normally downloaded from Coderd at workspace start.
Still it's good to get things squared away in terms of versions for SDK users and possible edge cases around client and server versions.
To mitigate against this thing happening again, this PR also:
1. adds a CODEOWNERS for the API proto packages, so I'll review changes
2. defines interface types for different API versions, and has the agent explicitly use a specific version. That way, if you add a new method, and try to use it in the agent without thinking explicitly about versions, it won't compile.
With the protocol controllers stuff, we've sort of already abstracted the Tailnet API such that the interface type strategy won't work, but I'll work on getting the Controller to be version aware, such that it can check the API version it's getting against the controllers it has -- in a later PR.
Move claims from a `debug` column to an actual typed column to be used.
This does not functionally change anything, it just adds some Go typing to build
on.
Relates to https://github.com/coder/coder/issues/15082
The old implementation of `GetWorkspacesEligibleForTransition` returns
many workspaces that are not actually eligible for transition. This new
implementation reduces this number significantly (at least on our
dogfood instance).
- Assert rbac in fake notifications enqueuer
- Move fake notifications enqueuer to separate notificationstest package
- Update dbauthz rbac policy to allow provisionerd and autostart to create and read notification messages
- Update tests as required
Fixes test flakes _a la_
```
t.go:108: 2024-11-05 09:52:37.996 [erro] workspacestats: failed to load template schedule bumping activity, defaulting to bumping by 60min request_id=f14215d2-73dc-47ba-aa81-422c62f257e4 workspace_id=545d73c7-3a62-4466-8c08-b6abb12867b7 template_id=49747428-3abb-40e4-a6b2-03653e9f2506 ...
error= fetch object:
github.com/coder/coder/v2/coderd/database/dbauthz.(*querier).GetTemplateByID.fetch[...].func1
/home/runner/work/coder/coder/coderd/database/dbauthz/dbauthz.go:497
- pq: canceling statement due to user request
*** slogtest: log detected at level ERROR; TEST FAILURE ***
```
seen here on main: https://github.com/coder/coder/actions/runs/11681605747/job/32527006174
Closes https://github.com/coder/coder/issues/12612
The problem in the linked issue was caused due to a mismatch of when the
Web UI tooltip shows up (2 hours before an autostop requirement) and the
leeway in the `autostop_requirement` algorithm (workspace builds must be
1 hour before an autostop requirement to skip them).
Now, restarting your workspace whilst the tooltip is showing will skip
the upcoming autostop requirement.
This also could have been fixed by only showing the tooltip one hour
before the autostop requirement, but it looks like 1 hour was chosen
arbitrarily, and it doesn't hurt to give users more time to skip the
autostop.