Relates to https://github.com/coder/coder/issues/15432
Ensures that no workspace build timings with zero values for started_at or ended_at are inserted into the DB or returned from the API.
Closes https://github.com/coder/internal/issues/619
Implement the `coderd` side of the AgentAPI for the upcoming
dev-container agents work.
`agent/agenttest/client.go` is left unimplemented for a future PR
working to implement the agent side of this feature.
Adds telemetry for a _global_ account of prebuilt workspaces created,
failed to build, and claimed.
Partitioning this data by template/preset tuple is not currently in
scope.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
The existing code persists all static parameters and their values. Using
the previous build as the source if no new inputs are found.
Dynamic params do not have a state of the parameters saved to disk. So
instead, all previous values are persisted always, and new inputs
override.
Adds a database trigger that runs on insert and update of the
`workspace_agents` table. The trigger ensures that the agent name is
unique within the context of the workspace build it is being inserted
into.
## Summary
This PR introduces support for expiration policies in prebuilds. The TTL
(time-to-live) is retrieved from the Terraform configuration
([terraform-provider-coder
PR](https://github.com/coder/terraform-provider-coder/pull/404)):
```
prebuilds = {
instances = 2
expiration_policy {
ttl = 86400
}
}
```
**Note**: Since there is no need for precise TTL enforcement down to the
second, in this implementation expired prebuilds are handled in a single
reconciliation cycle: they are deleted, and new instances are created
only if needed to match the desired count.
## Changes
* The outcome of a reconciliation cycle is now expressed as a slice of
reconciliation actions, instead of a single aggregated action.
* Adjusted reconciliation logic to delete expired prebuilds and
guarantee that the number of desired instances is correct.
* Updated relevant data structures and methods to support expiration
policies parameters.
* Added documentation to `Prebuilt workspaces` page
* Update `terraform-provider-coder` to version 2.5.0:
https://github.com/coder/terraform-provider-coder/releases/tag/v2.5.0
Depends on: https://github.com/coder/terraform-provider-coder/pull/404
Fixes: https://github.com/coder/coder/issues/17916
This change introduces a refactor of the devcontainers recreation logic
which is now handled asynchronously rather than being request scoped.
The response was consequently changed from "No Content" to "Accepted" to
reflect this.
A new `Status` field was introduced to the devcontainer struct which
replaces `Running` (bool). This reflects that the devcontainer can now
be in various states (starting, running, stopped or errored).
The status field also protects against multiple concurrent recrations,
as long as they are initiated via the API.
Updates #16424
This change introduces a significant refactor to the agentcontainers API
and enables periodic updates of Docker containers rather than on-demand.
Consequently this change also allows us to move away from using a
locking channel and replace it with a mutex, which simplifies usage.
Additionally a previous oversight was fixed, and testing added, to clear
devcontainer running/dirty status when the container has been removed.
Updates coder/coder#16424
Updates coder/internal#621
This PR ensures that waits on channels will time out according to the
test context, rather than waiting indefinitely. This should alleviate
the panic seen in https://github.com/coder/internal/issues/645 and, if
the deadlock recurs, allow the test to be retried automatically in CI.
Relates to https://github.com/coder/coder/issues/17432
### Part 1:
Notes:
- `GetPresetsAtFailureLimit` SQL query is added, which is similar to
`GetPresetsBackoff`, they use same CTEs: `filtered_builds`,
`time_sorted_builds`, but they are still different.
- Query is executed on every loop iteration. We can consider marking
specific preset as permanently failed as an optimization to avoid
executing query on every loop iteration. But I decided don't do it for
now.
- By default `FailureHardLimit` is set to 3.
- `FailureHardLimit` is configurable. Setting it to zero - means that
hard limit is disabled.
### Part 2
Notes:
- `PrebuildFailureLimitReached` notification is added.
- Notification is sent to template admins.
- Notification is sent only the first time, when hard limit is reached.
But it will `log.Warn` on every loop iteration.
- I introduced this enum:
```sql
CREATE TYPE prebuild_status AS ENUM (
'normal', -- Prebuilds are working as expected; this is the default, healthy state.
'hard_limited', -- Prebuilds have failed repeatedly and hit the configured hard failure limit; won't be retried anymore.
'validation_failed' -- Prebuilds failed due to a non-retryable validation error (e.g. template misconfiguration); won't be retried.
);
```
`validation_failed` not used in this PR, but I think it will be used in
next one, so I wanted to save us an extra migration.
- Notification looks like this:
<img width="472" alt="image"
src="https://github.com/user-attachments/assets/e10efea0-1790-4e7f-a65c-f94c40fced27"
/>
### Latest notification views:
<img width="463" alt="image"
src="https://github.com/user-attachments/assets/11310c58-68d1-4075-a497-f76d854633fe"
/>
<img width="725" alt="image"
src="https://github.com/user-attachments/assets/6bbfe21a-91ac-47c3-a9d1-21807bb0c53a"
/>
This PR refactors the CompleteJob function to use database transactions
more consistently for better atomicity guarantees. The large function
was broken down into three specialized handlers:
- completeTemplateImportJob
- completeWorkspaceBuildJob
- completeTemplateDryRunJob
Each handler now uses the Database.InTx wrapper to ensure all database
operations for a job completion are performed within a single
transaction, preventing partial updates in case of failures.
Added comprehensive tests for transaction behavior for each job type.
Fixes#17694🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>
Dynamic params skip parameter validation in coder/coder.
This is because conditional parameters cannot be validated
with the static parameters in the database.
# Use workspace.OwnerUsername instead of fetching the owner
This PR optimizes the agent API by using the `workspace.OwnerUsername` field directly instead of making an additional database query to fetch the owner's username. The change removes the need to call `GetUserByID` in the manifest API and workspace agent RPC endpoints.
An issue arose when the agent token was scoped without access to user data (`api_key_scope = "no_user_data"`), causing the agent to fail to fetch the manifest due to an RBAC issue.
Change-Id: I3b6e7581134e2374b364ee059e3b18ece3d98b41
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Pass through the user input as is. The previous code only passed through
parameters that existed in the db (static params). This would omit
conditional params.
Validation is enforced by the dynamic params websocket, so validation at
this point is not required.
Closes https://github.com/coder/internal/issues/648
This change introduces a new `ParentId` field to the agent's manifest.
This will allow an agent to know if it is a child or not, as well as
knowing who the owner is.
This is part of the Dev Container Agents work
# Description
This PR adds the `worker_name` field to the provisioner jobs endpoint.
To achieve this, the following SQL query was updated:
-
`GetProvisionerJobsByOrganizationAndStatusWithQueuePositionAndProvisioner`
As a result, the `codersdk.ProvisionerJob` type, which represents the
provisioner job API response, was modified to include the new field.
**Notes:**
* As mentioned in
[comment](https://github.com/coder/coder/pull/17877#discussion_r2093218206),
the `GetProvisionerJobsByIDsWithQueuePosition` query was not changed due
to load concerns. This means that for template and template version
endpoints, `worker_id` will still be returned, but `worker_name` will
not.
* Similar to `worker_id`, the `worker_name` is only present once a job
is assigned to a provisioner daemon. For jobs in a pending state (not
yet assigned), neither `worker_id` nor `worker_name` will be returned.
---
# Affected Endpoints
- `/organizations/{organization}/provisionerjobs`
- `/organizations/{organization}/provisionerjobs/{job}`
---
# Testing
- Added new tests verifying that both `worker_id` and `worker_name` are
returned once a provisioner job reaches the **succeeded** state.
- Existing tests covering state transitions and other logic remain
unchanged, as they test different scenarios.
---
# Front-end Changes
Admin provisioner jobs dashboard:
<img width="1088" alt="Screenshot 2025-05-16 at 11 51 33"
src="https://github.com/user-attachments/assets/0e20e360-c615-4497-84b7-693777c5443e"
/>
Fixes: https://github.com/coder/coder/issues/16982
closes https://github.com/coder/internal/issues/632
`pubsubReinitSpy` used to signal that a subscription had happened before
it actually had.
This created a slight opportunity for the main goroutine to publish
before the actual subscription was listening. The published event was
then dropped, leading to a failed test.
fixes#17070
Cleans up our handling of APIKey expiration and OIDC to keep them separate concepts. For an OIDC-login APIKey, both the APIKey and OIDC link must be valid to login. If the OIDC link is expired and we have a refresh token, we will attempt to refresh.
OIDC refreshes do not have any effect on APIKey expiry.
https://github.com/coder/coder/issues/17070#issuecomment-2886183613 explains why this is the correct behavior.
Existing template versions do not have the metadata (modules + plan) in
the db. So revert to using static parameter information from the
original template import.
This data will still be served over the websocket.
Also add some clarification about the lack of database constraints for
soft template deletion.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Closes https://github.com/coder/internal/issues/369
We can't know whether a replacement (i.e. drift of terraform state
leading to a resource needing to be deleted/recreated) will take place
apriori; we can only detect it at `plan` time, because the provider
decides whether a resource must be replaced and it cannot be inferred
through static analysis of the template.
**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.
Drift details will now be logged in the workspace build logs:

Plus a notification will be sent to template admins when this situation
arises:

A new metric - `coderd_prebuilt_workspaces_resource_replacements_total`
- will also increment each time a workspace encounters replacements.
We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know apriori which
replacement would cause this.
For example, say we have 2 replacements: a `docker_container` and a
`null_resource`; we don't know which one might
cause an issue (or indeed if either would), so we just track the
replacement.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.
Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Avoids two sequential scans of massive tables (`workspace_builds`,
`provisioner_jobs`) and uses index scans instead. This new view largely
replicates our already optimized query `GetWorkspaces` to fetch the
latest build.
The original query and the new query were compared against the dogfood
database to ensure they return the exact same data in the exact same
order (minus the new `workspaces.deleted = false` filter to improve
performance even more). The performance is massively improved even
without the `workspaces.deleted = false` filter, but it was added to
improve it even more.
Note: these query times are probably inflated due to high database load
on our dogfood environment that this intends to partially resolve.
Before: 2,139ms
([explain](https://explain.dalibo.com/plan/997e4fch241b46e6))
After: 33ms
([explain](https://explain.dalibo.com/plan/c888dc223870f181))
Co-authored-by: Cian Johnston <cian@coder.com>
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
## Description
Modifies the behaviour of the "list templates" API endpoints to return
non-deprecated templates by default. Users can still query for
deprecated templates by specifying the `deprecated=true` query
parameter.
**Note:** The deprecation feature is an enterprise-level feature
## Affected Endpoints
* /api/v2/organizations/{organization}/templates
* /api/v2/templates
Fixes#17565