## Description
This PR adds support for deleting prebuilt workspaces via the
authorization layer. It introduces special-case handling to ensure that
`prebuilt_workspace` permissions are evaluated when attempting to delete
a prebuilt workspace, falling back to the standard `workspace` resource
as needed.
Prebuilt workspaces are a subset of workspaces, identified by having
`owner_id` set to `PREBUILD_SYSTEM_USER`.
This means:
* A user with `prebuilt_workspace.delete` permission is allowed to
**delete only prebuilt workspaces**.
* A user with `workspace.delete` permission can **delete both normal and
prebuilt workspaces**.
⚠️ This implementation is scoped to **deletion operations only**. No
other operations are currently supported for the `prebuilt_workspace`
resource.
To delete a workspace, users must have the following permissions:
* `workspace.read`: to read the current workspace state
* `update`: to modify workspace metadata and related resources during
deletion (e.g., updating the `deleted` field in the database)
* `delete`: to perform the actual deletion of the workspace
## Changes
* Introduced `authorizeWorkspace()` helper to handle prebuilt workspace
authorization logic.
* Ensured both `prebuilt_workspace` and `workspace` permissions are
checked.
* Added comments to clarify the current behavior and limitations.
* Moved `SystemUserID` constant from the `prebuilds` package to the
`database` package `PrebuildsSystemUserID` to resolve an import cycle
(commit
f24e4ab4b6).
* Update middleware `ExtractOrganizationMember` to include system user
members.
Closes https://github.com/coder/internal/issues/312
Depends on https://github.com/coder/terraform-provider-coder/pull/408
This PR adds support for defining an **autoscaling block** for
prebuilds, allowing number of desired instances to scale dynamically
based on a schedule.
Example usage:
```
data "coder_workspace_preset" "us-nix" {
...
prebuilds = {
instances = 0 # default to 0 instances
scheduling = {
timezone = "UTC" # a single timezone is used for simplicity
# Scale to 3 instances during the work week
schedule {
cron = "* 8-18 * * 1-5" # from 8AM–6:59PM, Mon–Fri, UTC
instances = 3 # scale to 3 instances
}
# Scale to 1 instance on Saturdays for urgent support queries
schedule {
cron = "* 8-14 * * 6" # from 8AM–2:59PM, Sat, UTC
instances = 1 # scale to 1 instance
}
}
}
}
```
### Behavior
- Multiple `schedule` blocks per `prebuilds` block are supported.
- If the current time matches any defined autoscaling schedule, the
corresponding number of instances is used.
- If no schedule matches, the **default instance count**
(`prebuilds.instances`) is used as a fallback.
### Why
This feature allows prebuild instance capacity to adapt to predictable
usage patterns, such as:
- Scaling up during business hours or high-demand periods
- Reducing capacity during off-hours to save resources
### Cron specification
The cron specification is interpreted as a **continuous time range.**
For example, the expression:
```
* 9-18 * * 1-5
```
is intended to represent a continuous range from **09:00 to 18:59**,
Monday through Friday.
However, due to minor implementation imprecision, it is currently
interpreted as a range from **08:59:00 to 18:58:59**, Monday through
Friday.
This slight discrepancy arises because the evaluation is based on
whether a specific **point in time** falls within the range, using the
`github.com/coder/coder/v2/coderd/schedule/cron` library, which performs
per-minute matching rather than strict range evaluation.
---------
Co-authored-by: Danny Kopping <danny@coder.com>
## Summary
This PR introduces support for expiration policies in prebuilds. The TTL
(time-to-live) is retrieved from the Terraform configuration
([terraform-provider-coder
PR](https://github.com/coder/terraform-provider-coder/pull/404)):
```
prebuilds = {
instances = 2
expiration_policy {
ttl = 86400
}
}
```
**Note**: Since there is no need for precise TTL enforcement down to the
second, in this implementation expired prebuilds are handled in a single
reconciliation cycle: they are deleted, and new instances are created
only if needed to match the desired count.
## Changes
* The outcome of a reconciliation cycle is now expressed as a slice of
reconciliation actions, instead of a single aggregated action.
* Adjusted reconciliation logic to delete expired prebuilds and
guarantee that the number of desired instances is correct.
* Updated relevant data structures and methods to support expiration
policies parameters.
* Added documentation to `Prebuilt workspaces` page
* Update `terraform-provider-coder` to version 2.5.0:
https://github.com/coder/terraform-provider-coder/releases/tag/v2.5.0
Depends on: https://github.com/coder/terraform-provider-coder/pull/404
Fixes: https://github.com/coder/coder/issues/17916
Relates to https://github.com/coder/coder/issues/17432
### Part 1:
Notes:
- `GetPresetsAtFailureLimit` SQL query is added, which is similar to
`GetPresetsBackoff`, they use same CTEs: `filtered_builds`,
`time_sorted_builds`, but they are still different.
- Query is executed on every loop iteration. We can consider marking
specific preset as permanently failed as an optimization to avoid
executing query on every loop iteration. But I decided don't do it for
now.
- By default `FailureHardLimit` is set to 3.
- `FailureHardLimit` is configurable. Setting it to zero - means that
hard limit is disabled.
### Part 2
Notes:
- `PrebuildFailureLimitReached` notification is added.
- Notification is sent to template admins.
- Notification is sent only the first time, when hard limit is reached.
But it will `log.Warn` on every loop iteration.
- I introduced this enum:
```sql
CREATE TYPE prebuild_status AS ENUM (
'normal', -- Prebuilds are working as expected; this is the default, healthy state.
'hard_limited', -- Prebuilds have failed repeatedly and hit the configured hard failure limit; won't be retried anymore.
'validation_failed' -- Prebuilds failed due to a non-retryable validation error (e.g. template misconfiguration); won't be retried.
);
```
`validation_failed` not used in this PR, but I think it will be used in
next one, so I wanted to save us an extra migration.
- Notification looks like this:
<img width="472" alt="image"
src="https://github.com/user-attachments/assets/e10efea0-1790-4e7f-a65c-f94c40fced27"
/>
### Latest notification views:
<img width="463" alt="image"
src="https://github.com/user-attachments/assets/11310c58-68d1-4075-a497-f76d854633fe"
/>
<img width="725" alt="image"
src="https://github.com/user-attachments/assets/6bbfe21a-91ac-47c3-a9d1-21807bb0c53a"
/>
Closes https://github.com/coder/internal/issues/369
We can't know whether a replacement (i.e. drift of terraform state
leading to a resource needing to be deleted/recreated) will take place
apriori; we can only detect it at `plan` time, because the provider
decides whether a resource must be replaced and it cannot be inferred
through static analysis of the template.
**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.
Drift details will now be logged in the workspace build logs:

Plus a notification will be sent to template admins when this situation
arises:

A new metric - `coderd_prebuilt_workspaces_resource_replacements_total`
- will also increment each time a workspace encounters replacements.
We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know apriori which
replacement would cause this.
For example, say we have 2 replacements: a `docker_container` and a
`null_resource`; we don't know which one might
cause an issue (or indeed if either would), so we just track the
replacement.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.
Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Closes https://github.com/coder/internal/issues/510
<details>
<summary> Refactoring Summary </summary>
### 1) `CalculateActions` Function
#### Issues Before Refactoring:
- Large function (~150 lines), making it difficult to read and maintain.
- The control flow is hard to follow due to complex conditional logic.
- The `ReconciliationActions` struct was partially initialized early,
then mutated in multiple places, making the flow error-prone.
Original source:
fe60b569ad/coderd/prebuilds/state.go (L13-L167)
#### Improvements After Refactoring:
- Simplified and broken down into smaller, focused helper methods.
- The flow of the function is now more linear and easier to understand.
- Struct initialization is cleaner, avoiding partial and incremental
mutations.
Refactored function:
eeb0407d78/coderd/prebuilds/state.go (L67-L84)
---
### 2) `ReconciliationActions` Struct
#### Issues Before Refactoring:
- The struct mixed both actionable decisions and diagnostic state, which
blurred its purpose.
- It was unclear which fields were necessary for reconciliation logic,
and which were purely for logging/observability.
#### Improvements After Refactoring:
- Split into two clear, purpose-specific structs:
- **`ReconciliationActions`** — defines the intended reconciliation
action.
- **`ReconciliationState`** — captures runtime state and metadata,
primarily for logging and diagnostics.
Original struct:
fe60b569ad/coderd/prebuilds/reconcile.go (L29-L41)
</details>
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Dean Sheather <dean@deansheather.com>
Co-authored-by: Spike Curtis <spike@coder.com>
Co-authored-by: Danny Kopping <danny@coder.com>
Pre-requisite for https://github.com/coder/coder/pull/16891
Closes https://github.com/coder/internal/issues/515
This PR introduces a new concept of a "system" user.
Our data model requires that all workspaces have an owner (a `users`
relation), and prebuilds is a feature that will spin up workspaces to be
claimed later by actual users - and thus needs to own the workspaces in
the interim.
Naturally, introducing a change like this touches a few aspects around
the codebase and we've taken the approach _default hidden_ here; in
other words, queries for users will by default _exclude_ all system
users, but there is a flag to ensure they can be displayed. This keeps
the changeset relatively small.
This user has minimal permissions (it's equivalent to a `member` since
it has no roles). It will be associated with the default org in the
initial migration, and thereafter we'll need to somehow ensure its
membership aligns with templates (which are org-scoped) for which it'll
need to provision prebuilds; that's a solution we'll have in a
subsequent PR.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>