This PR extracts dynamic parameter rendering logic from
coderd/parameters.go into a new coderd/dynamicparameters package. Partly
for organization and maintainability, but primarily to be reused in
`wsbuilder` to be leveraged as validation.
## Description
This PR adds support for deleting prebuilt workspaces via the
authorization layer. It introduces special-case handling to ensure that
`prebuilt_workspace` permissions are evaluated when attempting to delete
a prebuilt workspace, falling back to the standard `workspace` resource
as needed.
Prebuilt workspaces are a subset of workspaces, identified by having
`owner_id` set to `PREBUILD_SYSTEM_USER`.
This means:
* A user with `prebuilt_workspace.delete` permission is allowed to
**delete only prebuilt workspaces**.
* A user with `workspace.delete` permission can **delete both normal and
prebuilt workspaces**.
⚠️ This implementation is scoped to **deletion operations only**. No
other operations are currently supported for the `prebuilt_workspace`
resource.
To delete a workspace, users must have the following permissions:
* `workspace.read`: to read the current workspace state
* `update`: to modify workspace metadata and related resources during
deletion (e.g., updating the `deleted` field in the database)
* `delete`: to perform the actual deletion of the workspace
## Changes
* Introduced `authorizeWorkspace()` helper to handle prebuilt workspace
authorization logic.
* Ensured both `prebuilt_workspace` and `workspace` permissions are
checked.
* Added comments to clarify the current behavior and limitations.
* Moved `SystemUserID` constant from the `prebuilds` package to the
`database` package `PrebuildsSystemUserID` to resolve an import cycle
(commit
f24e4ab4b6).
* Update middleware `ExtractOrganizationMember` to include system user
members.
Closes https://github.com/coder/internal/issues/312
Depends on https://github.com/coder/terraform-provider-coder/pull/408
This PR adds support for defining an **autoscaling block** for
prebuilds, allowing number of desired instances to scale dynamically
based on a schedule.
Example usage:
```
data "coder_workspace_preset" "us-nix" {
...
prebuilds = {
instances = 0 # default to 0 instances
scheduling = {
timezone = "UTC" # a single timezone is used for simplicity
# Scale to 3 instances during the work week
schedule {
cron = "* 8-18 * * 1-5" # from 8AM–6:59PM, Mon–Fri, UTC
instances = 3 # scale to 3 instances
}
# Scale to 1 instance on Saturdays for urgent support queries
schedule {
cron = "* 8-14 * * 6" # from 8AM–2:59PM, Sat, UTC
instances = 1 # scale to 1 instance
}
}
}
}
```
### Behavior
- Multiple `schedule` blocks per `prebuilds` block are supported.
- If the current time matches any defined autoscaling schedule, the
corresponding number of instances is used.
- If no schedule matches, the **default instance count**
(`prebuilds.instances`) is used as a fallback.
### Why
This feature allows prebuild instance capacity to adapt to predictable
usage patterns, such as:
- Scaling up during business hours or high-demand periods
- Reducing capacity during off-hours to save resources
### Cron specification
The cron specification is interpreted as a **continuous time range.**
For example, the expression:
```
* 9-18 * * 1-5
```
is intended to represent a continuous range from **09:00 to 18:59**,
Monday through Friday.
However, due to minor implementation imprecision, it is currently
interpreted as a range from **08:59:00 to 18:58:59**, Monday through
Friday.
This slight discrepancy arises because the evaluation is based on
whether a specific **point in time** falls within the range, using the
`github.com/coder/coder/v2/coderd/schedule/cron` library, which performs
per-minute matching rather than strict range evaluation.
---------
Co-authored-by: Danny Kopping <danny@coder.com>
Adds a custom marshaler to handle some cases where nils were being
marshaled to nulls, causing the web UI to throw an error.
---------
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
I modified the proxy host cache we already had and were using for
websocket csp headers to also include the wildcard app host, then used
those for frame-src policies.
I did not add frame-ancestors, since if I understand correctly, those
would go on the app, and this middleware does not come into play there.
Maybe we will want to add it on workspace apps like we do with cors, if
we find apps are setting it to `none` or something.
Closes https://github.com/coder/internal/issues/684
## Description
Adds tests for `ReconcileAll` to verify the full reconciliation flow
when handling expired prebuilds. This complements existing lower-level
tests by checking multiple reconciliation actions (delete + create) at
the higher reconciliation cycle level.
Related with comment:
https://github.com/coder/coder/pull/17996#issuecomment-2910516489
The file cache was caching the `Unauthorized` errors if a user without
the right perms opened the file first. So all future opens would fail.
Now the cache always opens with a subject that can read files. And authz
is checked on the Acquire per user.
```
// Report a metric only if the preset uses the latest version of the template and the template is not deleted.
// This avoids conflicts between metrics from old and new template versions.
//
// NOTE: Multiple versions of a preset can exist with the same orgName, templateName, and presetName,
// because templates can have multiple versions — or deleted templates can share the same name.
//
// The safest approach is to report the metric only for the latest version of the preset.
// When a new template version is released, the metric for the new preset should overwrite
// the old value in Prometheus.
//
// However, there’s one edge case: if an admin creates a template, it becomes hard-limited,
// then deletes the template and never creates another with the same name,
// the old preset will continue to be reported as hard-limited —
// even though it’s deleted. This will persist until `coderd` is restarted.
```
## Summary
This PR introduces support for expiration policies in prebuilds. The TTL
(time-to-live) is retrieved from the Terraform configuration
([terraform-provider-coder
PR](https://github.com/coder/terraform-provider-coder/pull/404)):
```
prebuilds = {
instances = 2
expiration_policy {
ttl = 86400
}
}
```
**Note**: Since there is no need for precise TTL enforcement down to the
second, in this implementation expired prebuilds are handled in a single
reconciliation cycle: they are deleted, and new instances are created
only if needed to match the desired count.
## Changes
* The outcome of a reconciliation cycle is now expressed as a slice of
reconciliation actions, instead of a single aggregated action.
* Adjusted reconciliation logic to delete expired prebuilds and
guarantee that the number of desired instances is correct.
* Updated relevant data structures and methods to support expiration
policies parameters.
* Added documentation to `Prebuilt workspaces` page
* Update `terraform-provider-coder` to version 2.5.0:
https://github.com/coder/terraform-provider-coder/releases/tag/v2.5.0
Depends on: https://github.com/coder/terraform-provider-coder/pull/404
Fixes: https://github.com/coder/coder/issues/17916
Closes https://github.com/coder/coder/issues/17988
Define `preset_hard_limited` metric which for every preset indicates
whether a given preset has reached the hard failure limit (1 for
hard-limited, 0 otherwise).
CLI example:
```
curl -X GET localhost:2118/metrics | grep preset_hard_limited
# HELP coderd_prebuilt_workspaces_preset_hard_limited Indicates whether a given preset has reached the hard failure limit (1 for hard-limited, 0 otherwise).
# TYPE coderd_prebuilt_workspaces_preset_hard_limited gauge
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="GoLand: Large",template_name="Test7"} 1
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="GoLand: Large",template_name="ValidTemplate"} 0
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="IU: Medium",template_name="Test7"} 1
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="IU: Medium",template_name="ValidTemplate"} 0
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="WS: Small",template_name="Test7"} 1
```
NOTE:
```go
if !ps.Preset.Deleted && ps.Preset.UsingActiveVersion {
c.metrics.trackHardLimitedStatus(ps.Preset.OrganizationName, ps.Preset.TemplateName, ps.Preset.Name, ps.IsHardLimited)
}
```
Only active template version is tracked. If admin creates new template
version - old value of metric (for previous template version) will be
overwritten with new value of metric (for active template version).
Because `template_version` is not part of metric:
```go
labels = []string{"template_name", "preset_name", "organization_name"}
```
Implementation is similar to implementation of
`MetricResourceReplacementsCount` metric
---------
Co-authored-by: Susana Ferreira <ssncferreira@gmail.com>
Relates to https://github.com/coder/coder/issues/17432
### Part 1:
Notes:
- `GetPresetsAtFailureLimit` SQL query is added, which is similar to
`GetPresetsBackoff`, they use same CTEs: `filtered_builds`,
`time_sorted_builds`, but they are still different.
- Query is executed on every loop iteration. We can consider marking
specific preset as permanently failed as an optimization to avoid
executing query on every loop iteration. But I decided don't do it for
now.
- By default `FailureHardLimit` is set to 3.
- `FailureHardLimit` is configurable. Setting it to zero - means that
hard limit is disabled.
### Part 2
Notes:
- `PrebuildFailureLimitReached` notification is added.
- Notification is sent to template admins.
- Notification is sent only the first time, when hard limit is reached.
But it will `log.Warn` on every loop iteration.
- I introduced this enum:
```sql
CREATE TYPE prebuild_status AS ENUM (
'normal', -- Prebuilds are working as expected; this is the default, healthy state.
'hard_limited', -- Prebuilds have failed repeatedly and hit the configured hard failure limit; won't be retried anymore.
'validation_failed' -- Prebuilds failed due to a non-retryable validation error (e.g. template misconfiguration); won't be retried.
);
```
`validation_failed` not used in this PR, but I think it will be used in
next one, so I wanted to save us an extra migration.
- Notification looks like this:
<img width="472" alt="image"
src="https://github.com/user-attachments/assets/e10efea0-1790-4e7f-a65c-f94c40fced27"
/>
### Latest notification views:
<img width="463" alt="image"
src="https://github.com/user-attachments/assets/11310c58-68d1-4075-a497-f76d854633fe"
/>
<img width="725" alt="image"
src="https://github.com/user-attachments/assets/6bbfe21a-91ac-47c3-a9d1-21807bb0c53a"
/>
Dynamic params skip parameter validation in coder/coder.
This is because conditional parameters cannot be validated
with the static parameters in the database.
Existing template versions do not have the metadata (modules + plan) in
the db. So revert to using static parameter information from the
original template import.
This data will still be served over the websocket.
Also add some clarification about the lack of database constraints for
soft template deletion.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Closes https://github.com/coder/internal/issues/369
We can't know whether a replacement (i.e. drift of terraform state
leading to a resource needing to be deleted/recreated) will take place
apriori; we can only detect it at `plan` time, because the provider
decides whether a resource must be replaced and it cannot be inferred
through static analysis of the template.
**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.
Drift details will now be logged in the workspace build logs:

Plus a notification will be sent to template admins when this situation
arises:

A new metric - `coderd_prebuilt_workspaces_resource_replacements_total`
- will also increment each time a workspace encounters replacements.
We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know apriori which
replacement would cause this.
For example, say we have 2 replacements: a `docker_container` and a
`null_resource`; we don't know which one might
cause an issue (or indeed if either would), so we just track the
replacement.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.
Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
`Collect()` is called whenever the `/metrics` endpoint is hit to
retrieve metrics.
The queries used in prebuilds metrics collection are quite heavy, and we
want to avoid having them running concurrently / too often to keep db
load down.
Here I'm moving towards a background retrieval of the state required to
set the metrics, which gets invalidated every interval.
Also introduces `coderd_prebuilt_workspaces_metrics_last_updated` which
operators can use to determine when these metrics go stale.
See https://github.com/coder/coder/pull/17789 as well.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Closes https://github.com/coder/coder/issues/17691
`ExtractOrganizationMembersParam` will allow fetching a user with only
organization permissions. If the user belongs to 0 orgs, then the user "does not exist"
from an org perspective. But if you are a site-wide admin, then the user does exist.
Database transactions hold onto connections, and `pubsub.Publish` tries
to acquire a connection of its own. If the latter is called within a
transaction, this can lead to connection exhaustion.
I plan two follow-ups to this PR:
1. Make connection counts tuneable
https://github.com/coder/coder/blob/main/cli/server.go#L2360-L2376
We will then be able to write tests showing how connection exhaustion
occurs.
2. Write a linter/ruleguard to prevent `pubsub.Publish` from being
called within a transaction.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Don't specify the template version for a delete transition, because the
prebuilt workspace may have been created using an older template
version.
If the template version isn't explicitly set, the builder will
automatically use the version from the last workspace build - which is
the desired behavior.
PR contains:
- fix for claiming & deleting prebuilds with immutable params
- unit test for claiming scenario
- unit test for deletion scenario
The parameter resolver was failing when deleting/claiming prebuilds
because a value for a previously-used parameter was provided to the
resolver, but since the value was unchanged (it's coming from the
preset) it failed in the resolver. The resolver was missing a check to
see if the old value != new value; if the values match then there's no
mutation of an immutable parameter.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Closes https://github.com/coder/internal/issues/510
<details>
<summary> Refactoring Summary </summary>
### 1) `CalculateActions` Function
#### Issues Before Refactoring:
- Large function (~150 lines), making it difficult to read and maintain.
- The control flow is hard to follow due to complex conditional logic.
- The `ReconciliationActions` struct was partially initialized early,
then mutated in multiple places, making the flow error-prone.
Original source:
fe60b569ad/coderd/prebuilds/state.go (L13-L167)
#### Improvements After Refactoring:
- Simplified and broken down into smaller, focused helper methods.
- The flow of the function is now more linear and easier to understand.
- Struct initialization is cleaner, avoiding partial and incremental
mutations.
Refactored function:
eeb0407d78/coderd/prebuilds/state.go (L67-L84)
---
### 2) `ReconciliationActions` Struct
#### Issues Before Refactoring:
- The struct mixed both actionable decisions and diagnostic state, which
blurred its purpose.
- It was unclear which fields were necessary for reconciliation logic,
and which were purely for logging/observability.
#### Improvements After Refactoring:
- Split into two clear, purpose-specific structs:
- **`ReconciliationActions`** — defines the intended reconciliation
action.
- **`ReconciliationState`** — captures runtime state and metadata,
primarily for logging and diagnostics.
Original struct:
fe60b569ad/coderd/prebuilds/reconcile.go (L29-L41)
</details>
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Dean Sheather <dean@deansheather.com>
Co-authored-by: Spike Curtis <spike@coder.com>
Co-authored-by: Danny Kopping <danny@coder.com>
Deleted organizations are still attempting to sync members. This causes
an error on inserting the member, and would likely cause issues later in
the sync process even if that member is inserted. Deleted orgs should be
skipped.
- Removes displaying XRay scan results in the dashboard. I'm not sure
anyone was even using this integration so it's just debt for us to
maintain. We can open up a separate issue to get rid of the db tables
once we know for sure that we haven't broken anyone.