coder

mirror of https://github.com/coder/coder.git synced 2025-07-12 00:14:10 +00:00

Author	SHA1	Message	Date
ケイラ	fae30a00fd	chore: remove unnecessary redeclarations in for loops (#18440 )	2025-06-20 13:16:55 -06:00
Susana Ferreira	72f7d70bab	feat: allow TemplateAdmin to delete prebuilds via auth layer (#18333 ) ## Description This PR adds support for deleting prebuilt workspaces via the authorization layer. It introduces special-case handling to ensure that `prebuilt_workspace` permissions are evaluated when attempting to delete a prebuilt workspace, falling back to the standard `workspace` resource as needed. Prebuilt workspaces are a subset of workspaces, identified by having `owner_id` set to `PREBUILD_SYSTEM_USER`. This means: * A user with `prebuilt_workspace.delete` permission is allowed to delete only prebuilt workspaces. * A user with `workspace.delete` permission can delete both normal and prebuilt workspaces. ⚠️ This implementation is scoped to deletion operations only. No other operations are currently supported for the `prebuilt_workspace` resource. To delete a workspace, users must have the following permissions: * `workspace.read`: to read the current workspace state * `update`: to modify workspace metadata and related resources during deletion (e.g., updating the `deleted` field in the database) * `delete`: to perform the actual deletion of the workspace ## Changes * Introduced `authorizeWorkspace()` helper to handle prebuilt workspace authorization logic. * Ensured both `prebuilt_workspace` and `workspace` permissions are checked. * Added comments to clarify the current behavior and limitations. * Moved `SystemUserID` constant from the `prebuilds` package to the `database` package `PrebuildsSystemUserID` to resolve an import cycle (commit `f24e4ab4b6`). * Update middleware `ExtractOrganizationMember` to include system user members.	2025-06-20 17:36:32 +01:00
Yevhenii Shcherbina	8e3022ed9e	docs: add documentation for prebuild scheduling feature (#18462 ) Follow-up to https://github.com/coder/coder/pull/18126 Changes: - address issue mentioned here: https://github.com/coder/coder/pull/18126#discussion_r2144557600 - add docs for prebuilds scheduling --------- Co-authored-by: Danny Kopping <danny@coder.com> Co-authored-by: Atif Ali <atif@coder.com>	2025-06-20 10:08:47 -04:00
Yevhenii Shcherbina	0f6ca55238	feat: implement scheduling mechanism for prebuilds (#18126 ) Closes https://github.com/coder/internal/issues/312 Depends on https://github.com/coder/terraform-provider-coder/pull/408 This PR adds support for defining an autoscaling block for prebuilds, allowing number of desired instances to scale dynamically based on a schedule. Example usage: ``` data "coder_workspace_preset" "us-nix" { ... prebuilds = { instances = 0 # default to 0 instances scheduling = { timezone = "UTC" # a single timezone is used for simplicity # Scale to 3 instances during the work week schedule { cron = "* 8-18 * * 1-5" # from 8AM–6:59PM, Mon–Fri, UTC instances = 3 # scale to 3 instances } # Scale to 1 instance on Saturdays for urgent support queries schedule { cron = "* 8-14 * * 6" # from 8AM–2:59PM, Sat, UTC instances = 1 # scale to 1 instance } } } } ``` ### Behavior - Multiple `schedule` blocks per `prebuilds` block are supported. - If the current time matches any defined autoscaling schedule, the corresponding number of instances is used. - If no schedule matches, the default instance count (`prebuilds.instances`) is used as a fallback. ### Why This feature allows prebuild instance capacity to adapt to predictable usage patterns, such as: - Scaling up during business hours or high-demand periods - Reducing capacity during off-hours to save resources ### Cron specification The cron specification is interpreted as a continuous time range. For example, the expression: ``` * 9-18 * * 1-5 ``` is intended to represent a continuous range from 09:00 to 18:59, Monday through Friday. However, due to minor implementation imprecision, it is currently interpreted as a range from 08:59:00 to 18:58:59, Monday through Friday. This slight discrepancy arises because the evaluation is based on whether a specific point in time falls within the range, using the `github.com/coder/coder/v2/coderd/schedule/cron` library, which performs per-minute matching rather than strict range evaluation. --------- Co-authored-by: Danny Kopping <danny@coder.com>	2025-06-19 11:08:48 -04:00
Yevhenii Shcherbina	b330c0803c	fix: reimplement reporting of preset-hard-limited metric (#18055 ) Addresses concerns raised in https://github.com/coder/coder/pull/18045	2025-05-28 14:18:32 -04:00
Susana Ferreira	6f6e73af03	feat: implement expiration policy logic for prebuilds (#17996 ) ## Summary This PR introduces support for expiration policies in prebuilds. The TTL (time-to-live) is retrieved from the Terraform configuration ([terraform-provider-coder PR](https://github.com/coder/terraform-provider-coder/pull/404)): ``` prebuilds = { instances = 2 expiration_policy { ttl = 86400 } } ``` Note: Since there is no need for precise TTL enforcement down to the second, in this implementation expired prebuilds are handled in a single reconciliation cycle: they are deleted, and new instances are created only if needed to match the desired count. ## Changes * The outcome of a reconciliation cycle is now expressed as a slice of reconciliation actions, instead of a single aggregated action. * Adjusted reconciliation logic to delete expired prebuilds and guarantee that the number of desired instances is correct. * Updated relevant data structures and methods to support expiration policies parameters. * Added documentation to `Prebuilt workspaces` page * Update `terraform-provider-coder` to version 2.5.0: https://github.com/coder/terraform-provider-coder/releases/tag/v2.5.0 Depends on: https://github.com/coder/terraform-provider-coder/pull/404 Fixes: https://github.com/coder/coder/issues/17916	2025-05-26 20:31:24 +01:00
Yevhenii Shcherbina	53e8e9c7cd	fix: reduce cost of prebuild failure (#17697 ) Relates to https://github.com/coder/coder/issues/17432 ### Part 1: Notes: - `GetPresetsAtFailureLimit` SQL query is added, which is similar to `GetPresetsBackoff`, they use same CTEs: `filtered_builds`, `time_sorted_builds`, but they are still different. - Query is executed on every loop iteration. We can consider marking specific preset as permanently failed as an optimization to avoid executing query on every loop iteration. But I decided don't do it for now. - By default `FailureHardLimit` is set to 3. - `FailureHardLimit` is configurable. Setting it to zero - means that hard limit is disabled. ### Part 2 Notes: - `PrebuildFailureLimitReached` notification is added. - Notification is sent to template admins. - Notification is sent only the first time, when hard limit is reached. But it will `log.Warn` on every loop iteration. - I introduced this enum: ```sql CREATE TYPE prebuild_status AS ENUM ( 'normal', -- Prebuilds are working as expected; this is the default, healthy state. 'hard_limited', -- Prebuilds have failed repeatedly and hit the configured hard failure limit; won't be retried anymore. 'validation_failed' -- Prebuilds failed due to a non-retryable validation error (e.g. template misconfiguration); won't be retried. ); ``` `validation_failed` not used in this PR, but I think it will be used in next one, so I wanted to save us an extra migration. - Notification looks like this: <img width="472" alt="image" src="https://github.com/user-attachments/assets/e10efea0-1790-4e7f-a65c-f94c40fced27" /> ### Latest notification views: <img width="463" alt="image" src="https://github.com/user-attachments/assets/11310c58-68d1-4075-a497-f76d854633fe" /> <img width="725" alt="image" src="https://github.com/user-attachments/assets/6bbfe21a-91ac-47c3-a9d1-21807bb0c53a" />	2025-05-21 15:16:38 -04:00
Danny Kopping	6e967780c9	feat: track resource replacements when claiming a prebuilt workspace (#17571 ) Closes https://github.com/coder/internal/issues/369 We can't know whether a replacement (i.e. drift of terraform state leading to a resource needing to be deleted/recreated) will take place apriori; we can only detect it at `plan` time, because the provider decides whether a resource must be replaced and it cannot be inferred through static analysis of the template. This is likely to be the most common gotcha with using prebuilds, since it requires a slight template modification to use prebuilds effectively, so let's head this off before it's an issue for customers. Drift details will now be logged in the workspace build logs: ![image](https://github.com/user-attachments/assets/da1988b6-2cbe-4a79-a3c5-ea29891f3d6f) Plus a notification will be sent to template admins when this situation arises: ![image](https://github.com/user-attachments/assets/39d555b1-a262-4a3e-b529-03b9f23bf66a) A new metric - `coderd_prebuilt_workspaces_resource_replacements_total` - will also increment each time a workspace encounters replacements. We only track _that_ a resource replacement occurred, not how many. Just one is enough to ruin a prebuild, but we can't know apriori which replacement would cause this. For example, say we have 2 replacements: a `docker_container` and a `null_resource`; we don't know which one might cause an issue (or indeed if either would), so we just track the replacement. --------- Signed-off-by: Danny Kopping <dannykopping@gmail.com>	2025-05-14 14:52:22 +02:00
Sas Swart	425ee6fa55	feat: reinitialize agents when a prebuilt workspace is claimed (#17475 ) This pull request allows coder workspace agents to be reinitialized when a prebuilt workspace is claimed by a user. This facilitates the transfer of ownership between the anonymous prebuilds system user and the new owner of the workspace. Only a single agent per prebuilt workspace is supported for now, but plumbing has already been done to facilitate the seamless transition to multi-agent support. --------- Signed-off-by: Danny Kopping <dannykopping@gmail.com> Co-authored-by: Danny Kopping <dannykopping@gmail.com>	2025-05-14 14:15:36 +02:00
Yevhenii Shcherbina	02b2de9ae4	refactor: skip reconciliation for some presets (#17595 )	2025-04-29 07:55:37 -04:00
Yevhenii Shcherbina	a78f0fc4e1	refactor: use specific error for agpl and prebuilds (#17591 ) Follow-up PR to https://github.com/coder/coder/pull/17458 Addresses this discussion: https://github.com/coder/coder/pull/17458#discussion_r2055940797	2025-04-28 16:37:41 -04:00
Danny Kopping	e0483e3136	feat: add prebuilds metrics collector (#17547 ) Closes https://github.com/coder/internal/issues/509 --------- Signed-off-by: Danny Kopping <dannykopping@gmail.com>	2025-04-28 12:28:56 +02:00
Danny Kopping	08ad910171	feat: add prebuilds configuration & bootstrapping (#17527 ) Closes https://github.com/coder/internal/issues/508 --------- Signed-off-by: Danny Kopping <dannykopping@gmail.com> Co-authored-by: Cian Johnston <cian@coder.com>	2025-04-25 11:07:15 +02:00
Yevhenii Shcherbina	118f12ac3a	feat: implement claiming of prebuilt workspaces (#17458 ) Signed-off-by: Danny Kopping <dannykopping@gmail.com> Co-authored-by: Danny Kopping <dannykopping@gmail.com> Co-authored-by: Danny Kopping <danny@coder.com> Co-authored-by: Edward Angert <EdwardAngert@users.noreply.github.com> Co-authored-by: EdwardAngert <17991901+EdwardAngert@users.noreply.github.com> Co-authored-by: Jaayden Halko <jaayden.halko@gmail.com> Co-authored-by: Ethan <39577870+ethanndickson@users.noreply.github.com> Co-authored-by: M Atif Ali <atif@coder.com> Co-authored-by: Aericio <16523741+Aericio@users.noreply.github.com> Co-authored-by: M Atif Ali <me@matifali.dev> Co-authored-by: Michael Suchacz <203725896+ibetitsmike@users.noreply.github.com>	2025-04-24 09:39:38 -04:00
Yevhenii Shcherbina	183146e2c9	fix: add minor fix to reconciliation loop (#17454 ) Follow-up PR to https://github.com/coder/coder/pull/17261 I noticed that 1 metrics-related test fails in `dk/prebuilds` after merging my PR into `dk/prebuilds`.	2025-04-17 13:43:24 -04:00
Yevhenii Shcherbina	27bc60d1b9	feat: implement reconciliation loop (#17261 ) Closes https://github.com/coder/internal/issues/510 <details> <summary> Refactoring Summary </summary> ### 1) `CalculateActions` Function #### Issues Before Refactoring: - Large function (~150 lines), making it difficult to read and maintain. - The control flow is hard to follow due to complex conditional logic. - The `ReconciliationActions` struct was partially initialized early, then mutated in multiple places, making the flow error-prone. Original source: `fe60b569ad/coderd/prebuilds/state.go (L13-L167)` #### Improvements After Refactoring: - Simplified and broken down into smaller, focused helper methods. - The flow of the function is now more linear and easier to understand. - Struct initialization is cleaner, avoiding partial and incremental mutations. Refactored function: `eeb0407d78/coderd/prebuilds/state.go (L67-L84)` --- ### 2) `ReconciliationActions` Struct #### Issues Before Refactoring: - The struct mixed both actionable decisions and diagnostic state, which blurred its purpose. - It was unclear which fields were necessary for reconciliation logic, and which were purely for logging/observability. #### Improvements After Refactoring: - Split into two clear, purpose-specific structs: - `ReconciliationActions` — defines the intended reconciliation action. - `ReconciliationState` — captures runtime state and metadata, primarily for logging and diagnostics. Original struct: `fe60b569ad/coderd/prebuilds/reconcile.go (L29-L41)` </details> --------- Signed-off-by: Danny Kopping <dannykopping@gmail.com> Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com> Co-authored-by: Danny Kopping <dannykopping@gmail.com> Co-authored-by: Dean Sheather <dean@deansheather.com> Co-authored-by: Spike Curtis <spike@coder.com> Co-authored-by: Danny Kopping <danny@coder.com>	2025-04-17 09:29:29 -04:00
Danny Kopping	4c33846f6d	chore: add prebuilds system user (#16916 ) Pre-requisite for https://github.com/coder/coder/pull/16891 Closes https://github.com/coder/internal/issues/515 This PR introduces a new concept of a "system" user. Our data model requires that all workspaces have an owner (a `users` relation), and prebuilds is a feature that will spin up workspaces to be claimed later by actual users - and thus needs to own the workspaces in the interim. Naturally, introducing a change like this touches a few aspects around the codebase and we've taken the approach _default hidden_ here; in other words, queries for users will by default _exclude_ all system users, but there is a flag to ensure they can be displayed. This keeps the changeset relatively small. This user has minimal permissions (it's equivalent to a `member` since it has no roles). It will be associated with the default org in the initial migration, and thereafter we'll need to somehow ensure its membership aligns with templates (which are org-scoped) for which it'll need to provision prebuilds; that's a solution we'll have in a subsequent PR. --------- Signed-off-by: Danny Kopping <dannykopping@gmail.com> Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>	2025-03-25 12:18:06 +00:00

17 Commits