Commit Graph

2657 Commits

Author SHA1 Message Date
f0251dfc91 chore: retry postgres connection on reset by peer in tests (#18632)
Fixes https://github.com/coder/internal/issues/695

Retries initial connection to postgres in testing up to 3 seconds if we
see "reset by peer", which probably means that some other test proc just
started the container.

---------

Co-authored-by: Hugo Dutka <hugo@coder.com>
2025-06-27 13:03:32 +00:00
3cb9b20b11 chore: improve rbac and add benchmark tooling (#18584)
## Description

This PR improves the RBAC package by refactoring the policy, enhancing
documentation, and adding utility scripts.

## Changes

* Refactored `policy.rego` for clarity and readability
* Updated README with OPA section
* Added `benchmark_authz.sh` script for authz performance testing and
comparison
* Added `gen_input.go` to generate input for `opa eval` testing
2025-06-27 12:05:34 +01:00
09cc906981 chore: remove unnecessary redeclarations in for loops (part 2) (#18593) 2025-06-26 12:28:00 -06:00
c6e0ba12d3 feat: graduate prebuilds to general availability (#18607)
This PR removes the prebuilds experiment and allows the use of prebuilds
without opting into an experiment.
2025-06-26 15:54:52 +02:00
f2d229eed3 fix!: use devcontainer ID when rebuilding a devcontainer (#18604)
This PR replaces the use of the **container** ID with the
**devcontainer** ID. This is a breaking change. This allows rebuilding a
devcontainer when there is no valid container ID.
2025-06-26 11:41:57 +01:00
6c713d5c20 fix(coderd/agentapi): make sub agent slugs more unique (#18581)
The incorrect assumption that slugs were unique per-agent was made when
the subagent API was implemented. Whilst this PR doesn't completely
enforce that, we instead compute a stable hash to prefix the slug that
should provide a reasonable level of probability that the slug will be
unique.
2025-06-25 17:36:23 +01:00
e396b06c25 feat: allow new immutable parameters for existing workspaces (#18579)
Closes https://github.com/coder/coder/issues/18578
2025-06-25 15:41:53 +00:00
688d2ee3eb chore: remove chats experiment (#18535) 2025-06-25 13:03:32 +00:00
c4e4fe85f9 fix(agent): start devcontainers through agentcontainers package (#18471)
Fixes https://github.com/coder/internal/issues/706

Context for the implementation here
https://github.com/coder/internal/issues/706#issuecomment-2990490282

Synchronously starts dev containers defined in terraform with our
`DevcontainerCLI` abstraction, instead of piggybacking off of our
`agentscripts` package. This gives us more control over logs, instead of
being reliant on packages which may or may not exist in the
user-provided image.
2025-06-25 11:52:50 +01:00
e443f8624d feat(agent/agentcontainers): implement ignore customization for devcontainers (#18530)
Fixes coder/internal#737
2025-06-24 18:52:05 +00:00
06c997a100 chore: make telemetry use_classic_parameter_flow nullable (#18547) 2025-06-24 13:38:12 -05:00
99d124e276 feat(agent): enable devcontainers by default (#18533) 2025-06-24 21:17:04 +03:00
e5eb2a8322 fix: prebuild user without ssh key when fetching owner ctx (#18541) 2025-06-24 16:49:36 +00:00
f44969b689 chore: reorder prebuilt workspace authorization logic (#18506)
## Description

Follow-up from PR https://github.com/coder/coder/pull/18333
Related with:
https://github.com/coder/coder/pull/18333#discussion_r2159300881

This changes the authorization logic to first try the normal workspace
authorization check, and only if the resource is a prebuilt workspace,
fall back to the prebuilt workspace authorization check. Since prebuilt
workspaces are a subset of workspaces, the normal workspace check is
more likely to succeed. This is a small optimization to reduce
unnecessary prebuilt authorization calls.
2025-06-24 16:33:21 +01:00
341b54e604 fix: allow dynamic parameters without requiring org membership (#18531) 2025-06-24 10:33:10 -05:00
a4f1c64a9b fix: allow dynamic parameters to consider the prebuilds user an owner (#18529)
This Pull request allows dynamic parameters to list system users in its
search for workspace owners. This is necessary to allow prebuilds to
reconcile prebuilt workspaces and to delete them.
2025-06-24 16:47:01 +02:00
4ff2254e5f chore: remove ai tasks from experiment (#18511)
Closes https://github.com/coder/internal/issues/661
2025-06-24 16:24:01 +02:00
45ab265df2 chore: add permissions to autobuilder & prebuilder to run wsbuild (#18527)
Read organization member and read files is now required for dynamic
param building.
2025-06-24 08:45:41 -05:00
7b152cdd91 chore: increase fileCache hit rate in autobuilds lifecycle (#18507)
`wsbuilder` hits the file cache when running validation. This solution is imperfect, but by first sorting workspaces by their template version id, the cache hit rate should improve.
2025-06-24 07:36:39 -05:00
670fa4a3cc feat: add the /aitasks/prompts endpoint (#18464)
Add an endpoint to fetch AI task prompts for multiple workspace builds
at the same time. A prompt is the value of the "AI Prompt" workspace
build parameter. On main, the only way our API allows fetching workspace
build parameters is by using the `/workspacebuilds/$build_id/parameters`
endpoint, requiring a separate API call for every build.

The Tasks dashboard fetches Task workspaces in order to show them in a
list, and then needs to fetch the value of the `AI Prompt` parameter for
every task workspace (using its latest build id), requiring an
additional API call for each list item. This endpoint will allow the
dashboard to make just 2 calls to render the list: one to fetch task
workspaces, the other to fetch prompts.

<img width="1512" alt="Screenshot 2025-06-20 at 11 33 11"
src="https://github.com/user-attachments/assets/92899999-e922-44c5-8325-b4b23a0d2bff"
/>

Related to https://github.com/coder/internal/issues/660.
2025-06-24 13:06:02 +02:00
0238f2926d feat: persist AI task state in template imports & workspace builds (#18449) 2025-06-24 10:36:37 +00:00
6cc4cfa346 feat: allow for default presets (#18445) 2025-06-24 12:19:19 +02:00
2afd1a203e chore: disable devtunnel tests on windows (#18521) 2025-06-24 19:01:29 +10:00
d892427b78 fix: do not warn on valid known experiments (#18514)
Fixes https://github.com/coder/coder/issues/18024

* drive-by: renames `handleExperimentsSafe` to
`handleExperimentsAvailable` to better match semantics
* defines list of `codersdk.ExperimentsKnown` and updates
`ReadExperiments` to log on invalid experiments
* typescript-ignores `codersdk.Experiments` so apitypings generates a
valid enum list of possible values of experiment
* updates OverviewPageView to distinguish between known 'hidden'
experiments and unknown 'invalid' experiments
2025-06-24 09:14:41 +01:00
5ed0c7abcb chore: improve dynamic parameter validation errors (#18501)
`BuildError` response from `wsbuilder` does not support rich errors from validation. Changed this to use the `Validations` block of codersdk responses to return all errors for invalid parameters.
2025-06-23 15:08:18 -05:00
f6e4ba6ed9 chore: remove per request dynamic parameters opt in and rely on template (#18505)
When in experimental this was used as an escape hatch. Removed to be
consistent with the template author's intentions

Backwards compatible, removing an experimental api field that is no longer used.
2025-06-23 15:04:09 -05:00
4699393522 fix: upsert coder_app resources in case they are persistent (#18509) 2025-06-23 18:50:44 +00:00
82af2e019d feat: implement dynamic parameter validation (#18482)
# What does this do?

This does parameter validation for dynamic parameters in `wsbuilder`. All input parameters are validated in `coder/coder` before being sent to terraform.

The heart of this PR is [`ResolveParameters`](b65001e89c/coderd/dynamicparameters/resolver.go (L30-L30)).

# What else changes?

`wsbuilder` now needs to load the terraform files into memory to succeed. This does add a larger memory requirement to workspace builds.

# Future work

- Sort autostart handling workspaces by template version id. So workspaces with the same template version only load the terraform files once from the db, and store them in the cache.
2025-06-23 12:35:15 -05:00
7254c08af4 chore: remove parallel queries in the same transaction (#18489)
Parallel concurrent queries cannot be run in the same tx

Was getting this error: https://stackoverflow.com/questions/78472996/go-postgres-pq-unexpected-parse-response-c-with-queryrow
2025-06-23 12:17:58 -05:00
c1b35bf2f6 chore: use database in current context for file cache (#18490)
Using the db.Store when in a TX causes a deadlock for dbmem.
In production, this can cause a deadlock if at the current conn pool
limit.
2025-06-23 11:58:52 -05:00
659b787b9f chore: set wsbuilder to use preview parameters (#18474)
Use richer `previewtypes.Parameter` for `wsbuilder`. This is a pre-requirement to adding dynamic parameter validation.

The richer type contains more information than the `db` parameter, so the conversion is lossless.
2025-06-23 11:31:53 -05:00
2f55e29466 fix: complete job and mark workspace as deleted when no provisioners are available (#18465)
Alternate fix for https://github.com/coder/coder/issues/18080

Modifies wsbuilder to complete the provisioner job and mark the
workspace as deleted if it is clear that no provisioner will be able to
pick up the delete build.

This has a significant advantage of not deviating too much from the
current semantics of `POST /api/v2/workspacebuilds`.
https://github.com/coder/coder/pull/18460 ends up returning a 204 on
orphan delete due to no build being created.

Downside is that we have to duplicate some responsibilities of
provisionerdserver in wsbuilder.

There is a slight gotcha to this approach though: if you stop a
provisioner and then immediately try to orphan-delete, the job will
still be created because of the provisioner heartbeat interval. However
you can cancel it and try again.
2025-06-23 14:07:42 +01:00
66e8dbbe17 feat: persist generated coder_app id (#18487) 2025-06-23 08:46:18 +00:00
49fcffc266 fix!: stop workspace before update (#18425)
Fixes https://github.com/coder/coder/issues/17840

NOTE: calling this out as a breaking change so that it is highly visible
in the changelog.

* CLI: Modifies `coder update` to stop the workspace if already running.
* UI: Modifies "update" button to always stop the workspace if already
running.
2025-06-23 09:12:37 +01:00
0a483ea2b7 feat: add idle app status (#18415)
"Idle" is more accurate than "complete" since:

1. AgentAPI only knows if the screen is active; it has no way of knowing
    if the task is complete.
2. The LLM might be done with its current prompt, but that does not mean
    the task is complete either (it likely needs refinement).

The "complete" state will be reserved for future definition.

Additionally, in the case where the screen goes idle but the LLM never
reported a status update, we can get an idle icon without a message, and
it looks kinda janky in the UI so if there is no message I display the
state text.

Closes https://github.com/coder/internal/issues/699
2025-06-20 14:34:31 -08:00
6e4508e29c chore: assume template versions without tf values to be empty (#18479)
Closes https://github.com/coder/internal/issues/735
2025-06-20 15:05:22 -05:00
fae30a00fd chore: remove unnecessary redeclarations in for loops (#18440) 2025-06-20 13:16:55 -06:00
556b095d0f chore: add cacheCloser to cleanup all opened files (#18473) 2025-06-20 13:25:33 -05:00
9b5d49967c chore: refactor dynamic parameters into dedicated package (#18420)
This PR extracts dynamic parameter rendering logic from
coderd/parameters.go into a new coderd/dynamicparameters package. Partly
for organization and maintainability, but primarily to be reused in
`wsbuilder` to be leveraged as validation.
2025-06-20 13:00:39 -05:00
72f7d70bab feat: allow TemplateAdmin to delete prebuilds via auth layer (#18333)
## Description

This PR adds support for deleting prebuilt workspaces via the
authorization layer. It introduces special-case handling to ensure that
`prebuilt_workspace` permissions are evaluated when attempting to delete
a prebuilt workspace, falling back to the standard `workspace` resource
as needed.

Prebuilt workspaces are a subset of workspaces, identified by having
`owner_id` set to `PREBUILD_SYSTEM_USER`.
This means:
* A user with `prebuilt_workspace.delete` permission is allowed to
**delete only prebuilt workspaces**.
* A user with `workspace.delete` permission can **delete both normal and
prebuilt workspaces**.

⚠️ This implementation is scoped to **deletion operations only**. No
other operations are currently supported for the `prebuilt_workspace`
resource.

To delete a workspace, users must have the following permissions:
* `workspace.read`: to read the current workspace state
* `update`: to modify workspace metadata and related resources during
deletion (e.g., updating the `deleted` field in the database)
* `delete`: to perform the actual deletion of the workspace

## Changes

* Introduced `authorizeWorkspace()` helper to handle prebuilt workspace
authorization logic.
* Ensured both `prebuilt_workspace` and `workspace` permissions are
checked.
* Added comments to clarify the current behavior and limitations.
* Moved `SystemUserID` constant from the `prebuilds` package to the
`database` package `PrebuildsSystemUserID` to resolve an import cycle
(commit
f24e4ab4b6).
* Update middleware `ExtractOrganizationMember` to include system user
members.
2025-06-20 17:36:32 +01:00
8e3022ed9e docs: add documentation for prebuild scheduling feature (#18462)
Follow-up to https://github.com/coder/coder/pull/18126

Changes:
- address issue mentioned here:
https://github.com/coder/coder/pull/18126#discussion_r2144557600
- add docs for prebuilds scheduling

---------

Co-authored-by: Danny Kopping <danny@coder.com>
Co-authored-by: Atif Ali <atif@coder.com>
2025-06-20 10:08:47 -04:00
da5d5ba96a fix: implement prebuild schedules methods for dbmem (#18469)
Follow-up to https://github.com/coder/coder/pull/18126
2025-06-20 10:06:06 -04:00
32239b29cb chore: add AI-tasks-specific fields to codersdk.WorkspaceBuild (#18436)
This will be needed by the frontend on the `/task/$id` page to display
the app in the sidebar.

Related to https://github.com/coder/coder/issues/18158
2025-06-20 10:59:34 +02:00
0f6ca55238 feat: implement scheduling mechanism for prebuilds (#18126)
Closes https://github.com/coder/internal/issues/312
Depends on https://github.com/coder/terraform-provider-coder/pull/408

This PR adds support for defining an **autoscaling block** for
prebuilds, allowing number of desired instances to scale dynamically
based on a schedule.

Example usage:
```
data "coder_workspace_preset" "us-nix" {
  ...
  
  prebuilds = {
    instances = 0                  # default to 0 instances
    
    scheduling = {
      timezone = "UTC"             # a single timezone is used for simplicity
      
      # Scale to 3 instances during the work week
      schedule {
        cron = "* 8-18 * * 1-5"    # from 8AM–6:59PM, Mon–Fri, UTC
        instances = 3              # scale to 3 instances
      }
      
      # Scale to 1 instance on Saturdays for urgent support queries
      schedule {
        cron = "* 8-14 * * 6"      # from 8AM–2:59PM, Sat, UTC
        instances = 1              # scale to 1 instance
      }
    }
  }
}
```

### Behavior
- Multiple `schedule` blocks per `prebuilds` block are supported.
- If the current time matches any defined autoscaling schedule, the
corresponding number of instances is used.
- If no schedule matches, the **default instance count**
(`prebuilds.instances`) is used as a fallback.

### Why
This feature allows prebuild instance capacity to adapt to predictable
usage patterns, such as:
- Scaling up during business hours or high-demand periods
- Reducing capacity during off-hours to save resources

### Cron specification
The cron specification is interpreted as a **continuous time range.**

For example, the expression:

```
* 9-18 * * 1-5
```

is intended to represent a continuous range from **09:00 to 18:59**,
Monday through Friday.

However, due to minor implementation imprecision, it is currently
interpreted as a range from **08:59:00 to 18:58:59**, Monday through
Friday.

This slight discrepancy arises because the evaluation is based on
whether a specific **point in time** falls within the range, using the
`github.com/coder/coder/v2/coderd/schedule/cron` library, which performs
per-minute matching rather than strict range evaluation.

---------

Co-authored-by: Danny Kopping <danny@coder.com>
2025-06-19 11:08:48 -04:00
511fd09582 fix(coderd): mark sub agent deletion via boolean instead of delete (#18411)
Deletion of data is uncommon in our database, so the introduction of sub agents
and the deletion of them introduced issues with foreign key assumptions, as can
be seen in coder/internal#685. We could have only addressed the specific case by
allowing cascade deletion of stats as well as handling in the stats collector,
but it's unclear how many more such edge-cases we could run into.

In this change, we mark the rows as deleted via boolean instead, and filter them
out in all relevant queries.

Fixes coder/internal#685
2025-06-19 13:32:51 +00:00
8b27983d14 fix: fix TestAcquireJobWithCancel_Cancel flake (#18441) 2025-06-18 22:51:13 -04:00
b0fa3275d2 fix: increase TestAcquireJob_LongPoll timeout to prevent flakiness (#18442)
I'll be honest I'm not even really sure the point of this test but it
was failing due to

```
2025-06-16T15:01:54.0863251Z         	Error:      	Received unexpected error:
2025-06-16T15:01:54.0863554Z         	            	acquire job:
2025-06-16T15:01:54.0864230Z         	            	    github.com/coder/coder/v2/coderd/provisionerdserver.(*server).AcquireJob
2025-06-16T15:01:54.0865173Z         	            	        /home/runner/work/coder/coder/coderd/provisionerdserver/provisionerdserver.go:329
2025-06-16T15:01:54.0865683Z         	            	  - failed to acquire job:
2025-06-16T15:01:54.0866374Z         	            	    github.com/coder/coder/v2/coderd/provisionerdserver.(*Acquirer).AcquireJob
2025-06-16T15:01:54.0867262Z         	            	        /home/runner/work/coder/coder/coderd/provisionerdserver/acquirer.go:148
2025-06-16T15:01:54.0867819Z         	            	  - pq: canceling statement due to user request
```

which is certainly unintended.
2025-06-19 02:50:53 +00:00
04d202ae07 chore: file cache Release tied 1:1 with an acquire (#18410)
File cache close made idempotent
2025-06-18 18:22:23 -05:00
de07351b8d fix: access the templateVersion.HasAITask field properly (#18434) 2025-06-18 17:23:34 +00:00
8f6a5afa4f feat: add backend logic for determining tasks tab visibility (#18401)
This PR implements the backend logic for determining if the Tasks tab
should be visible in the web UI as described in [the
RFC](https://www.notion.so/coderhq/Coder-Tasks-207d579be5928053ab68c8d9a4b59eaa?source=copy_link#210d579be5928013ab5acbe69a2f548b).

The frontend component will be added in a follow-up PR once the entire
Tasks backend is implemented so as not to break the dogfood environment
until then.
2025-06-18 18:32:34 +02:00