Commit Graph

2604 Commits

Author SHA1 Message Date
e4648b6fc1 feat: allow iframing urls on the same domain as the deployment (#18102)
Used for AI tasks. We should eventually add regions to this csp header.
2025-05-29 10:07:57 -05:00
8387dd27ab chore: add form_type parameter argument to db (#17920)
`form_type` is a new parameter field in the terraform provider. Bring
that field into coder/coder.

Validation for `multi-select` has also been added.
2025-05-29 08:55:19 -05:00
776c144128 fix(coderd): ensure agent timings are non-zero on insert (#18065)
Relates to https://github.com/coder/coder/issues/15432

Ensures that no workspace build timings with zero values for started_at or ended_at are inserted into the DB or returned from the API.
2025-05-29 13:36:06 +01:00
b712d0b23f feat(coderd/agentapi): implement sub agent api (#17823)
Closes https://github.com/coder/internal/issues/619

Implement the `coderd` side of the AgentAPI for the upcoming
dev-container agents work.

`agent/agenttest/client.go` is left unimplemented for a future PR
working to implement the agent side of this feature.
2025-05-29 12:15:47 +01:00
bc83de2a72 feat: add prebuilt workspaces telemetry (#18084)
Adds telemetry for a _global_ account of prebuilt workspaces created,
failed to build, and claimed.

Partitioning this data by template/preset tuple is not currently in
scope.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-29 13:13:44 +02:00
2ec7404197 chore: make owner_name and owner_username consistent (#18081)
We've been using owner_name inconsistently as username. So this PR fixes
it by making the attribute naming more consistent.
2025-05-28 17:25:32 -03:00
b330c0803c fix: reimplement reporting of preset-hard-limited metric (#18055)
Addresses concerns raised in https://github.com/coder/coder/pull/18045
2025-05-28 14:18:32 -04:00
ca8660cea6 chore: keep previous workspace build parameters for dynamic params (#18059)
The existing code persists all static parameters and their values. Using
the previous build as the source if no new inputs are found.

Dynamic params do not have a state of the parameters saved to disk. So
instead, all previous values are persisted always, and new inputs
override.
2025-05-28 10:00:39 -05:00
6e255c72c6 chore(coderd/database): enforce agent name unique within workspace build (#18052)
Adds a database trigger that runs on insert and update of the
`workspace_agents` table. The trigger ensures that the agent name is
unique within the context of the workspace build it is being inserted
into.
2025-05-28 14:21:17 +01:00
110102a60a fix: optimize queue position sql query (#17974)
Use only `online provisioner daemons` for
`GetProvisionerJobsByIDsWithQueuePosition` query. It should improve
performance of the query.
2025-05-28 08:21:16 -04:00
b4531c4218 feat: make dynamic parameters respect owner in form (#18013)
Closes https://github.com/coder/coder/issues/18012

---------

Co-authored-by: Jaayden Halko <jaayden.halko@gmail.com>
2025-05-27 15:43:00 -05:00
9fc3329575 feat: persist app groups in the database (#17977) 2025-05-27 13:13:08 -06:00
a18eb9d08f feat(site): allow recreating devcontainers and showing dirty status (#18049)
This change allows showing the devcontainer dirty status in the UI as
well as a recreate button to update the devcontainer.

Closes #16424
2025-05-27 19:42:24 +03:00
d63417b542 fix: update WorkspaceOwnerName to use user.name instead of user.username (#18025)
We have been using the user.username instead of user.name in wrong
places, making it very confusing for the UI.
2025-05-27 11:42:07 -03:00
9827c97f32 feat: add AI Tasks page (#18047)
**Preview:**

<img width="1624" alt="Screenshot 2025-05-26 at 21 25 04"
src="https://github.com/user-attachments/assets/2a51915d-2527-4467-bf99-1f2d876b953b"
/>
2025-05-27 11:34:07 -03:00
6c0bed0f53 chore: update to coder/quartz v0.2.0 (#18007)
Upgrade to coder/quartz v0.2.0 including fixing up a minor API breaking change.
2025-05-27 16:05:03 +04:00
6f6e73af03 feat: implement expiration policy logic for prebuilds (#17996)
## Summary 

This PR introduces support for expiration policies in prebuilds. The TTL
(time-to-live) is retrieved from the Terraform configuration
([terraform-provider-coder
PR](https://github.com/coder/terraform-provider-coder/pull/404)):
```
prebuilds = {
	  instances = 2
	  expiration_policy {
		  ttl = 86400
	  }
  }
```
**Note**: Since there is no need for precise TTL enforcement down to the
second, in this implementation expired prebuilds are handled in a single
reconciliation cycle: they are deleted, and new instances are created
only if needed to match the desired count.

## Changes

* The outcome of a reconciliation cycle is now expressed as a slice of
reconciliation actions, instead of a single aggregated action.
* Adjusted reconciliation logic to delete expired prebuilds and
guarantee that the number of desired instances is correct.
* Updated relevant data structures and methods to support expiration
policies parameters.
* Added documentation to `Prebuilt workspaces` page
* Update `terraform-provider-coder` to version 2.5.0:
https://github.com/coder/terraform-provider-coder/releases/tag/v2.5.0

Depends on: https://github.com/coder/terraform-provider-coder/pull/404
Fixes: https://github.com/coder/coder/issues/17916
2025-05-26 20:31:24 +01:00
0731304905 feat(agent/agentcontainers): recreate devcontainers concurrently (#18042)
This change introduces a refactor of the devcontainers recreation logic
which is now handled asynchronously rather than being request scoped.
The response was consequently changed from "No Content" to "Accepted" to
reflect this.

A new `Status` field was introduced to the devcontainer struct which
replaces `Running` (bool). This reflects that the devcontainer can now
be in various states (starting, running, stopped or errored).

The status field also protects against multiple concurrent recrations,
as long as they are initiated via the API.

Updates #16424
2025-05-26 18:30:52 +03:00
d6c14f3d8a feat(agent/agentcontainers): update containers periodically (#17972)
This change introduces a significant refactor to the agentcontainers API
and enables periodic updates of Docker containers rather than on-demand.
Consequently this change also allows us to move away from using a
locking channel and replace it with a mutex, which simplifies usage.

Additionally a previous oversight was fixed, and testing added, to clear
devcontainer running/dirty status when the container has been removed.

Updates coder/coder#16424
Updates coder/internal#621
2025-05-22 19:44:33 +03:00
f825477a5c fix: add timeouts to test telemetry snapshot (#17879)
This PR ensures that waits on channels will time out according to the
test context, rather than waiting indefinitely. This should alleviate
the panic seen in https://github.com/coder/internal/issues/645 and, if
the deadlock recurs, allow the test to be retried automatically in CI.
2025-05-22 13:51:24 +02:00
34494fb330 chore: avoid depending on rbac in slim builds (#17959)
I noticed the `coder-vpn.dylib` (of course alongside the Agent/CLI binaries) had grown substantially (from 29MB to 37MB for the dylib), and discovered that importing RBAC in slim builds was the issue

This PR removes the dependency on RBAC in slim builds, and adds a compile-time check to ensure it can't be imported in the future:

```
$ make build
# github.com/coder/coder/v2/coderd/rbac
coderd/rbac/no_slim.go:8:2: initialization cycle: _DO_NOT_IMPORT_THIS_PACKAGE_IN_SLIM_BUILDS refers to itself
make: *** [Makefile:224: build/coder-slim_2.22.1-devel+7e46d24b4_linux_amd64] Error 1
```

Before and after for `coder-slim_darwin_arm64`:
```
$ gsa before after
┌───────────────────────────────────────────────────────────────────────────────────┐
│ Diff between before and after                                                     │
├─────────┬─────────────────────────────────────────┬──────────┬──────────┬─────────┤
│ PERCENT │ NAME                                    │ OLD SIZE │ NEW SIZE │ DIFF    │
├─────────┼─────────────────────────────────────────┼──────────┼──────────┼─────────┤
│ -100%   │ github.com/gorilla/mux                  │          │          │ +0 B    │
│ -100%   │ github.com/ammario/tlru                 │          │          │ +0 B    │
│ -100%   │ github.com/armon/go-radix               │          │          │ +0 B    │
│ -0.00%  │ gvisor.dev/gvisor                       │ 2.4 MB   │ 2.4 MB   │ -4 B    │
│ -0.21%  │ os                                      │ 155 kB   │ 155 kB   │ -328 B  │
│ -0.23%  │ regexp                                  │ 152 kB   │ 152 kB   │ -346 B  │
│ -0.04%  │ runtime                                 │ 876 kB   │ 876 kB   │ -372 B  │
│ -100%   │ github.com/rcrowley/go-metrics          │ 675 B    │          │ -675 B  │
│ -23.79% │ github.com/cespare/xxhash/v2            │ 3.0 kB   │ 2.3 kB   │ -715 B  │
│ -100%   │ github.com/agnivade/levenshtein         │ 1.4 kB   │          │ -1.4 kB │
│ -100%   │ github.com/go-ini/ini                   │ 1.5 kB   │          │ -1.5 kB │
│ -100%   │ github.com/xeipuuv/gojsonreference      │ 2.4 kB   │          │ -2.4 kB │
│ -100%   │ github.com/xeipuuv/gojsonpointer        │ 5.2 kB   │          │ -5.2 kB │
│ -2.43%  │ go.opentelemetry.io/otel                │ 316 kB   │ 309 kB   │ -7.7 kB │
│ -2.40%  │ slices                                  │ 381 kB   │ 372 kB   │ -9.2 kB │
│ -0.68%  │ crypto                                  │ 1.4 MB   │ 1.4 MB   │ -9.5 kB │
│ -100%   │ github.com/tchap/go-patricia/v2         │ 23 kB    │          │ -23 kB  │
│ -100%   │ github.com/yashtewari/glob-intersection │ 28 kB    │          │ -28 kB  │
│ -4.35%  │ <autogenerated>                         │ 754 kB   │ 721 kB   │ -33 kB  │
│ -100%   │ github.com/sirupsen/logrus              │ 72 kB    │          │ -72 kB  │
│ -2.56%  │ github.com/coder/coder/v2               │ 3.3 MB   │ 3.2 MB   │ -84 kB  │
│ -100%   │ github.com/gobwas/glob                  │ 107 kB   │          │ -107 kB │
│ -100%   │ sigs.k8s.io/yaml                        │ 244 kB   │          │ -244 kB │
│ -100%   │ github.com/open-policy-agent/opa        │ 2.2 MB   │          │ -2.2 MB │
├─────────┼─────────────────────────────────────────┼──────────┼──────────┼─────────┤
│ -7.79%  │ __go_buildinfo __DATA                   │ 18 kB    │ 17 kB    │ -1.4 kB │
│ -6.81%  │ __itablink __DATA_CONST                 │ 23 kB    │ 22 kB    │ -1.6 kB │
│ -6.61%  │ __typelink __DATA_CONST                 │ 71 kB    │ 66 kB    │ -4.7 kB │
│ -2.86%  │ __noptrdata __DATA                      │ 1.0 MB   │ 993 kB   │ -29 kB  │
│ -21.49% │ __data __DATA                           │ 320 kB   │ 251 kB   │ -69 kB  │
│ -6.19%  │ __rodata __DATA_CONST                   │ 6.0 MB   │ 5.6 MB   │ -372 kB │
│ -47.19% │ __rodata __TEXT                         │ 7.6 MB   │ 4.0 MB   │ -3.6 MB │
├─────────┼─────────────────────────────────────────┼──────────┼──────────┼─────────┤
│ -14.02% │ before                                  │ 50 MB    │ 43 MB    │ -7.0 MB │
│         │ after                                   │          │          │         │
└─────────┴─────────────────────────────────────────┴──────────┴──────────┴─────────┘
```
2025-05-22 19:48:23 +10:00
53e8e9c7cd fix: reduce cost of prebuild failure (#17697)
Relates to https://github.com/coder/coder/issues/17432

### Part 1:

Notes:
- `GetPresetsAtFailureLimit` SQL query is added, which is similar to
`GetPresetsBackoff`, they use same CTEs: `filtered_builds`,
`time_sorted_builds`, but they are still different.

- Query is executed on every loop iteration. We can consider marking
specific preset as permanently failed as an optimization to avoid
executing query on every loop iteration. But I decided don't do it for
now.

- By default `FailureHardLimit` is set to 3.

- `FailureHardLimit` is configurable. Setting it to zero - means that
hard limit is disabled.

### Part 2

Notes:
- `PrebuildFailureLimitReached` notification is added.
- Notification is sent to template admins.
- Notification is sent only the first time, when hard limit is reached.
But it will `log.Warn` on every loop iteration.
- I introduced this enum:
```sql
CREATE TYPE prebuild_status AS ENUM (
  'normal',           -- Prebuilds are working as expected; this is the default, healthy state.
  'hard_limited',     -- Prebuilds have failed repeatedly and hit the configured hard failure limit; won't be retried anymore.
  'validation_failed' -- Prebuilds failed due to a non-retryable validation error (e.g. template misconfiguration); won't be retried.
);
```
`validation_failed` not used in this PR, but I think it will be used in
next one, so I wanted to save us an extra migration.

- Notification looks like this:
<img width="472" alt="image"
src="https://github.com/user-attachments/assets/e10efea0-1790-4e7f-a65c-f94c40fced27"
/>

### Latest notification views:
<img width="463" alt="image"
src="https://github.com/user-attachments/assets/11310c58-68d1-4075-a497-f76d854633fe"
/>
<img width="725" alt="image"
src="https://github.com/user-attachments/assets/6bbfe21a-91ac-47c3-a9d1-21807bb0c53a"
/>
2025-05-21 15:16:38 -04:00
b7462fb256 feat: improve transaction safety in CompleteJob function (#17970)
This PR refactors the CompleteJob function to use database transactions
more consistently for better atomicity guarantees. The large function
was broken down into three specialized handlers:

- completeTemplateImportJob
- completeWorkspaceBuildJob
- completeTemplateDryRunJob

Each handler now uses the Database.InTx wrapper to ensure all database
operations for a job completion are performed within a single
transaction, preventing partial updates in case of failures.

Added comprehensive tests for transaction behavior for each job type.

Fixes #17694

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-05-21 16:48:51 +02:00
3e7ff9d9e1 chore(coderd/rbac): add Action{Create,Delete}Agent to ResourceWorkspace (#17932) 2025-05-20 21:20:56 +01:00
a123900fe8 chore: remove coder/preview dependency from codersdk (#17939) 2025-05-20 10:45:12 -05:00
e76d58f2b6 chore: disable parameter validatation for dynamic params for all transitions (#17926)
Dynamic params skip parameter validation in coder/coder.
This is because conditional parameters cannot be validated 
with the static parameters in the database.
2025-05-20 10:09:53 -05:00
93f17bc73e fix: remove unnecessary user lookup in agent API calls (#17934)
# Use workspace.OwnerUsername instead of fetching the owner

This PR optimizes the agent API by using the `workspace.OwnerUsername` field directly instead of making an additional database query to fetch the owner's username. The change removes the need to call `GetUserByID` in the manifest API and workspace agent RPC endpoints.

An issue arose when the agent token was scoped without access to user data (`api_key_scope = "no_user_data"`), causing the agent to fail to fetch the manifest due to an RBAC issue.

Change-Id: I3b6e7581134e2374b364ee059e3b18ece3d98b41
Signed-off-by: Thomas Kosiewski <tk@coder.com>
2025-05-20 17:07:50 +02:00
1267c9c405 fix: ensure reason present for workspace autoupdated notification (#17935)
Fixes https://github.com/coder/coder/issues/17930

Update the `WorkspaceAutoUpdated` notification to only display the
reason if it is present.
2025-05-20 16:01:57 +01:00
769c9ee337 feat: cancel stuck pending jobs (#17803)
Closes: #16488
2025-05-20 15:22:44 +02:00
9c000468a1 chore: expose use_classic_parameter_flow on workspace response (#17925) 2025-05-19 21:59:15 +00:00
358b64154e chore: skip parameter resolution for dynamic params (#17922)
Pass through the user input as is. The previous code only passed through
parameters that existed in the db (static params). This would omit
conditional params.

Validation is enforced by the dynamic params websocket, so validation at
this point is not required.
2025-05-19 16:15:15 -05:00
61f22a59ba feat(agent): add ParentId to agent manifest (#17888)
Closes https://github.com/coder/internal/issues/648

This change introduces a new `ParentId` field to the agent's manifest.
This will allow an agent to know if it is a child or not, as well as
knowing who the owner is.

This is part of the Dev Container Agents work
2025-05-19 16:09:56 +01:00
f044cc3550 feat: add provisioner daemon name to provisioner jobs responses (#17877)
# Description

This PR adds the `worker_name` field to the provisioner jobs endpoint.

To achieve this, the following SQL query was updated:
-
`GetProvisionerJobsByOrganizationAndStatusWithQueuePositionAndProvisioner`

As a result, the `codersdk.ProvisionerJob` type, which represents the
provisioner job API response, was modified to include the new field.

**Notes:** 
* As mentioned in
[comment](https://github.com/coder/coder/pull/17877#discussion_r2093218206),
the `GetProvisionerJobsByIDsWithQueuePosition` query was not changed due
to load concerns. This means that for template and template version
endpoints, `worker_id` will still be returned, but `worker_name` will
not.
* Similar to `worker_id`, the `worker_name` is only present once a job
is assigned to a provisioner daemon. For jobs in a pending state (not
yet assigned), neither `worker_id` nor `worker_name` will be returned.

---

# Affected Endpoints

- `/organizations/{organization}/provisionerjobs`
- `/organizations/{organization}/provisionerjobs/{job}`

---

# Testing

- Added new tests verifying that both `worker_id` and `worker_name` are
returned once a provisioner job reaches the **succeeded** state.
- Existing tests covering state transitions and other logic remain
unchanged, as they test different scenarios.

---

# Front-end Changes

Admin provisioner jobs dashboard:
<img width="1088" alt="Screenshot 2025-05-16 at 11 51 33"
src="https://github.com/user-attachments/assets/0e20e360-c615-4497-84b7-693777c5443e"
/>

Fixes: https://github.com/coder/coder/issues/16982
2025-05-19 16:05:39 +01:00
98e2ec4417 feat: show devcontainer dirty status and allow recreate (#17880)
Updates #16424
2025-05-19 12:56:10 +03:00
c775ea8411 test: fix a race in TestReinit (#17902)
closes https://github.com/coder/internal/issues/632

`pubsubReinitSpy` used to signal that a subscription had happened before
it actually had.
This created a slight opportunity for the main goroutine to publish
before the actual subscription was listening. The published event was
then dropped, leading to a failed test.
2025-05-19 11:37:54 +02:00
1a41608035 fix: stop extending API key access if OIDC refresh is available (#17878)
fixes #17070

Cleans up our handling of APIKey expiration and OIDC to keep them separate concepts. For an OIDC-login APIKey, both the APIKey and OIDC link must be valid to login. If the OIDC link is expired and we have a refresh token, we will attempt to refresh.

OIDC refreshes do not have any effect on APIKey expiry.

https://github.com/coder/coder/issues/17070#issuecomment-2886183613 explains why this is the correct behavior.
2025-05-19 12:05:35 +04:00
f36fb67f57 chore: use static params when dynamic param metadata is missing (#17836)
Existing template versions do not have the metadata (modules + plan) in
the db. So revert to using static parameter information from the
original template import.

This data will still be served over the websocket.
2025-05-16 11:47:59 -05:00
c2bc801f83 chore: add 'classic_parameter_flow' column setting to templates (#17828)
We are forcing users to try the dynamic parameter experience first.
Currently this setting only comes into effect if an experiment is
enabled.
2025-05-15 17:55:17 -05:00
1bacd82e80 feat: add API key scope to restrict access to user data (#17692) 2025-05-15 15:32:52 +01:00
2aa8cbebd7 fix: exclude deleted templates from metrics collection (#17839)
Also add some clarification about the lack of database constraints for
soft template deletion.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-15 13:33:58 +02:00
f2edcf3f59 fix: add missing clause for tracking replacements (#17849)
We should only be tracking resource replacements during a prebuild
claim.

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-15 13:02:30 +02:00
789c4beba7 chore: add dynamic parameter error if missing metadata from provisioner (#17809) 2025-05-14 12:21:36 -05:00
f3bcac2e90 refactor: improve overlayFS errors (#17808) 2025-05-14 10:26:47 -06:00
6e967780c9 feat: track resource replacements when claiming a prebuilt workspace (#17571)
Closes https://github.com/coder/internal/issues/369

We can't know whether a replacement (i.e. drift of terraform state
leading to a resource needing to be deleted/recreated) will take place
apriori; we can only detect it at `plan` time, because the provider
decides whether a resource must be replaced and it cannot be inferred
through static analysis of the template.

**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.

Drift details will now be logged in the workspace build logs:


![image](https://github.com/user-attachments/assets/da1988b6-2cbe-4a79-a3c5-ea29891f3d6f)

Plus a notification will be sent to template admins when this situation
arises:


![image](https://github.com/user-attachments/assets/39d555b1-a262-4a3e-b529-03b9f23bf66a)

A new metric - `coderd_prebuilt_workspaces_resource_replacements_total`
- will also increment each time a workspace encounters replacements.

We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know apriori which
replacement would cause this.
For example, say we have 2 replacements: a `docker_container` and a
`null_resource`; we don't know which one might
cause an issue (or indeed if either would), so we just track the
replacement.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:52:22 +02:00
425ee6fa55 feat: reinitialize agents when a prebuilt workspace is claimed (#17475)
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.

Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:15:36 +02:00
60762d4c13 feat: load terraform modules when using dynamic parameters (#17714) 2025-05-13 16:07:29 -05:00
ef745c0c5d chore: optimize workspace_latest_builds view query (#17789)
Avoids two sequential scans of massive tables (`workspace_builds`,
`provisioner_jobs`) and uses index scans instead. This new view largely
replicates our already optimized query `GetWorkspaces` to fetch the
latest build.

The original query and the new query were compared against the dogfood
database to ensure they return the exact same data in the exact same
order (minus the new `workspaces.deleted = false` filter to improve
performance even more). The performance is massively improved even
without the `workspaces.deleted = false` filter, but it was added to
improve it even more.

Note: these query times are probably inflated due to high database load
on our dogfood environment that this intends to partially resolve.

Before: 2,139ms
([explain](https://explain.dalibo.com/plan/997e4fch241b46e6))

After: 33ms
([explain](https://explain.dalibo.com/plan/c888dc223870f181))

Co-authored-by: Cian Johnston <cian@coder.com>

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-13 20:51:01 +02:00
64807e1d61 chore: apply the 4mb max limit on drpc protocol message size (#17771)
Respect the 4mb max limit on proto messages
2025-05-13 11:24:51 -05:00
b0788f410f chore: rename "Test Notification" to "Troubleshooting Notification" (#17790)
Rename the "Test Notification" to "Troubleshooting Notification"
2025-05-13 13:52:55 +01:00
599bb35a04 fix(coderd): list templates returns non-deprecated templates by default (#17747)
## Description

Modifies the behaviour of the "list templates" API endpoints to return
non-deprecated templates by default. Users can still query for
deprecated templates by specifying the `deprecated=true` query
parameter.

**Note:** The deprecation feature is an enterprise-level feature

## Affected Endpoints
* /api/v2/organizations/{organization}/templates
* /api/v2/templates

Fixes #17565
2025-05-13 12:44:46 +01:00