Commit Graph

591 Commits

Author SHA1 Message Date
7b152cdd91 chore: increase fileCache hit rate in autobuilds lifecycle (#18507)
`wsbuilder` hits the file cache when running validation. This solution is imperfect, but by first sorting workspaces by their template version id, the cache hit rate should improve.
2025-06-24 07:36:39 -05:00
670fa4a3cc feat: add the /aitasks/prompts endpoint (#18464)
Add an endpoint to fetch AI task prompts for multiple workspace builds
at the same time. A prompt is the value of the "AI Prompt" workspace
build parameter. On main, the only way our API allows fetching workspace
build parameters is by using the `/workspacebuilds/$build_id/parameters`
endpoint, requiring a separate API call for every build.

The Tasks dashboard fetches Task workspaces in order to show them in a
list, and then needs to fetch the value of the `AI Prompt` parameter for
every task workspace (using its latest build id), requiring an
additional API call for each list item. This endpoint will allow the
dashboard to make just 2 calls to render the list: one to fetch task
workspaces, the other to fetch prompts.

<img width="1512" alt="Screenshot 2025-06-20 at 11 33 11"
src="https://github.com/user-attachments/assets/92899999-e922-44c5-8325-b4b23a0d2bff"
/>

Related to https://github.com/coder/internal/issues/660.
2025-06-24 13:06:02 +02:00
0238f2926d feat: persist AI task state in template imports & workspace builds (#18449) 2025-06-24 10:36:37 +00:00
6cc4cfa346 feat: allow for default presets (#18445) 2025-06-24 12:19:19 +02:00
4699393522 fix: upsert coder_app resources in case they are persistent (#18509) 2025-06-23 18:50:44 +00:00
0f6ca55238 feat: implement scheduling mechanism for prebuilds (#18126)
Closes https://github.com/coder/internal/issues/312
Depends on https://github.com/coder/terraform-provider-coder/pull/408

This PR adds support for defining an **autoscaling block** for
prebuilds, allowing number of desired instances to scale dynamically
based on a schedule.

Example usage:
```
data "coder_workspace_preset" "us-nix" {
  ...
  
  prebuilds = {
    instances = 0                  # default to 0 instances
    
    scheduling = {
      timezone = "UTC"             # a single timezone is used for simplicity
      
      # Scale to 3 instances during the work week
      schedule {
        cron = "* 8-18 * * 1-5"    # from 8AM–6:59PM, Mon–Fri, UTC
        instances = 3              # scale to 3 instances
      }
      
      # Scale to 1 instance on Saturdays for urgent support queries
      schedule {
        cron = "* 8-14 * * 6"      # from 8AM–2:59PM, Sat, UTC
        instances = 1              # scale to 1 instance
      }
    }
  }
}
```

### Behavior
- Multiple `schedule` blocks per `prebuilds` block are supported.
- If the current time matches any defined autoscaling schedule, the
corresponding number of instances is used.
- If no schedule matches, the **default instance count**
(`prebuilds.instances`) is used as a fallback.

### Why
This feature allows prebuild instance capacity to adapt to predictable
usage patterns, such as:
- Scaling up during business hours or high-demand periods
- Reducing capacity during off-hours to save resources

### Cron specification
The cron specification is interpreted as a **continuous time range.**

For example, the expression:

```
* 9-18 * * 1-5
```

is intended to represent a continuous range from **09:00 to 18:59**,
Monday through Friday.

However, due to minor implementation imprecision, it is currently
interpreted as a range from **08:59:00 to 18:58:59**, Monday through
Friday.

This slight discrepancy arises because the evaluation is based on
whether a specific **point in time** falls within the range, using the
`github.com/coder/coder/v2/coderd/schedule/cron` library, which performs
per-minute matching rather than strict range evaluation.

---------

Co-authored-by: Danny Kopping <danny@coder.com>
2025-06-19 11:08:48 -04:00
511fd09582 fix(coderd): mark sub agent deletion via boolean instead of delete (#18411)
Deletion of data is uncommon in our database, so the introduction of sub agents
and the deletion of them introduced issues with foreign key assumptions, as can
be seen in coder/internal#685. We could have only addressed the specific case by
allowing cascade deletion of stats as well as handling in the stats collector,
but it's unclear how many more such edge-cases we could run into.

In this change, we mark the rows as deleted via boolean instead, and filter them
out in all relevant queries.

Fixes coder/internal#685
2025-06-19 13:32:51 +00:00
8f6a5afa4f feat: add backend logic for determining tasks tab visibility (#18401)
This PR implements the backend logic for determining if the Tasks tab
should be visible in the web UI as described in [the
RFC](https://www.notion.so/coderhq/Coder-Tasks-207d579be5928053ab68c8d9a4b59eaa?source=copy_link#210d579be5928013ab5acbe69a2f548b).

The frontend component will be added in a follow-up PR once the entire
Tasks backend is implemented so as not to break the dogfood environment
until then.
2025-06-18 18:32:34 +02:00
591f5db5f6 feat: add has-ai-task filters to the /workspaces and /templates endpoints (#18387)
This PR allows filtering templates and workspaces with the `has-ai-task`
filter as described in the [Coder Tasks
RFC](https://www.notion.so/coderhq/Coder-Tasks-207d579be5928053ab68c8d9a4b59eaa?source=copy_link#20ad579be59280e6a000eb0646d3c2df).
2025-06-18 18:22:45 +02:00
ebc769f328 chore: make has_ai_task fields on workspace builds and template versions nullable (#18403)
The fields must be nullable because there’s a period of time between
inserting a row into the database and finishing the “plan” provisioner
job when the final value of the field is unknown.
2025-06-17 16:08:34 +02:00
fa86cc4adf chore: support the has_ai_task column in template version and workspace insert queries (#18385)
https://github.com/coder/coder/pull/18359 added the `has_ai_task`
columns on the `workspace_builds` and `template_versions` tables.
2025-06-16 16:07:16 +02:00
8e29ee50a3 feat: add ai tasks migrations (#18359)
Adds database migrations required for the Tasks feature.

There's a slight difference between the migrations in this PR and the
RFC: this PR adds `NOT NULL` constraints to the `has_ai_task` columns.
It was an oversight on my part when I wrote the RFC - I assumed the
`DEFAULT FALSE` value would make the columns implicitly NOT NULL, but
that's not the case with Postgres. We have no use for the NULL value.

The `DEFAULT FALSE` statement ensures that the migration will pass even
when there are existing rows in the template version and workspace
builds tables, so there's no danger in adding the `NOT NULL`
constraints.
2025-06-13 15:54:02 +02:00
8387dd27ab chore: add form_type parameter argument to db (#17920)
`form_type` is a new parameter field in the terraform provider. Bring
that field into coder/coder.

Validation for `multi-select` has also been added.
2025-05-29 08:55:19 -05:00
b712d0b23f feat(coderd/agentapi): implement sub agent api (#17823)
Closes https://github.com/coder/internal/issues/619

Implement the `coderd` side of the AgentAPI for the upcoming
dev-container agents work.

`agent/agenttest/client.go` is left unimplemented for a future PR
working to implement the agent side of this feature.
2025-05-29 12:15:47 +01:00
110102a60a fix: optimize queue position sql query (#17974)
Use only `online provisioner daemons` for
`GetProvisionerJobsByIDsWithQueuePosition` query. It should improve
performance of the query.
2025-05-28 08:21:16 -04:00
9fc3329575 feat: persist app groups in the database (#17977) 2025-05-27 13:13:08 -06:00
d63417b542 fix: update WorkspaceOwnerName to use user.name instead of user.username (#18025)
We have been using the user.username instead of user.name in wrong
places, making it very confusing for the UI.
2025-05-27 11:42:07 -03:00
6f6e73af03 feat: implement expiration policy logic for prebuilds (#17996)
## Summary 

This PR introduces support for expiration policies in prebuilds. The TTL
(time-to-live) is retrieved from the Terraform configuration
([terraform-provider-coder
PR](https://github.com/coder/terraform-provider-coder/pull/404)):
```
prebuilds = {
	  instances = 2
	  expiration_policy {
		  ttl = 86400
	  }
  }
```
**Note**: Since there is no need for precise TTL enforcement down to the
second, in this implementation expired prebuilds are handled in a single
reconciliation cycle: they are deleted, and new instances are created
only if needed to match the desired count.

## Changes

* The outcome of a reconciliation cycle is now expressed as a slice of
reconciliation actions, instead of a single aggregated action.
* Adjusted reconciliation logic to delete expired prebuilds and
guarantee that the number of desired instances is correct.
* Updated relevant data structures and methods to support expiration
policies parameters.
* Added documentation to `Prebuilt workspaces` page
* Update `terraform-provider-coder` to version 2.5.0:
https://github.com/coder/terraform-provider-coder/releases/tag/v2.5.0

Depends on: https://github.com/coder/terraform-provider-coder/pull/404
Fixes: https://github.com/coder/coder/issues/17916
2025-05-26 20:31:24 +01:00
53e8e9c7cd fix: reduce cost of prebuild failure (#17697)
Relates to https://github.com/coder/coder/issues/17432

### Part 1:

Notes:
- `GetPresetsAtFailureLimit` SQL query is added, which is similar to
`GetPresetsBackoff`, they use same CTEs: `filtered_builds`,
`time_sorted_builds`, but they are still different.

- Query is executed on every loop iteration. We can consider marking
specific preset as permanently failed as an optimization to avoid
executing query on every loop iteration. But I decided don't do it for
now.

- By default `FailureHardLimit` is set to 3.

- `FailureHardLimit` is configurable. Setting it to zero - means that
hard limit is disabled.

### Part 2

Notes:
- `PrebuildFailureLimitReached` notification is added.
- Notification is sent to template admins.
- Notification is sent only the first time, when hard limit is reached.
But it will `log.Warn` on every loop iteration.
- I introduced this enum:
```sql
CREATE TYPE prebuild_status AS ENUM (
  'normal',           -- Prebuilds are working as expected; this is the default, healthy state.
  'hard_limited',     -- Prebuilds have failed repeatedly and hit the configured hard failure limit; won't be retried anymore.
  'validation_failed' -- Prebuilds failed due to a non-retryable validation error (e.g. template misconfiguration); won't be retried.
);
```
`validation_failed` not used in this PR, but I think it will be used in
next one, so I wanted to save us an extra migration.

- Notification looks like this:
<img width="472" alt="image"
src="https://github.com/user-attachments/assets/e10efea0-1790-4e7f-a65c-f94c40fced27"
/>

### Latest notification views:
<img width="463" alt="image"
src="https://github.com/user-attachments/assets/11310c58-68d1-4075-a497-f76d854633fe"
/>
<img width="725" alt="image"
src="https://github.com/user-attachments/assets/6bbfe21a-91ac-47c3-a9d1-21807bb0c53a"
/>
2025-05-21 15:16:38 -04:00
3e7ff9d9e1 chore(coderd/rbac): add Action{Create,Delete}Agent to ResourceWorkspace (#17932) 2025-05-20 21:20:56 +01:00
769c9ee337 feat: cancel stuck pending jobs (#17803)
Closes: #16488
2025-05-20 15:22:44 +02:00
f044cc3550 feat: add provisioner daemon name to provisioner jobs responses (#17877)
# Description

This PR adds the `worker_name` field to the provisioner jobs endpoint.

To achieve this, the following SQL query was updated:
-
`GetProvisionerJobsByOrganizationAndStatusWithQueuePositionAndProvisioner`

As a result, the `codersdk.ProvisionerJob` type, which represents the
provisioner job API response, was modified to include the new field.

**Notes:** 
* As mentioned in
[comment](https://github.com/coder/coder/pull/17877#discussion_r2093218206),
the `GetProvisionerJobsByIDsWithQueuePosition` query was not changed due
to load concerns. This means that for template and template version
endpoints, `worker_id` will still be returned, but `worker_name` will
not.
* Similar to `worker_id`, the `worker_name` is only present once a job
is assigned to a provisioner daemon. For jobs in a pending state (not
yet assigned), neither `worker_id` nor `worker_name` will be returned.

---

# Affected Endpoints

- `/organizations/{organization}/provisionerjobs`
- `/organizations/{organization}/provisionerjobs/{job}`

---

# Testing

- Added new tests verifying that both `worker_id` and `worker_name` are
returned once a provisioner job reaches the **succeeded** state.
- Existing tests covering state transitions and other logic remain
unchanged, as they test different scenarios.

---

# Front-end Changes

Admin provisioner jobs dashboard:
<img width="1088" alt="Screenshot 2025-05-16 at 11 51 33"
src="https://github.com/user-attachments/assets/0e20e360-c615-4497-84b7-693777c5443e"
/>

Fixes: https://github.com/coder/coder/issues/16982
2025-05-19 16:05:39 +01:00
c2bc801f83 chore: add 'classic_parameter_flow' column setting to templates (#17828)
We are forcing users to try the dynamic parameter experience first.
Currently this setting only comes into effect if an experiment is
enabled.
2025-05-15 17:55:17 -05:00
1bacd82e80 feat: add API key scope to restrict access to user data (#17692) 2025-05-15 15:32:52 +01:00
2aa8cbebd7 fix: exclude deleted templates from metrics collection (#17839)
Also add some clarification about the lack of database constraints for
soft template deletion.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-15 13:33:58 +02:00
789c4beba7 chore: add dynamic parameter error if missing metadata from provisioner (#17809) 2025-05-14 12:21:36 -05:00
425ee6fa55 feat: reinitialize agents when a prebuilt workspace is claimed (#17475)
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.

Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:15:36 +02:00
0b5f27f566 feat: add parent_id column to workspace_agents table (#17758)
Adds a new nullable column `parent_id` to `workspace_agents` table. This
lays the groundwork for having child agents.
2025-05-13 00:01:31 +01:00
d0ab91c16f fix: reduce size of terraform modules archive (#17749) 2025-05-12 13:50:07 -06:00
a9f1a6b2a2 fix: revert fix: persist terraform modules during template import (#17665) (#17734)
This reverts commit ae3d90b057.
2025-05-08 22:03:08 -04:00
ae3d90b057 fix: persist terraform modules during template import (#17665) 2025-05-08 16:13:46 -06:00
544259b809 feat: add database tables and API routes for agentic chat feature (#17570)
Backend portion of experimental `AgenticChat` feature:
- Adds database tables for chats and chat messages
- Adds functionality to stream messages from LLM providers using
`kylecarbs/aisdk-go`
- Adds API routes with relevant functionality (list, create, update
chats, insert chat message)
- Adds experiment `codersdk.AgenticChat`

---------

Co-authored-by: Kyle Carberry <kyle@carberry.com>
2025-05-02 17:29:57 +01:00
b7e08ba7c9 fix: filter out deleted users when attempting to delete an organization (#17621)
Closes
[coder/internal#601](https://github.com/coder/internal/issues/601)
2025-05-01 13:26:01 -03:00
27bc60d1b9 feat: implement reconciliation loop (#17261)
Closes https://github.com/coder/internal/issues/510

<details>
<summary> Refactoring Summary </summary>

### 1) `CalculateActions` Function

#### Issues Before Refactoring:

- Large function (~150 lines), making it difficult to read and maintain.
- The control flow is hard to follow due to complex conditional logic.
- The `ReconciliationActions` struct was partially initialized early,
then mutated in multiple places, making the flow error-prone.

Original source:  

fe60b569ad/coderd/prebuilds/state.go (L13-L167)

#### Improvements After Refactoring:

- Simplified and broken down into smaller, focused helper methods.
- The flow of the function is now more linear and easier to understand.
- Struct initialization is cleaner, avoiding partial and incremental
mutations.

Refactored function:  

eeb0407d78/coderd/prebuilds/state.go (L67-L84)

---

### 2) `ReconciliationActions` Struct

#### Issues Before Refactoring:

- The struct mixed both actionable decisions and diagnostic state, which
blurred its purpose.
- It was unclear which fields were necessary for reconciliation logic,
and which were purely for logging/observability.

#### Improvements After Refactoring:

- Split into two clear, purpose-specific structs:
- **`ReconciliationActions`** — defines the intended reconciliation
action.
- **`ReconciliationState`** — captures runtime state and metadata,
primarily for logging and diagnostics.

Original struct:  

fe60b569ad/coderd/prebuilds/reconcile.go (L29-L41)

</details>

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Dean Sheather <dean@deansheather.com>
Co-authored-by: Spike Curtis <spike@coder.com>
Co-authored-by: Danny Kopping <danny@coder.com>
2025-04-17 09:29:29 -04:00
669e790df6 test: add unit test to excercise bug when idp sync hits deleted orgs (#17405)
Deleted organizations are still attempting to sync members. This causes
an error on inserting the member, and would likely cause issues later in
the sync process even if that member is inserted. Deleted orgs should be
skipped.
2025-04-16 09:27:35 -05:00
06d39151dc feat: extend request logs with auth & DB info (#17304)
Closes #16903
2025-04-15 13:27:23 +02:00
979687c37f chore(codersdk): deprecate WorkspaceAppStatus.{NeedsUserAttention,Icon} (#17358)
https://github.com/coder/coder/pull/17163 introduced the
`workspace_app_statuses` table. Two of these fields
(`needs_user_attention`, `icon`) turned out to be surplus to
requirements.

- Removes columns `needs_user_attention` and `icon` from
`workspace_app_statuses`
- Marks the corresponding fields of `codersdk.WorkspaceAppStatus` as
deprecated.
2025-04-15 10:47:42 +01:00
c06ef7c1eb chore!: remove JFrog integration (#17353)
- Removes displaying XRay scan results in the dashboard. I'm not sure
  anyone was even using this integration so it's just debt for us to
  maintain. We can open up a separate issue to get rid of the db tables
  once we know for sure that we haven't broken anyone.
2025-04-11 14:45:21 -04:00
6dd1056025 feat(coderd/notifications): group workspace build failure report (#17306)
Closes https://github.com/coder/coder/issues/15745

Instead of sending X many reports to a single template admin, we instead
send only 1.
2025-04-10 13:32:19 +01:00
0e658219b2 feat: support filtering users table by login type (#17238)
#15896 Mentions ability to add support for filtering by login type

The issue mentions that backend API support exists but the backend did
not seem to have the support for this filter. So I have added the
ability to filter it.

I also added a corresponding update to readme file to make sure the docs
will correctly showcase this feature
2025-04-09 13:59:41 +10:00
743d308eb3 feat: support multiple terminal fonts (#17257)
Fixes: https://github.com/coder/coder/issues/15024
2025-04-07 14:30:10 +02:00
0b2b643ce2 feat: persist prebuild definitions on template import (#16951)
This PR allows provisioners to recognise and report prebuild definitions
to the coder control plane. It also allows the coder control plane to
then persist these to its store.

closes https://github.com/coder/internal/issues/507

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: evgeniy-scherbina <evgeniy.shcherbina.es@gmail.com>
2025-04-07 10:35:28 +02:00
99c6f235eb feat: add migrations and queries to support prebuilds (#16891)
Depends on https://github.com/coder/coder/pull/16916 _(change base to
`main` once it is merged)_

Closes https://github.com/coder/internal/issues/514

_This is one of several PRs to decompose the `dk/prebuilds` feature
branch into separate PRs to merge into `main`._

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: evgeniy-scherbina <evgeniy.shcherbina.es@gmail.com>
2025-04-03 10:58:30 +02:00
184c1f0a59 chore: add db queries for dynamic parameters (#17137) 2025-04-01 15:47:30 -06:00
8ea956fc11 feat: add app status tracking to the backend (#17163)
This does ~95% of the backend work required to integrate the AI work.

Most left to integrate from the tasks branch is just frontend, which
will be a lot smaller I believe.

The real difference between this branch and that one is the abstraction
-- this now attaches statuses to apps, and returns the latest status
reported as part of a workspace.

This change enables us to have a similar UX to in the tasks branch, but
for agents other than Claude Code as well. Any app can report status
now.
2025-03-31 10:55:44 -04:00
06e5d9ef21 feat(coderd): add webpush package (#17091)
* Adds `codersdk.ExperimentWebPush` (`web-push`)
* Adds a `coderd/webpush` package that allows sending native push
notifications via `github.com/SherClockHolmes/webpush-go`
* Adds database tables to store push notification subscriptions.
* Adds an API endpoint that allows users to subscribe/unsubscribe, and
send a test notification (404 without experiment, excluded from API docs)
* Adds server CLI command to regenerate VAPID keys (note: regenerating
the VAPID keypair requires deleting all existing subscriptions)

---------

Co-authored-by: Kyle Carberry <kyle@carberry.com>
2025-03-27 10:03:53 +00:00
cf10d98aab fix: improve error message when deleting organization with resources (#17049)
Closes
[coder/internal#477](https://github.com/coder/internal/issues/477)

![Screenshot 2025-03-21 at 11 25
57 AM](https://github.com/user-attachments/assets/50cc03e9-395d-4fc7-8882-18cb66b1fac9)

I'm solving this issue in two parts:

1. Updated the postgres function so that it doesn't omit 0 values in the
error
2. Created a new query to fetch the number of resources associated with
an organization and using that information to provider a cleaner error
message to the frontend

> **_NOTE:_** SQL is not my strong suit, and the code was created with
the help of AI. So I'd take extra time looking over what I wrote there
2025-03-25 15:31:24 -04:00
5c8cac9fb7 feat: add name to workspace agent devcontainers (#17089)
In the presence of multiple devcontainers, it would be nice to
differentiate them by name. This change inherits the resource name from
terraform.

Refs #17076
2025-03-25 12:59:20 +00:00
4c33846f6d chore: add prebuilds system user (#16916)
Pre-requisite for https://github.com/coder/coder/pull/16891

Closes https://github.com/coder/internal/issues/515

This PR introduces a new concept of a "system" user.

Our data model requires that all workspaces have an owner (a `users`
relation), and prebuilds is a feature that will spin up workspaces to be
claimed later by actual users - and thus needs to own the workspaces in
the interim.

Naturally, introducing a change like this touches a few aspects around
the codebase and we've taken the approach _default hidden_ here; in
other words, queries for users will by default _exclude_ all system
users, but there is a flag to ensure they can be displayed. This keeps
the changeset relatively small.

This user has minimal permissions (it's equivalent to a `member` since
it has no roles). It will be associated with the default org in the
initial migration, and thereafter we'll need to somehow ensure its
membership aligns with templates (which are org-scoped) for which it'll
need to provision prebuilds; that's a solution we'll have in a
subsequent PR.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>
2025-03-25 12:18:06 +00:00
5b3eda6719 chore: persist template import terraform plan in postgres (#17012) 2025-03-24 10:01:50 -06:00