Commit Graph

7 Commits

Author SHA1 Message Date
6e967780c9 feat: track resource replacements when claiming a prebuilt workspace (#17571)
Closes https://github.com/coder/internal/issues/369

We can't know whether a replacement (i.e. drift of terraform state
leading to a resource needing to be deleted/recreated) will take place
apriori; we can only detect it at `plan` time, because the provider
decides whether a resource must be replaced and it cannot be inferred
through static analysis of the template.

**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.

Drift details will now be logged in the workspace build logs:


![image](https://github.com/user-attachments/assets/da1988b6-2cbe-4a79-a3c5-ea29891f3d6f)

Plus a notification will be sent to template admins when this situation
arises:


![image](https://github.com/user-attachments/assets/39d555b1-a262-4a3e-b529-03b9f23bf66a)

A new metric - `coderd_prebuilt_workspaces_resource_replacements_total`
- will also increment each time a workspace encounters replacements.

We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know apriori which
replacement would cause this.
For example, say we have 2 replacements: a `docker_container` and a
`null_resource`; we don't know which one might
cause an issue (or indeed if either would), so we just track the
replacement.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:52:22 +02:00
a646478aed fix: move pubsub publishing out of database transactions to avoid conn exhaustion (#17648)
Database transactions hold onto connections, and `pubsub.Publish` tries
to acquire a connection of its own. If the latter is called within a
transaction, this can lead to connection exhaustion.

I plan two follow-ups to this PR:

1. Make connection counts tuneable

https://github.com/coder/coder/blob/main/cli/server.go#L2360-L2376

We will then be able to write tests showing how connection exhaustion
occurs.

2. Write a linter/ruleguard to prevent `pubsub.Publish` from being
called within a transaction.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-05 11:54:18 +02:00
ef11d4f769 fix: fix bug with deletion of prebuilt workspaces (#17652)
Don't specify the template version for a delete transition, because the
prebuilt workspace may have been created using an older template
version.
If the template version isn't explicitly set, the builder will
automatically use the version from the last workspace build - which is
the desired behavior.
2025-05-01 17:26:30 -04:00
98e5611e16 fix: fix for prebuilds claiming and deletion (#17624)
PR contains:
- fix for claiming & deleting prebuilds with immutable params
- unit test for claiming scenario
- unit test for deletion scenario

The parameter resolver was failing when deleting/claiming prebuilds
because a value for a previously-used parameter was provided to the
resolver, but since the value was unchanged (it's coming from the
preset) it failed in the resolver. The resolver was missing a check to
see if the old value != new value; if the values match then there's no
mutation of an immutable parameter.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-01 08:52:23 +00:00
e0483e3136 feat: add prebuilds metrics collector (#17547)
Closes https://github.com/coder/internal/issues/509

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-04-28 12:28:56 +02:00
08ad910171 feat: add prebuilds configuration & bootstrapping (#17527)
Closes https://github.com/coder/internal/issues/508

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Cian Johnston <cian@coder.com>
2025-04-25 11:07:15 +02:00
27bc60d1b9 feat: implement reconciliation loop (#17261)
Closes https://github.com/coder/internal/issues/510

<details>
<summary> Refactoring Summary </summary>

### 1) `CalculateActions` Function

#### Issues Before Refactoring:

- Large function (~150 lines), making it difficult to read and maintain.
- The control flow is hard to follow due to complex conditional logic.
- The `ReconciliationActions` struct was partially initialized early,
then mutated in multiple places, making the flow error-prone.

Original source:  

fe60b569ad/coderd/prebuilds/state.go (L13-L167)

#### Improvements After Refactoring:

- Simplified and broken down into smaller, focused helper methods.
- The flow of the function is now more linear and easier to understand.
- Struct initialization is cleaner, avoiding partial and incremental
mutations.

Refactored function:  

eeb0407d78/coderd/prebuilds/state.go (L67-L84)

---

### 2) `ReconciliationActions` Struct

#### Issues Before Refactoring:

- The struct mixed both actionable decisions and diagnostic state, which
blurred its purpose.
- It was unclear which fields were necessary for reconciliation logic,
and which were purely for logging/observability.

#### Improvements After Refactoring:

- Split into two clear, purpose-specific structs:
- **`ReconciliationActions`** — defines the intended reconciliation
action.
- **`ReconciliationState`** — captures runtime state and metadata,
primarily for logging and diagnostics.

Original struct:  

fe60b569ad/coderd/prebuilds/reconcile.go (L29-L41)

</details>

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Dean Sheather <dean@deansheather.com>
Co-authored-by: Spike Curtis <spike@coder.com>
Co-authored-by: Danny Kopping <danny@coder.com>
2025-04-17 09:29:29 -04:00