feat: track resource replacements when claiming a prebuilt workspace (#17571)

Closes https://github.com/coder/internal/issues/369

We can't know a priori whether a replacement will take place (i.e. drift
of Terraform state that forces a resource to be deleted and recreated);
we can only detect it at `plan` time, because the provider decides
whether a resource must be replaced and this cannot be inferred through
static analysis of the template.
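
To illustrate what "detecting it at `plan` time" can look like, here is a minimal sketch that scans Terraform's JSON plan output (`terraform show -json <planfile>`) for resources whose `change.actions` list contains both `delete` and `create`, which is how Terraform encodes a replacement. The type and function names (`plan`, `resourceChange`, `replacedAddresses`) and the sample addresses are hypothetical, and this is not necessarily how the provisioner in this change does it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plan mirrors only the fields of Terraform's JSON plan output that this
// sketch needs. The Go type names are hypothetical.
type plan struct {
	ResourceChanges []resourceChange `json:"resource_changes"`
}

type resourceChange struct {
	Address string `json:"address"`
	Change  struct {
		Actions []string `json:"actions"`
	} `json:"change"`
}

// replacedAddresses returns the addresses of resources that the plan would
// delete and recreate. Terraform encodes a replacement as a two-element
// actions list: ["delete","create"] or ["create","delete"].
func replacedAddresses(planJSON []byte) ([]string, error) {
	var p plan
	if err := json.Unmarshal(planJSON, &p); err != nil {
		return nil, fmt.Errorf("decode plan: %w", err)
	}
	var out []string
	for _, rc := range p.ResourceChanges {
		var del, create bool
		for _, a := range rc.Change.Actions {
			switch a {
			case "delete":
				del = true
			case "create":
				create = true
			}
		}
		if del && create {
			out = append(out, rc.Address)
		}
	}
	return out, nil
}

func main() {
	// Hand-written sample input, not output captured from a real plan.
	sample := []byte(`{"resource_changes":[
		{"address":"docker_container.workspace","change":{"actions":["delete","create"]}},
		{"address":"null_resource.example","change":{"actions":["no-op"]}}
	]}`)
	addrs, err := replacedAddresses(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(addrs) // [docker_container.workspace]
}
```

Because the actions are computed by the provider during planning, a check like this can only run after `plan` has produced its output, never against the template source alone.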

**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.

Drift details will now be logged in the workspace build logs:


![image](https://github.com/user-attachments/assets/da1988b6-2cbe-4a79-a3c5-ea29891f3d6f)

A notification will also be sent to template admins when this situation
arises:


![image](https://github.com/user-attachments/assets/39d555b1-a262-4a3e-b529-03b9f23bf66a)

A new metric, `coderd_prebuilt_workspaces_resource_replacements_total`,
will also be incremented each time a workspace encounters replacements.

We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know a priori which
replacement would cause this.
For example, say we have two replacements: a `docker_container` and a
`null_resource`. We don't know which one might cause an issue (or indeed
whether either would), so we just track that a replacement occurred.
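
The "track that, not how many" rule can be sketched as a counter that moves by exactly one per affected build, however many resources were replaced. Everything below is a hypothetical stand-in: `replacement` and `replacementCounter` are illustration-only types, not the `sdkproto.ResourceReplacement` message or the real metric.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// replacement is a hypothetical stand-in for one reported resource replacement.
type replacement struct {
	Resource string
	Paths    []string
}

// replacementCounter is a hypothetical stand-in for the
// coderd_prebuilt_workspaces_resource_replacements_total metric.
type replacementCounter struct {
	total atomic.Int64
}

// track records that a build encountered replacements. It increments the
// counter by one when the slice is non-empty, regardless of its length, and
// reports whether anything was recorded.
func (c *replacementCounter) track(replacements []replacement) bool {
	if len(replacements) == 0 {
		return false
	}
	c.total.Add(1)
	return true
}

func main() {
	var c replacementCounter

	// One build with two replacements counts once.
	c.track([]replacement{
		{Resource: "docker_container.workspace", Paths: []string{"env"}},
		{Resource: "null_resource.example", Paths: []string{"triggers"}},
	})
	// A build without replacements does not count at all.
	c.track(nil)

	fmt.Println(c.total.Load()) // 1
}
```

Counting builds rather than individual resources keeps the signal aligned with the question an admin actually has, namely whether a claimed prebuild was affected at all.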

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Danny Kopping
2025-05-14 14:52:22 +02:00
committed by GitHub
parent e75d1c1ce5
commit 6e967780c9
33 changed files with 2048 additions and 969 deletions


@@ -7,6 +7,7 @@ import (
"golang.org/x/xerrors"
"github.com/coder/coder/v2/coderd/database"
sdkproto "github.com/coder/coder/v2/provisionersdk/proto"
)
var (
@@ -27,6 +28,11 @@ type ReconciliationOrchestrator interface {
// Stop gracefully shuts down the orchestrator with the given cause.
// The cause is used for logging and error reporting.
Stop(ctx context.Context, cause error)
// TrackResourceReplacement handles a pathological situation whereby a terraform resource is replaced due to drift,
// which can obviate the whole point of pre-provisioning a prebuilt workspace.
// See more detail at https://coder.com/docs/admin/templates/extending-templates/prebuilt-workspaces#preventing-resource-replacement.
TrackResourceReplacement(ctx context.Context, workspaceID, buildID uuid.UUID, replacements []*sdkproto.ResourceReplacement)
}
type Reconciler interface {


@@ -6,12 +6,15 @@ import (
"github.com/google/uuid"
"github.com/coder/coder/v2/coderd/database"
sdkproto "github.com/coder/coder/v2/provisionersdk/proto"
)
type NoopReconciler struct{}
func (NoopReconciler) Run(context.Context) {}
func (NoopReconciler) Stop(context.Context, error) {}
func (NoopReconciler) TrackResourceReplacement(context.Context, uuid.UUID, uuid.UUID, []*sdkproto.ResourceReplacement) {
}
func (NoopReconciler) ReconcileAll(context.Context) error { return nil }
func (NoopReconciler) SnapshotState(context.Context, database.Store) (*GlobalSnapshot, error) {
return &GlobalSnapshot{}, nil