Commit Graph

6 Commits

Author SHA1 Message Date
b330c0803c fix: reimplement reporting of preset-hard-limited metric (#18055)
Addresses concerns raised in https://github.com/coder/coder/pull/18045
2025-05-28 14:18:32 -04:00
2a15aa8a6f feat: add hard-limited presets metric (#18008)
Closes https://github.com/coder/coder/issues/17988

Define the `preset_hard_limited` metric, which indicates for each preset
whether it has reached the hard failure limit (1 for hard-limited,
0 otherwise).

CLI example:
```
curl -X GET localhost:2118/metrics | grep preset_hard_limited
# HELP coderd_prebuilt_workspaces_preset_hard_limited Indicates whether a given preset has reached the hard failure limit (1 for hard-limited, 0 otherwise).
# TYPE coderd_prebuilt_workspaces_preset_hard_limited gauge
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="GoLand: Large",template_name="Test7"} 1
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="GoLand: Large",template_name="ValidTemplate"} 0
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="IU: Medium",template_name="Test7"} 1
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="IU: Medium",template_name="ValidTemplate"} 0
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="WS: Small",template_name="Test7"} 1
```

NOTE:
```go
if !ps.Preset.Deleted && ps.Preset.UsingActiveVersion {
	c.metrics.trackHardLimitedStatus(ps.Preset.OrganizationName, ps.Preset.TemplateName, ps.Preset.Name, ps.IsHardLimited)
}
```

Only the active template version is tracked. If an admin creates a new
template version, the old metric value (for the previous template version)
is overwritten with the new value (for the active template version),
because `template_version` is not one of the metric's labels:
```go
labels = []string{"template_name", "preset_name", "organization_name"}
```
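
The overwrite behaviour described above can be sketched without Prometheus, assuming a gauge is simply a value stored per label set. `presetLabels` and `hardLimitedGauge` are hypothetical names used only for illustration, not the types in the actual collector:

```go
package main

import "fmt"

// presetLabels mirrors the three labels listed above. It is a hypothetical
// stand-in for a Prometheus label set; the template version is deliberately absent.
type presetLabels struct {
	TemplateName     string
	PresetName       string
	OrganizationName string
}

// hardLimitedGauge is a hypothetical in-memory gauge keyed by label set.
type hardLimitedGauge map[presetLabels]float64

// track records 1 for a hard-limited preset and 0 otherwise.
func (g hardLimitedGauge) track(org, template, preset string, isHardLimited bool) {
	value := 0.0
	if isHardLimited {
		value = 1
	}
	key := presetLabels{TemplateName: template, PresetName: preset, OrganizationName: org}
	g[key] = value
}

func main() {
	g := hardLimitedGauge{}
	// Previous template version: the preset is hard-limited.
	g.track("coder", "Test7", "GoLand: Large", true)
	// New active version of the same template: identical labels, so the
	// same series is written again and the earlier value is lost.
	g.track("coder", "Test7", "GoLand: Large", false)

	key := presetLabels{TemplateName: "Test7", PresetName: "GoLand: Large", OrganizationName: "coder"}
	fmt.Println("series:", len(g), "value:", g[key])
}
```

Because both writes share one key, only a single series exists afterwards and it carries the value for the active version.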

The implementation is similar to that of the
`MetricResourceReplacementsCount` metric.

---------

Co-authored-by: Susana Ferreira <ssncferreira@gmail.com>
2025-05-26 11:39:44 -04:00
2aa8cbebd7 fix: exclude deleted templates from metrics collection (#17839)
Also add some clarification about the lack of database constraints for
soft template deletion.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-15 13:33:58 +02:00
6e967780c9 feat: track resource replacements when claiming a prebuilt workspace (#17571)
Closes https://github.com/coder/internal/issues/369

We can't know a priori whether a replacement (i.e. drift of Terraform
state leading to a resource needing to be deleted and recreated) will take
place; we can only detect it at `plan` time, because the provider
decides whether a resource must be replaced, and this cannot be inferred
through static analysis of the template.

**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.

Drift details will now be logged in the workspace build logs:


![image](https://github.com/user-attachments/assets/da1988b6-2cbe-4a79-a3c5-ea29891f3d6f)

In addition, a notification will be sent to template admins when this
situation arises:


![image](https://github.com/user-attachments/assets/39d555b1-a262-4a3e-b529-03b9f23bf66a)

A new metric - `coderd_prebuilt_workspaces_resource_replacements_total`
- will also increment each time a workspace encounters replacements.

We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know a priori which
replacement would cause this.
For example, say we have 2 replacements: a `docker_container` and a
`null_resource`. We don't know which one might cause an issue (or indeed
whether either would), so we just track that a replacement happened.
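
A minimal sketch of the "track that, not how many" rule, assuming a plain integer counter in place of the real Prometheus counter. `replacement`, `replacementCounter` and `observe` are hypothetical names for illustration only:

```go
package main

import "fmt"

// replacement is a hypothetical description of one resource that the
// plan wants to delete and recreate.
type replacement struct {
	Resource string
}

// replacementCounter is a hypothetical stand-in for the
// coderd_prebuilt_workspaces_resource_replacements_total counter.
type replacementCounter struct {
	total int
}

// observe increments the counter at most once per call, regardless of how
// many resources are due to be replaced. It reports whether it incremented.
func (c *replacementCounter) observe(replacements []replacement) bool {
	if len(replacements) == 0 {
		return false
	}
	c.total++
	return true
}

func main() {
	c := &replacementCounter{}
	// Two replacements in one claim still count as a single occurrence.
	c.observe([]replacement{{Resource: "docker_container"}, {Resource: "null_resource"}})
	// A claim with no drift leaves the counter untouched.
	c.observe(nil)
	fmt.Println("total:", c.total)
}
```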

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:52:22 +02:00
b2a1de9e2a feat: fetch prebuilds metrics state in background (#17792)
`Collect()` is called whenever the `/metrics` endpoint is hit to
retrieve metrics.

The queries used in prebuilds metrics collection are quite heavy, and we
want to avoid running them concurrently or too often, to keep database
load down.

Here I'm moving towards background retrieval of the state required to
set the metrics; that state is invalidated every interval.

Also introduces `coderd_prebuilt_workspaces_metrics_last_updated` which
operators can use to determine when these metrics go stale.
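
The general pattern described here can be sketched as follows, assuming a mutex-guarded snapshot refreshed on a ticker. `cachedCollector`, `state`, `fetch` and the other names are hypothetical and do not reflect the actual collector's API:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// state is a hypothetical snapshot of whatever the heavy queries return.
type state struct {
	RunningPrebuilds int
}

// cachedCollector refreshes its snapshot in the background so that reads
// (i.e. scrapes) never trigger the expensive queries themselves.
type cachedCollector struct {
	mu          sync.RWMutex
	latest      state
	lastUpdated time.Time
	fetch       func() (state, error) // assumed to wrap the heavy queries
}

// refresh runs the expensive fetch once and swaps in the result.
func (c *cachedCollector) refresh() {
	s, err := c.fetch()
	if err != nil {
		return // keep the old snapshot; lastUpdated reveals the staleness
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.latest = s
	c.lastUpdated = time.Now()
}

// run refreshes immediately and then once per interval until ctx is done.
func (c *cachedCollector) run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		c.refresh()
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// collect is what a scrape would call: it only reads the cached snapshot.
func (c *cachedCollector) collect() (state, time.Time) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.latest, c.lastUpdated
}

func main() {
	calls := 0
	c := &cachedCollector{fetch: func() (state, error) {
		calls++
		return state{RunningPrebuilds: 3}, nil
	}}
	c.refresh() // in a server this would be `go c.run(ctx, interval)`
	for i := 0; i < 5; i++ {
		c.collect() // repeated scrapes do not call fetch
	}
	s, updated := c.collect()
	fmt.Println("fetch calls:", calls, "prebuilds:", s.RunningPrebuilds, "updated:", !updated.IsZero())
}
```

Exposing the last-updated timestamp alongside the cached values lets a reader of the metrics judge how old the snapshot is.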

See https://github.com/coder/coder/pull/17789 as well.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-13 20:27:41 +02:00
e0483e3136 feat: add prebuilds metrics collector (#17547)
Closes https://github.com/coder/internal/issues/509

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-04-28 12:28:56 +02:00