Closes https://github.com/coder/coder/issues/17988
Define `preset_hard_limited` metric which for every preset indicates
whether a given preset has reached the hard failure limit (1 for
hard-limited, 0 otherwise).
CLI example:
```
curl -X GET localhost:2118/metrics | grep preset_hard_limited
# HELP coderd_prebuilt_workspaces_preset_hard_limited Indicates whether a given preset has reached the hard failure limit (1 for hard-limited, 0 otherwise).
# TYPE coderd_prebuilt_workspaces_preset_hard_limited gauge
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="GoLand: Large",template_name="Test7"} 1
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="GoLand: Large",template_name="ValidTemplate"} 0
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="IU: Medium",template_name="Test7"} 1
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="IU: Medium",template_name="ValidTemplate"} 0
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="WS: Small",template_name="Test7"} 1
```
NOTE:
```go
if !ps.Preset.Deleted && ps.Preset.UsingActiveVersion {
c.metrics.trackHardLimitedStatus(ps.Preset.OrganizationName, ps.Preset.TemplateName, ps.Preset.Name, ps.IsHardLimited)
}
```
Only active template version is tracked. If admin creates new template
version - old value of metric (for previous template version) will be
overwritten with new value of metric (for active template version).
Because `template_version` is not part of metric:
```go
labels = []string{"template_name", "preset_name", "organization_name"}
```
Implementation is similar to implementation of
`MetricResourceReplacementsCount` metric
---------
Co-authored-by: Susana Ferreira <ssncferreira@gmail.com>
Also add some clarification about the lack of database constraints for
soft template deletion.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Closes https://github.com/coder/internal/issues/369
We can't know whether a replacement (i.e. drift of terraform state
leading to a resource needing to be deleted/recreated) will take place
apriori; we can only detect it at `plan` time, because the provider
decides whether a resource must be replaced and it cannot be inferred
through static analysis of the template.
**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.
Drift details will now be logged in the workspace build logs:

Plus a notification will be sent to template admins when this situation
arises:

A new metric - `coderd_prebuilt_workspaces_resource_replacements_total`
- will also increment each time a workspace encounters replacements.
We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know apriori which
replacement would cause this.
For example, say we have 2 replacements: a `docker_container` and a
`null_resource`; we don't know which one might
cause an issue (or indeed if either would), so we just track the
replacement.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
`Collect()` is called whenever the `/metrics` endpoint is hit to
retrieve metrics.
The queries used in prebuilds metrics collection are quite heavy, and we
want to avoid having them running concurrently / too often to keep db
load down.
Here I'm moving towards a background retrieval of the state required to
set the metrics, which gets invalidated every interval.
Also introduces `coderd_prebuilt_workspaces_metrics_last_updated` which
operators can use to determine when these metrics go stale.
See https://github.com/coder/coder/pull/17789 as well.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>