Relates to https://github.com/coder/coder/issues/17432
### Part 1:
Notes:
- `GetPresetsAtFailureLimit` SQL query is added, which is similar to
`GetPresetsBackoff`, they use same CTEs: `filtered_builds`,
`time_sorted_builds`, but they are still different.
- Query is executed on every loop iteration. We can consider marking
specific preset as permanently failed as an optimization to avoid
executing query on every loop iteration. But I decided don't do it for
now.
- By default `FailureHardLimit` is set to 3.
- `FailureHardLimit` is configurable. Setting it to zero - means that
hard limit is disabled.
### Part 2
Notes:
- `PrebuildFailureLimitReached` notification is added.
- Notification is sent to template admins.
- Notification is sent only the first time, when hard limit is reached.
But it will `log.Warn` on every loop iteration.
- I introduced this enum:
```sql
CREATE TYPE prebuild_status AS ENUM (
'normal', -- Prebuilds are working as expected; this is the default, healthy state.
'hard_limited', -- Prebuilds have failed repeatedly and hit the configured hard failure limit; won't be retried anymore.
'validation_failed' -- Prebuilds failed due to a non-retryable validation error (e.g. template misconfiguration); won't be retried.
);
```
`validation_failed` not used in this PR, but I think it will be used in
next one, so I wanted to save us an extra migration.
- Notification looks like this:
<img width="472" alt="image"
src="https://github.com/user-attachments/assets/e10efea0-1790-4e7f-a65c-f94c40fced27"
/>
### Latest notification views:
<img width="463" alt="image"
src="https://github.com/user-attachments/assets/11310c58-68d1-4075-a497-f76d854633fe"
/>
<img width="725" alt="image"
src="https://github.com/user-attachments/assets/6bbfe21a-91ac-47c3-a9d1-21807bb0c53a"
/>
Closes https://github.com/coder/internal/issues/369
We can't know whether a replacement (i.e. drift of terraform state
leading to a resource needing to be deleted/recreated) will take place
apriori; we can only detect it at `plan` time, because the provider
decides whether a resource must be replaced and it cannot be inferred
through static analysis of the template.
**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.
Drift details will now be logged in the workspace build logs:

Plus a notification will be sent to template admins when this situation
arises:

A new metric - `coderd_prebuilt_workspaces_resource_replacements_total`
- will also increment each time a workspace encounters replacements.
We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know apriori which
replacement would cause this.
For example, say we have 2 replacements: a `docker_container` and a
`null_resource`; we don't know which one might
cause an issue (or indeed if either would), so we just track the
replacement.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Relates to https://github.com/coder/coder/issues/15845
When the `/workspace/<name>/builds` endpoint is hit, we check if the
requested template version is different to the previously used template
version. If these values differ, we can assume that the workspace has
been manually updated and send the appropriate notification. Automatic
updates happen in the lifecycle executor and bypasses this endpoint
entirely.
Relates to https://github.com/coder/coder/issues/14232
This implements two endpoints (names subject to change):
- `/api/v2/users/otp/request`
- `/api/v2/users/otp/change-password`