mirror of
https://github.com/coder/coder.git
synced 2025-07-12 00:14:10 +00:00
fix: reduce cost of prebuild failure (#17697)
Relates to https://github.com/coder/coder/issues/17432

### Part 1

Notes:

- A `GetPresetsAtFailureLimit` SQL query is added. It is similar to `GetPresetsBackoff` and uses the same CTEs (`filtered_builds`, `time_sorted_builds`), but the two queries remain distinct.
- The query is executed on every loop iteration. As an optimization, we could mark a specific preset as permanently failed to avoid running the query on every iteration, but I decided not to do that for now.
- By default, `FailureHardLimit` is set to 3.
- `FailureHardLimit` is configurable. Setting it to zero disables the hard limit.

### Part 2

Notes:

- A `PrebuildFailureLimitReached` notification is added.
- The notification is sent to template admins.
- The notification is sent only the first time the hard limit is reached, but a `log.Warn` is emitted on every loop iteration.
- I introduced this enum:

```sql
CREATE TYPE prebuild_status AS ENUM (
  'normal',           -- Prebuilds are working as expected; this is the default, healthy state.
  'hard_limited',     -- Prebuilds have failed repeatedly and hit the configured hard failure limit; won't be retried anymore.
  'validation_failed' -- Prebuilds failed due to a non-retryable validation error (e.g. template misconfiguration); won't be retried.
);
```

`validation_failed` is not used in this PR, but I expect it to be used in the next one, so I wanted to save us an extra migration.

- The notification looks like this:

<img width="472" alt="image" src="https://github.com/user-attachments/assets/e10efea0-1790-4e7f-a65c-f94c40fced27" />

### Latest notification views

<img width="463" alt="image" src="https://github.com/user-attachments/assets/11310c58-68d1-4075-a497-f76d854633fe" />

<img width="725" alt="image" src="https://github.com/user-attachments/assets/6bbfe21a-91ac-47c3-a9d1-21807bb0c53a" />
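The hard-limit behaviour described in the notes can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Preset` struct, its `ConsecutiveFailures` field, and the `reconcile` function are hypothetical names, and the sketch assumes a preset becomes hard-limited once its consecutive failure count reaches `FailureHardLimit`.

```go
package main

import "fmt"

// PrebuildStatus mirrors the values of the prebuild_status enum in the migration.
type PrebuildStatus string

const (
	StatusNormal           PrebuildStatus = "normal"
	StatusHardLimited      PrebuildStatus = "hard_limited"
	StatusValidationFailed PrebuildStatus = "validation_failed" // not used yet, per the PR notes
)

// Preset is a hypothetical in-memory stand-in for a template preset.
type Preset struct {
	Name                string
	ConsecutiveFailures int
	Status              PrebuildStatus
}

// reconcile is a hypothetical single loop iteration for one preset.
// It reports whether new prebuilds may be created and whether a
// notification should be sent during this iteration.
func reconcile(p *Preset, failureHardLimit int) (create bool, notify bool) {
	// A limit of zero disables the hard limit (this sketch treats any
	// non-positive value the same way, which is an assumption).
	if failureHardLimit <= 0 || p.ConsecutiveFailures < failureHardLimit {
		return true, false
	}
	// Notify only on the transition into the hard-limited state; later
	// iterations would only log a warning.
	notify = p.Status != StatusHardLimited
	p.Status = StatusHardLimited
	return false, notify
}

func main() {
	p := &Preset{Name: "example", ConsecutiveFailures: 3, Status: StatusNormal}
	for i := 0; i < 2; i++ {
		create, notify := reconcile(p, 3)
		fmt.Printf("iteration=%d create=%t notify=%t status=%s\n", i, create, notify, p.Status)
	}
}
```

With a limit of 3 and three consecutive failures, the first iteration marks the preset as `hard_limited` and requests a notification; the second iteration still refuses to create prebuilds but does not notify again.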
Committed by: GitHub
Parent: e1934fe119
Commit: 53e8e9c7cd
1
docs/reference/api/general.md
generated
@@ -533,6 +533,7 @@ curl -X GET http://coder-server:8080/api/v2/deployment/config \
   "wildcard_access_url": "string",
   "workspace_hostname_suffix": "string",
   "workspace_prebuilds": {
+    "failure_hard_limit": 0,
     "reconciliation_backoff_interval": 0,
     "reconciliation_backoff_lookback": 0,
     "reconciliation_interval": 0
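As a rough illustration of how a client might read the `workspace_prebuilds` object from this file's sample response, the sketch below decodes it with Go's standard `encoding/json` package. The `PrebuildsConfig` struct, `parsePrebuildsConfig`, and `HardLimitEnabled` are hypothetical names introduced here, not types from `codersdk`; the field types only follow the `integer` type given in the schema, and the units of the interval fields are not stated in this diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PrebuildsConfig is a hypothetical struct mirroring the JSON keys of the
// workspace_prebuilds object shown in the API reference.
type PrebuildsConfig struct {
	FailureHardLimit              int   `json:"failure_hard_limit"`
	ReconciliationBackoffInterval int64 `json:"reconciliation_backoff_interval"`
	ReconciliationBackoffLookback int64 `json:"reconciliation_backoff_lookback"`
	ReconciliationInterval        int64 `json:"reconciliation_interval"`
}

// HardLimitEnabled reports whether the hard limit is active.
// Per the commit notes, a value of zero disables it.
func (c PrebuildsConfig) HardLimitEnabled() bool {
	return c.FailureHardLimit > 0
}

// parsePrebuildsConfig decodes a workspace_prebuilds JSON object.
func parsePrebuildsConfig(raw []byte) (PrebuildsConfig, error) {
	var cfg PrebuildsConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	// Illustrative payload; 3 is the default limit mentioned in the commit notes.
	raw := []byte(`{"failure_hard_limit": 3, "reconciliation_interval": 0}`)
	cfg, err := parsePrebuildsConfig(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("limit=%d enabled=%t\n", cfg.FailureHardLimit, cfg.HardLimitEnabled())
}
```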
14
docs/reference/api/schemas.md
generated
@@ -2704,6 +2704,7 @@ CreateWorkspaceRequest provides options for creating a new workspace. Only one o
   "wildcard_access_url": "string",
   "workspace_hostname_suffix": "string",
   "workspace_prebuilds": {
+    "failure_hard_limit": 0,
     "reconciliation_backoff_interval": 0,
     "reconciliation_backoff_lookback": 0,
     "reconciliation_interval": 0
@@ -3202,6 +3203,7 @@ CreateWorkspaceRequest provides options for creating a new workspace. Only one o
   "wildcard_access_url": "string",
   "workspace_hostname_suffix": "string",
   "workspace_prebuilds": {
+    "failure_hard_limit": 0,
     "reconciliation_backoff_interval": 0,
     "reconciliation_backoff_lookback": 0,
     "reconciliation_interval": 0
@@ -5261,6 +5263,7 @@ Git clone makes use of this by parsing the URL from: 'Username for "https://gith
 
 ```json
 {
+  "failure_hard_limit": 0,
   "reconciliation_backoff_interval": 0,
   "reconciliation_backoff_lookback": 0,
   "reconciliation_interval": 0
@@ -5269,11 +5272,12 @@ Git clone makes use of this by parsing the URL from: 'Username for "https://gith
 
 ### Properties
 
-| Name | Type | Required | Restrictions | Description |
-|-----------------------------------|---------|----------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `reconciliation_backoff_interval` | integer | false | | Reconciliation backoff interval specifies the amount of time to increase the backoff interval when errors occur during reconciliation. |
-| `reconciliation_backoff_lookback` | integer | false | | Reconciliation backoff lookback determines the time window to look back when calculating the number of failed prebuilds, which influences the backoff strategy. |
-| `reconciliation_interval` | integer | false | | Reconciliation interval defines how often the workspace prebuilds state should be reconciled. |
+| Name | Type | Required | Restrictions | Description |
+|-----------------------------------|---------|----------|--------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `failure_hard_limit` | integer | false | | Failure hard limit defines the maximum number of consecutive failed prebuild attempts allowed before a preset is considered to be in a hard limit state. When a preset hits this limit, no new prebuilds will be created until the limit is reset. FailureHardLimit is disabled when set to zero. |
+| `reconciliation_backoff_interval` | integer | false | | Reconciliation backoff interval specifies the amount of time to increase the backoff interval when errors occur during reconciliation. |
+| `reconciliation_backoff_lookback` | integer | false | | Reconciliation backoff lookback determines the time window to look back when calculating the number of failed prebuilds, which influences the backoff strategy. |
+| `reconciliation_interval` | integer | false | | Reconciliation interval defines how often the workspace prebuilds state should be reconciled. |
 
 ## codersdk.Preset
 