docs: add oom/ood to notifications (#16582)

- [x] add a section, or add to an existing section: where the notifications
show up and how to access them

previews:
- [Notifications - Configure OOM/OOD
notifications](https://coder.com/docs/@16581-oom-ood-notif/admin/monitoring/notifications#configure-oomood-notifications)
- [Resource
monitoring](https://coder.com/docs/@16581-oom-ood-notif/admin/templates/extending-templates/resource-monitoring)

---------

Co-authored-by: EdwardAngert <17991901+EdwardAngert@users.noreply.github.com>
Edward Angert
2025-03-05 03:43:08 -06:00
committed by GitHub
parent edf28895c7
commit 9251e0d642
3 changed files with 67 additions and 3 deletions


@@ -29,14 +29,14 @@ These notifications are sent to the workspace owner:
 ### User Events
-These notifications sent to users with **owner** and **user admin** roles:
+These notifications are sent to users with **owner** and **user admin** roles:
 - User account created
 - User account deleted
 - User account suspended
 - User account activated
-These notifications sent to users themselves:
+These notifications are sent to users themselves:
 - User account suspended
 - User account activated
@@ -48,6 +48,8 @@ These notifications are sent to users with **template admin** roles:
 - Template deleted
 - Template deprecated
+- Out of memory (OOM) / Out of disk (OOD)
+  - [Configure](#configure-oomood-notifications) in the template `main.tf`.
 - Report: Workspace builds failed for template
   - This notification is delivered as part of a weekly cron job and summarizes
     the failed builds for a given template.
@@ -63,6 +65,16 @@ flags.
 | ✔️ | `--notifications-method` | `CODER_NOTIFICATIONS_METHOD` | `string` | Which delivery method to use (available options: 'smtp', 'webhook'). See [Delivery Methods](#delivery-methods) below. | smtp |
 | - | `--notifications-max-send-attempts` | `CODER_NOTIFICATIONS_MAX_SEND_ATTEMPTS` | `int` | The upper limit of attempts to send a notification. | 5 |
+### Configure OOM/OOD notifications
+You can monitor out of memory (OOM) and out of disk (OOD) errors and alert users
+when they overutilize memory and disk.
+This can help prevent agent disconnects due to OOM/OOD issues.
+To enable OOM/OOD notifications on a template, follow the steps in the
+[resource monitoring guide](../../templates/extending-templates/resource-monitoring.md).
 ## Delivery Methods
 Notifications can currently be delivered by either SMTP or webhook. Each message
@@ -135,7 +147,7 @@ for more options.
 After setting the required fields above:
-1. Setup an account on Microsoft 365 or outlook.com
+1. Set up an account on Microsoft 365 or outlook.com
 1. Set the following configuration options:
```text ```text


@@ -0,0 +1,47 @@
# Resource monitoring
Use the
[`resources_monitoring`](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#resources_monitoring-1)
block on the
[`coder_agent`](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent)
resource in our Terraform provider to monitor out of memory (OOM) and out of
disk (OOD) errors and alert users when they overutilize memory and disk.
This can help prevent agent disconnects due to OOM/OOD issues.
You can specify one or more volumes to monitor for OOD alerts.
OOM alerts are reported per-agent.
## Prerequisites
Notifications are sent through SMTP.
Configure Coder to [use an SMTP server](../../monitoring/notifications/index.md#smtp-email).
## Example
Add the following example to the template's `main.tf`.
Change the `90`, `80`, and `95` to a threshold that's more appropriate for your
deployment:
```hcl
resource "coder_agent" "main" {
  arch = data.coder_provisioner.dev.arch
  os   = data.coder_provisioner.dev.os
  resources_monitoring {
    memory {
      enabled   = true
      threshold = 90
    }
    volume {
      path      = "/volume1"
      enabled   = true
      threshold = 80
    }
    volume {
      path      = "/volume2"
      enabled   = true
      threshold = 95
    }
  }
}
```


@@ -401,6 +401,11 @@
 "description": "Display resource state in the workspace dashboard",
 "path": "./admin/templates/extending-templates/resource-metadata.md"
 },
+{
+"title": "Resource Monitoring",
+"description": "Monitor resources in the workspace dashboard",
+"path": "./admin/templates/extending-templates/resource-monitoring.md"
+},
 {
 "title": "Resource Ordering",
 "description": "Design the UI of workspaces",