docs: add new scaling doc to best practices section (#15904)
[preview](https://coder.com/docs/@bp-scaling-coder/tutorials/best-practices/scale-coder) --------- Co-authored-by: Spike Curtis <spike@coder.com>
@@ -5,7 +5,7 @@ without compromising service. This process encompasses infrastructure setup,
traffic projections, and aggressive testing to identify and mitigate potential
bottlenecks.

A dedicated Kubernetes cluster for Coder is recommended to configure, host, and
manage Coder workloads. Kubernetes provides container orchestration
capabilities, allowing Coder to efficiently deploy, scale, and manage workspaces
across a distributed infrastructure. This ensures high availability, fault
@@ -13,27 +13,29 @@ tolerance, and scalability for Coder deployments. Coder is deployed on this
cluster using the
[Helm chart](../../install/kubernetes.md#4-install-coder-with-helm).

For more information about scaling, see our [Coder scaling best practices](../../tutorials/best-practices/scale-coder.md).

## Methodology

Our scale tests include the following stages:

1. Prepare environment: create expected users and provision workspaces.

1. SSH connections: establish user connections with agents, verifying their
   ability to echo back received content.

1. Web Terminal: verify the PTY connection used for communication with Web
   Terminal.

1. Workspace application traffic: assess the handling of user connections with
   specific workspace apps, confirming their capability to echo back received
   content effectively.

1. Dashboard evaluation: verify the responsiveness and stability of Coder
   dashboards under varying load conditions. This is achieved by simulating user
   interactions using instances of headless Chromium browsers.

1. Cleanup: delete workspaces and users created in step 1.
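
These stages map onto subcommands of Coder's built-in scale-testing utility. As a rough sketch of the sequence (flags omitted; the `dashboard` subcommand name is an assumption, so verify subcommand names with `coder exp scaletest --help` for your version):

```shell
# Stage 1: prepare environment (users and workspaces)
coder exp scaletest create-workspaces

# Stages 2-4: SSH, Web Terminal (PTY), and workspace app traffic
coder exp scaletest workspace-traffic --ssh
coder exp scaletest workspace-traffic
coder exp scaletest workspace-traffic --app wsec

# Stage 5: dashboard evaluation with headless browsers (assumed subcommand name)
coder exp scaletest dashboard

# Stage 6: cleanup
coder exp scaletest cleanup
```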

## Infrastructure and setup requirements

@@ -54,13 +56,16 @@ channel for IDEs with VS Code and JetBrains plugins.
The basic setup of the scale test environment involves:

1. Scale tests runner (32 vCPU, 128 GB RAM)
1. Coder: 2 replicas (4 vCPU, 16 GB RAM)
1. Database: 1 instance (2 vCPU, 32 GB RAM)
1. Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)

The test is deemed successful if:

- Users did not experience interruptions in their workflows,
- `coderd` did not crash or require restarts, and
- No other internal errors were observed.

## Traffic Projections

@@ -90,11 +95,11 @@ Database:

## Available reference architectures

- [Up to 1,000 users](./validated-architectures/1k-users.md)

- [Up to 2,000 users](./validated-architectures/2k-users.md)

- [Up to 3,000 users](./validated-architectures/3k-users.md)

## Hardware recommendation

@@ -107,7 +112,7 @@ guidance on optimal configurations. A reasonable approach involves using scaling
formulas based on factors like CPU, memory, and the number of users.

While the minimum requirements specify 1 CPU core and 2 GB of memory per
`coderd` replica, we recommend that you allocate additional resources depending
on the workload size to ensure deployment stability.

#### CPU and memory usage

@@ -1,20 +1,23 @@
# Scale Tests and Utilities

We scale-test Coder with a built-in utility that can
be used in your environment for insights into how Coder scales with your
infrastructure. For scale-testing Kubernetes clusters we recommend that you install
and use the dedicated Coder template,
[scaletest-runner](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner).

Learn more about [Coder’s architecture](./architecture.md) and our
[scale-testing methodology](./scale-testing.md).

For more information about scaling, see our [Coder scaling best practices](../../tutorials/best-practices/scale-coder.md).

## Recent scale tests

The information in this doc is for reference purposes only, and is not intended
to be used as guidelines for infrastructure sizing.

Review the [Reference Architectures](./validated-architectures/index.md#node-sizing) for
hardware sizing recommendations.

| Environment | Coder CPU | Coder RAM | Coder Replicas | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested |
|------------------|-----------|-----------|----------------|-------------------|-------|-------------------|---------------------------------------|---------------|--------------|
@@ -25,8 +28,7 @@ Learn more about [Coder’s architecture](./architecture.md) and our
| Kubernetes (GKE) | 4 cores | 16 GB | 2 | db-custom-8-30720 | 2000 | 50 | 2000 simulated | `v2.8.4` | Feb 28, 2024 |
| Kubernetes (GKE) | 2 cores | 4 GB | 2 | db-custom-2-7680 | 1000 | 50 | 1000 simulated | `v2.10.2` | Apr 26, 2024 |

> Note: A simulated connection reads and writes random data at 40KB/s per connection.

## Scale testing utility

@@ -34,17 +36,24 @@ Since Coder's performance is highly dependent on the templates and workflows you
support, you may wish to use our internal scale testing utility against your own
environments.

<blockquote class="admonition important">

This utility is experimental.

It is not subject to any compatibility guarantees and may cause interruptions
for your users.
To avoid potential outages and orphaned resources, we recommend that you run
scale tests on a secondary "staging" environment or a dedicated
[Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/terraform).

Run it against a production environment at your own risk.

</blockquote>

### Create workspaces

The following command will provision a number of Coder workspaces using the
specified template and extra parameters:

```shell
coder exp scaletest create-workspaces \
@@ -56,8 +65,6 @@ coder exp scaletest create-workspaces \
  --job-timeout 5h \
  --no-cleanup \
  --output json:"${SCALETEST_RESULTS_DIR}/create-workspaces.json"
```

The command does the following:

@@ -70,6 +77,12 @@ The command does the following:
1. If you don't want the creation process to be interrupted by any errors, use
   the `--retry 5` flag.

For more built-in `scaletest` options, use the `--help` flag:

```shell
coder exp scaletest create-workspaces --help
```

### Traffic Generation

Given an existing set of workspaces created previously with `create-workspaces`,
@@ -105,7 +118,11 @@ The `workspace-traffic` supports also other modes - SSH traffic, workspace app:
1. For SSH traffic: Use `--ssh` flag to generate SSH traffic instead of Web
   Terminal.
1. For workspace app traffic: Use `--app [wsdi|wsec|wsra]` flag to select app
   behavior.

   - `wsdi`: WebSocket discard
   - `wsec`: WebSocket echo
   - `wsra`: WebSocket read
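
For example, a sketch of generating SSH traffic and WebSocket-echo app traffic against the workspaces created earlier; the `--output` flag mirrors the `create-workspaces` example above, and exact flags should be confirmed with `coder exp scaletest workspace-traffic --help`:

```shell
# Generate SSH traffic instead of Web Terminal traffic.
coder exp scaletest workspace-traffic --ssh \
  --output json:"${SCALETEST_RESULTS_DIR}/traffic-ssh.json"

# Generate workspace app traffic using the WebSocket echo mode.
coder exp scaletest workspace-traffic --app wsec \
  --output json:"${SCALETEST_RESULTS_DIR}/traffic-app.json"
```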

### Cleanup

@@ -243,6 +243,11 @@
"title": "Scaling Utilities",
"description": "Tools to help you scale your deployment",
"path": "./admin/infrastructure/scale-utility.md"
},
{
"title": "Scaling best practices",
"description": "How to prepare a Coder deployment for scale",
"path": "./tutorials/best-practices/scale-coder.md"
}
]
},
@@ -761,16 +766,21 @@
"description": "Guides to help you make the most of your Coder experience",
"path": "./tutorials/best-practices/index.md",
"children": [
{
"title": "Organizations - best practices",
"description": "How to make the best use of Coder Organizations",
"path": "./tutorials/best-practices/organizations.md"
},
{
"title": "Scale Coder",
"description": "How to prepare a Coder deployment for scale",
"path": "./tutorials/best-practices/scale-coder.md"
},
{
"title": "Security - best practices",
"description": "Make your Coder deployment more secure",
"path": "./tutorials/best-practices/security-best-practices.md"
},
{
"title": "Speed up your workspaces",
"description": "Speed up your Coder templates and workspaces",

docs/tutorials/best-practices/scale-coder.md (new file, 322 lines)

@@ -0,0 +1,322 @@
# Scale Coder

This best practice guide helps you prepare a Coder deployment that you can
scale up as use grows, and keep operating smoothly with a
high number of active users and workspaces.

## Observability

Observability is one of the most important aspects of a scalable Coder deployment.
When you have visibility into performance and usage metrics, you can make informed
decisions about what changes you should make.

[Monitor your Coder deployment](../../admin/monitoring/index.md) with log output
and metrics to identify potential bottlenecks before they negatively affect the
end-user experience and measure the effects of modifications you make to your
deployment.

- Log output
  - Capture log output from Coder Server instances and external provisioner daemons
    and store it in a searchable log store like Loki, CloudWatch Logs, or other tools.
  - Retain logs for a minimum of thirty days, ideally ninety days.
    This allows you to investigate when anomalous behaviors began.

- Metrics
  - Capture infrastructure metrics like CPU, memory, open files, and network I/O for all
    Coder Server, external provisioner daemon, workspace proxy, and PostgreSQL instances.
  - Capture Coder Server and external provisioner daemon metrics
    [via Prometheus](#how-to-capture-coder-server-metrics-with-prometheus).

  Retain metric time series for at least six months. This allows you to see
  performance trends relative to user growth.

  For a more comprehensive overview, integrate metrics with an observability
  dashboard like [Grafana](../../admin/monitoring/index.md).

### Observability key metrics

Configure alerting based on these metrics to ensure you surface problems before
they affect the end-user experience.

- CPU and Memory Utilization
  - Monitor the utilization as a fraction of the available resources on the instance.

  Utilization will vary with use throughout the course of a day, week, and longer timelines.
  Monitor trends and pay special attention to the daily and weekly peak utilization.
  Use long-term trends to plan infrastructure upgrades.

- Tail latency of Coder Server API requests
  - High tail latency can indicate Coder Server or the PostgreSQL database is underprovisioned
    for the load.
  - Use the `coderd_api_request_latencies_seconds` metric.

- Tail latency of database queries
  - High tail latency can indicate the PostgreSQL database is low in resources.
  - Use the `coderd_db_query_latencies_seconds` metric.
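
As an illustration, assuming these metrics are exported as Prometheus histograms and scraped by a standard Prometheus server, a p95 tail-latency check could be run against the Prometheus HTTP API along these lines (the hostname and 5-minute window are placeholders):

```shell
# Query the 95th percentile of Coder API request latency over the last 5 minutes.
# Replace prometheus.example.com with your Prometheus endpoint.
curl -G 'http://prometheus.example.com:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.95, sum(rate(coderd_api_request_latencies_seconds_bucket[5m])) by (le))'
```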

### How to capture Coder server metrics with Prometheus

Edit your Helm `values.yaml` to capture metrics from Coder Server and external provisioner daemons with
[Prometheus](../../admin/integrations/prometheus.md):

1. Enable Prometheus metrics:

   ```yaml
   CODER_PROMETHEUS_ENABLE=true
   ```

1. Enable database metrics:

   ```yaml
   CODER_PROMETHEUS_COLLECT_DB_METRICS=true
   ```

1. For a high scale deployment, configure agent stats to avoid high cardinality or disable them:

   - Configure agent stats:

     ```yaml
     CODER_PROMETHEUS_AGGREGATE_AGENT_STATS_BY=agent_name
     ```

   - Disable agent stats:

     ```yaml
     CODER_PROMETHEUS_COLLECT_AGENT_STATS=false
     ```
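
A minimal sketch of wiring these settings into the chart's environment list, assuming the Helm chart exposes `coder.env` and that your release is named `coder` in the `coder` namespace with the `coder-v2` repo alias (adjust to your deployment):

```shell
# Write the Prometheus-related settings into a values file and apply them to the
# existing release. Release name, namespace, and repo alias are assumptions.
cat > prometheus-values.yaml <<'EOF'
coder:
  env:
    - name: CODER_PROMETHEUS_ENABLE
      value: "true"
    - name: CODER_PROMETHEUS_COLLECT_DB_METRICS
      value: "true"
    - name: CODER_PROMETHEUS_AGGREGATE_AGENT_STATS_BY
      value: "agent_name"
EOF

helm upgrade coder coder-v2/coder --namespace coder -f prometheus-values.yaml
```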

## Coder Server

### Locality

If increased availability of the Coder API is a concern, deploy at least three
instances of Coder Server. Spread the instances across nodes with anti-affinity rules in
Kubernetes or in different availability zones of the same geographic region.

Do not deploy in different geographic regions.

Coder Servers need to be able to communicate with one another directly with low
latency, under 10ms. Note that this is for the availability of the Coder API.
Workspaces are not fault tolerant unless they are explicitly built that way at
the template level.

Deploy Coder Server instances as geographically close to PostgreSQL as possible.
Low-latency communication (under 10ms) with Postgres is essential for Coder
Server's performance.
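
As a sketch of what this could look like in Helm values, assuming the chart exposes `coder.replicaCount` and `coder.affinity`, and that Coder pods carry the `app.kubernetes.io/name: coder` label (verify against your chart's defaults):

```shell
# Three Coder Server replicas, each required to land on a different node.
cat > availability-values.yaml <<'EOF'
coder:
  replicaCount: 3
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: coder
EOF
```

Apply the file with `helm upgrade` as usual.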

### Scaling

Coder Server can be scaled both vertically for bigger instances and horizontally
for more instances.

Aim to keep the number of Coder Server instances relatively small, preferably
under ten instances, and opt for vertical scale over horizontal scale after
meeting availability requirements.

Coder's
[validated architectures](../../admin/infrastructure/validated-architectures/index.md)
give specific sizing recommendations for various user scales. These are a useful
starting point, but very few deployments will remain stable at a predetermined
user level over the long term. We recommend monitoring and adjusting resources as needed.

We don't recommend that you autoscale the Coder Servers. Instead, scale the
deployment for peak weekly usage.

Although Coder Server persists no internal state, it operates as a proxy for end
users to their workspaces in two capacities:

1. As an HTTP proxy when they access workspace applications in their browser via
   the Coder Dashboard.

1. As a DERP proxy when establishing tunneled connections with CLI tools like
   `coder ssh`, `coder port-forward`, and others, and with desktop IDEs.

Stopping a Coder Server instance will (momentarily) disconnect any users
currently connecting through that instance. Adding a new instance is not
disruptive, but you should remove instances and perform upgrades during a
maintenance window to minimize disruption.

## Provisioner daemons

### Locality

We recommend that you run one or more
[provisioner daemon deployments external to Coder Server](../../admin/provisioners.md)
and disable provisioner daemons within your Coder Server.
This allows you to scale them independently of the Coder Server:

```yaml
CODER_PROVISIONER_DAEMONS=0
```

We recommend deploying provisioner daemons within the same cluster that hosts the
workspaces they will provision.

- This gives them a low-latency connection to the APIs they will use to
  provision workspaces and can speed builds.

- It allows provisioner daemons to use in-cluster mechanisms (for example
  Kubernetes service account tokens, AWS IAM Roles, and others) to authenticate with
  the infrastructure APIs.

- If you deploy workspaces in multiple clusters, run multiple provisioner daemon
  deployments and use template tags to select the correct set of provisioner
  daemons.

- Provisioner daemons need to be able to connect to Coder Server, but this does not need
  to be a low-latency connection.

Provisioner daemons make no direct connections to the PostgreSQL database, so
there's no need for locality to the Postgres database.
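
A hedged sketch of standing up such an external deployment in a workspace cluster, assuming the `coder-v2/coder-provisioner` Helm chart with pre-shared-key authentication; the chart name, values keys, and secret layout are assumptions to check against the chart you use:

```shell
# Create the pre-shared key the provisioner daemons use to authenticate with Coder Server.
kubectl create secret generic coder-provisioner-psk \
  --namespace coder \
  --from-literal=psk="$(openssl rand -hex 32)"

# Install the external provisioner chart into the workspace cluster, pointing it
# at the Coder Server URL. Chart and values names are assumptions.
helm install coder-provisioner coder-v2/coder-provisioner \
  --namespace coder \
  --set 'coder.env[0].name=CODER_URL' \
  --set 'coder.env[0].value=https://coder.example.com' \
  --set 'provisionerDaemon.pskSecretName=coder-provisioner-psk'
```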

### Scaling

Each provisioner daemon instance can handle a single workspace build job at a
time. Therefore, the maximum number of simultaneous builds your Coder deployment
can handle is equal to the number of provisioner daemon instances within a tagged
deployment.

If users experience unacceptably long queues for workspace builds to start,
consider increasing the number of provisioner daemon instances in the affected
cluster.

You might need to automatically scale the number of provisioner daemon instances
throughout the day to meet demand.

If you stop instances with `SIGHUP`, they will complete their current build job
and exit. `SIGINT` will cancel the current job, which will result in a failed build.
Ensure your autoscaler waits long enough for your build jobs to complete before
it kills the provisioner daemon process.

If you deploy in Kubernetes, we recommend a single provisioner daemon per pod.
On a virtual machine (VM), you can deploy multiple provisioner daemons, ensuring
each has a unique `CODER_CACHE_DIRECTORY` value.
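
A minimal sketch of the VM case, using the environment variable above; `coder provisioner start` is assumed to be the external provisioner command in your Coder version, and authentication flags are omitted:

```shell
# Run two provisioner daemons on one VM, each with its own cache directory.
CODER_CACHE_DIRECTORY=/var/cache/coder-provisioner-0 coder provisioner start &
CODER_CACHE_DIRECTORY=/var/cache/coder-provisioner-1 coder provisioner start &

# Graceful shutdown: SIGHUP lets each daemon finish its current build job before exiting.
kill -HUP %1 %2
```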

Coder's
[validated architectures](../../admin/infrastructure/validated-architectures/index.md)
give specific sizing recommendations for various user scales. Since the
complexity of builds varies significantly depending on the workspace template,
consider this a starting point. Monitor queue times and build times and adjust
the number and size of your provisioner daemon instances.

## PostgreSQL

PostgreSQL is the primary persistence layer for all of Coder's deployment data.
We also use `LISTEN` and `NOTIFY` to coordinate between different instances of
Coder Server.

### Locality

Coder Server instances must have low-latency connections (under 10ms) to
PostgreSQL. If you use multiple PostgreSQL replicas in a clustered config, these
must also be low-latency with respect to one another.
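
As a rough check that a Coder Server node meets this target, you can time a trivial query from that node; `PG_CONNECTION_URL` below is a placeholder for your own connection string, and the timing includes connection setup, so treat it as an upper bound:

```shell
# Rough round-trip check from a Coder Server node to PostgreSQL.
time psql "$PG_CONNECTION_URL" -c 'SELECT 1;'
```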

### Scaling

Prefer scaling PostgreSQL vertically rather than horizontally for best
performance. Coder's
[validated architectures](../../admin/infrastructure/validated-architectures/index.md)
give specific sizing recommendations for various user scales.

## Workspace proxies

Workspace proxies proxy HTTP traffic from end users to workspaces for Coder apps
defined in the templates, and HTTP ports opened by the workspace. By default
they also include a DERP Proxy.

### Locality

We recommend that each geographic cluster of workspaces have an associated deployment
of workspace proxies. This ensures that users always have a near-optimal proxy
path.

### Scaling

Workspace proxy load is determined by the amount of traffic the proxies handle.

Monitor CPU, memory, and network I/O utilization to decide when to adjust
the number of proxy instances.

Scale for peak demand and scale down or upgrade during a maintenance window.

We do not recommend autoscaling the workspace proxies because many applications
use long-lived connections such as websockets, which would be disrupted by
stopping the proxy.

## Workspaces

Workspaces represent the vast majority of resources in most Coder deployments.
Because they are defined by templates, there is no one-size-fits-all advice for
scaling workspaces.

### Hard and soft cluster limits

All Infrastructure as a Service (IaaS) clusters have limits to what can be
simultaneously provisioned. These could be hard limits, based on the physical
size of the cluster, especially in the case of a private cloud, or soft limits,
based on configured limits in your public cloud account.

It is important to be aware of these limits and monitor Coder workspace resource
utilization against the limits, so that a new influx of users doesn't encounter
failed builds. Monitoring these is outside the scope of Coder, but we recommend
that you set up dashboards and alerts for each kind of limited resource.

As you approach soft limits, you can request limit increases to keep growing.

As you approach hard limits, consider deploying to additional cluster(s).

### Workspaces per node

Many development workloads are "spiky" in their CPU and memory requirements, for
example, they peak during build/test and then lower while editing code.
This leads to an opportunity to efficiently use compute resources by packing multiple
workspaces onto a single node. This can lead to better experience (more CPU and
memory available during brief bursts) and lower cost.

There are a number of things you should consider before you decide how many
workspaces you should allow per node:

- "Noisy neighbor" issues: Users share the node's CPU and memory resources and might
  be susceptible to a user or process consuming shared resources.

- If the shared nodes are a provisioned resource, for example, Kubernetes nodes
  running on VMs in a public cloud, then it can sometimes be a challenge to
  effectively autoscale down.

  - For example, if half the workspaces are stopped overnight, and there are ten
    workspaces per node, it's unlikely that all ten workspaces on the node are
    among the stopped ones.

  - You can mitigate this by lowering the number of workspaces per node, or
    using autostop policies to stop more workspaces during off-peak hours.

- If you do overprovision workspaces onto nodes, keep them in a separate node
  pool and schedule Coder control plane (Coder Server, PostgreSQL, workspace
  proxies) components on a different node pool to avoid resource spikes
  affecting them.

Coder customers have had success with both:

- One workspace per AWS VM
- Lots of workspaces on Kubernetes nodes for efficiency

### Cost control

- Use quotas to discourage users from creating many workspaces they don't need
  simultaneously.

- Label workspace cloud resources by user, team, organization, or your own
  labelling conventions to track usage at different granularities.

- Use autostop requirements to bring off-peak utilization down.

## Networking

Set up your network so that most users can get direct, peer-to-peer connections
to their workspaces. This drastically reduces the load on Coder Server and
workspace proxy instances.
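
A quick way to confirm this from a user's machine is to check whether a workspace connection is direct (peer-to-peer) or relayed through DERP; `coder ping` reports this, with the workspace name below as a placeholder:

```shell
# Reports whether the connection to the workspace is p2p or relayed via DERP.
coder ping my-workspace
```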

## Next steps

- [Scale Tests and Utilities](../../admin/infrastructure/scale-utility.md)
- [Scale Testing](../../admin/infrastructure/scale-testing.md)