docs: add new scaling doc to best practices section (#15904)

[preview](https://coder.com/docs/@bp-scaling-coder/tutorials/best-practices/scale-coder)

---------

Co-authored-by: Spike Curtis <spike@coder.com>
Edward Angert
2025-01-21 15:02:02 -05:00
committed by GitHub
parent 0fa6b3df13
commit 02d0650ae8
4 changed files with 393 additions and 39 deletions

docs/admin/infrastructure/scale-testing.md

@ -5,7 +5,7 @@ without compromising service. This process encompasses infrastructure setup,
traffic projections, and aggressive testing to identify and mitigate potential
bottlenecks.
A dedicated Kubernetes cluster for Coder is recommended to configure, host, and
manage Coder workloads. Kubernetes provides container orchestration
capabilities, allowing Coder to efficiently deploy, scale, and manage workspaces
across a distributed infrastructure. This ensures high availability, fault
@ -13,27 +13,29 @@ tolerance, and scalability for Coder deployments. Coder is deployed on this
cluster using the
[Helm chart](../../install/kubernetes.md#4-install-coder-with-helm).
For more information about scaling, see our [Coder scaling best practices](../../tutorials/best-practices/scale-coder.md).
# Methodology
Our scale tests include the following stages:
1. Prepare environment: create expected users and provision workspaces.
1. SSH connections: establish user connections with agents, verifying their
ability to echo back received content.
1. Web Terminal: verify the PTY connection used for communication with Web
Terminal.
1. Workspace application traffic: assess the handling of user connections with
specific workspace apps, confirming their capability to echo back received
content effectively.
1. Dashboard evaluation: verify the responsiveness and stability of Coder
dashboards under varying load conditions. This is achieved by simulating user
interactions using instances of headless Chromium browsers.
1. Cleanup: delete workspaces and users created in step 1.
## Infrastructure and setup requirements
@ -54,13 +56,16 @@ channel for IDEs with VS Code and JetBrains plugins.
The basic setup of the scale-test environment involves:
1. Scale tests runner (32 vCPU, 128 GB RAM)
1. Coder: 2 replicas (4 vCPU, 16 GB RAM)
1. Database: 1 instance (2 vCPU, 32 GB RAM)
1. Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)
The test is deemed successful if:
- Users did not experience interruptions in their workflows,
- `coderd` did not crash or require restarts, and
- No other internal errors were observed.
## Traffic Projections
@ -90,11 +95,11 @@ Database:
## Available reference architectures
- [Up to 1,000 users](./validated-architectures/1k-users.md)
- [Up to 2,000 users](./validated-architectures/2k-users.md)
- [Up to 3,000 users](./validated-architectures/3k-users.md)
## Hardware recommendation
@ -107,7 +112,7 @@ guidance on optimal configurations. A reasonable approach involves using scaling
formulas based on factors like CPU, memory, and the number of users.
While the minimum requirements specify 1 CPU core and 2 GB of memory per
`coderd` replica, we recommend that you allocate additional resources depending
on the workload size to ensure deployment stability.
#### CPU and memory usage

docs/admin/infrastructure/scale-utility.md

@ -1,20 +1,23 @@
# Scale Tests and Utilities
We scale-test Coder with a built-in utility that can
be used in your environment for insights into how Coder scales with your
infrastructure. For scale-testing Kubernetes clusters we recommend that you install
and use the dedicated Coder template,
[scaletest-runner](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner).
Learn more about [Coder's architecture](./architecture.md) and our
[scale-testing methodology](./scale-testing.md).
For more information about scaling, see our [Coder scaling best practices](../../tutorials/best-practices/scale-coder.md).
## Recent scale tests
The information in this doc is for reference purposes only, and is not intended
to be used as guidelines for infrastructure sizing.
Review the [Reference Architectures](./validated-architectures/index.md#node-sizing) for
hardware sizing recommendations.
| Environment | Coder CPU | Coder RAM | Coder Replicas | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested |
|------------------|-----------|-----------|----------------|-------------------|-------|-------------------|---------------------------------------|---------------|--------------|
@ -25,8 +28,7 @@ Learn more about [Coders architecture](./architecture.md) and our
| Kubernetes (GKE) | 4 cores | 16 GB | 2 | db-custom-8-30720 | 2000 | 50 | 2000 simulated | `v2.8.4` | Feb 28, 2024 |
| Kubernetes (GKE) | 2 cores | 4 GB | 2 | db-custom-2-7680 | 1000 | 50 | 1000 simulated | `v2.10.2` | Apr 26, 2024 |
> Note: A simulated connection reads and writes random data at 40KB/s per connection.
## Scale testing utility
@ -34,17 +36,24 @@ Since Coder's performance is highly dependent on the templates and workflows you
support, you may wish to use our internal scale testing utility against your own
environments.
<blockquote class="admonition important">
This utility is experimental.
It is not subject to any compatibility guarantees and may cause interruptions
for your users.
To avoid potential outages and orphaned resources, we recommend that you run
scale tests on a secondary "staging" environment or a dedicated
[Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/terraform).
Run it against a production environment at your own risk.
</blockquote>
### Create workspaces
The following command will provision a number of Coder workspaces using the
specified template and extra parameters:
```shell
coder exp scaletest create-workspaces \
@ -56,8 +65,6 @@ coder exp scaletest create-workspaces \
--job-timeout 5h \
--no-cleanup \
--output json:"${SCALETEST_RESULTS_DIR}/create-workspaces.json"
```
The command does the following:
@ -70,6 +77,12 @@ The command does the following:
1. If you don't want the creation process to be interrupted by any errors, use
the `--retry 5` flag.
For more built-in `scaletest` options, use the `--help` flag:
```shell
coder exp scaletest create-workspaces --help
```
### Traffic Generation
Given an existing set of workspaces created previously with `create-workspaces`,
@ -105,7 +118,11 @@ The `workspace-traffic` supports also other modes - SSH traffic, workspace app:
1. For SSH traffic: Use `--ssh` flag to generate SSH traffic instead of Web
Terminal.
1. For workspace app traffic: Use `--app [wsdi|wsec|wsra]` flag to select app
behavior (see the example after this list):
- `wsdi`: WebSocket discard
- `wsec`: WebSocket echo
- `wsra`: WebSocket read
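For example, here is a hedged invocation that generates SSH traffic against the workspaces created earlier. The flag values are illustrative; run the command with `--help` to confirm the options available in your Coder version:

```shell
coder exp scaletest workspace-traffic \
  --ssh \
  --bytes-per-tick 1024 \
  --tick-interval 100ms \
  --timeout 5m \
  --output json:"${SCALETEST_RESULTS_DIR}/traffic.json"
```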
### Cleanup

docs/manifest.json

@ -243,6 +243,11 @@
"title": "Scaling Utilities", "title": "Scaling Utilities",
"description": "Tools to help you scale your deployment", "description": "Tools to help you scale your deployment",
"path": "./admin/infrastructure/scale-utility.md" "path": "./admin/infrastructure/scale-utility.md"
},
{
"title": "Scaling best practices",
"description": "How to prepare a Coder deployment for scale",
"path": "./tutorials/best-practices/scale-coder.md"
}
]
},
@ -761,16 +766,21 @@
"description": "Guides to help you make the most of your Coder experience", "description": "Guides to help you make the most of your Coder experience",
"path": "./tutorials/best-practices/index.md", "path": "./tutorials/best-practices/index.md",
"children": [ "children": [
{
"title": "Organizations - best practices",
"description": "How to make the best use of Coder Organizations",
"path": "./tutorials/best-practices/organizations.md"
},
{
"title": "Scale Coder",
"description": "How to prepare a Coder deployment for scale",
"path": "./tutorials/best-practices/scale-coder.md"
},
{
"title": "Security - best practices",
"description": "Make your Coder deployment more secure",
"path": "./tutorials/best-practices/security-best-practices.md"
},
{
"title": "Speed up your workspaces",
"description": "Speed up your Coder templates and workspaces",

docs/tutorials/best-practices/scale-coder.md

@ -0,0 +1,322 @@
# Scale Coder
This best practice guide helps you prepare a Coder deployment that you can
scale as usage grows, and keep it operating smoothly with a high number of
active users and workspaces.
## Observability
Observability is one of the most important aspects of a scalable Coder deployment.
When you have visibility into performance and usage metrics, you can make informed
decisions about what changes you should make.
[Monitor your Coder deployment](../../admin/monitoring/index.md) with log output
and metrics to identify potential bottlenecks before they negatively affect the
end-user experience and measure the effects of modifications you make to your
deployment.
- Log output
- Capture log output from Coder Server instances and external provisioner daemons
and store them in a searchable log store like Loki, CloudWatch Logs, or other tools.
- Retain logs for a minimum of thirty days, ideally ninety days.
This allows you to investigate when anomalous behaviors began.
- Metrics
- Capture infrastructure metrics like CPU, memory, open files, and network I/O for all
Coder Server, external provisioner daemon, workspace proxy, and PostgreSQL instances.
- Capture Coder Server and external provisioner daemon metrics
[via Prometheus](#how-to-capture-coder-server-metrics-with-prometheus).
Retain metric time series for at least six months. This allows you to see
performance trends relative to user growth.
For a more comprehensive overview, integrate metrics with an observability
dashboard like [Grafana](../../admin/monitoring/index.md).
### Observability key metrics
Configure alerting based on these metrics to ensure you surface problems before
they affect the end-user experience. A sample alerting rule follows the list.
- CPU and Memory Utilization
- Monitor the utilization as a fraction of the available resources on the instance.
Utilization will vary with use throughout the course of a day, week, and longer timelines.
Monitor trends and pay special attention to the daily and weekly peak utilization.
Use long-term trends to plan infrastructure upgrades.
- Tail latency of Coder Server API requests
- High tail latency can indicate Coder Server or the PostgreSQL database is underprovisioned
for the load.
- Use the `coderd_api_request_latencies_seconds` metric.
- Tail latency of database queries
- High tail latency can indicate the PostgreSQL database is low in resources.
- Use the `coderd_db_query_latencies_seconds` metric.
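As a starting point, a Prometheus alerting rule for API tail latency might look like the following sketch. The threshold and duration are assumptions to tune for your deployment, and it assumes `coderd_api_request_latencies_seconds` is exported as a Prometheus histogram:

```yaml
groups:
  - name: coder-latency
    rules:
      - alert: CoderAPITailLatencyHigh
        # p99 latency of Coder Server API requests over the last 5 minutes.
        expr: |
          histogram_quantile(0.99,
            sum(rate(coderd_api_request_latencies_seconds_bucket[5m])) by (le)
          ) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Coder API p99 request latency has exceeded 1s for 10 minutes"
```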
### How to capture Coder Server metrics with Prometheus
Edit your Helm `values.yaml` to capture metrics from Coder Server and external provisioner daemons with
[Prometheus](../../admin/integrations/prometheus.md). A sketch of the resulting `values.yaml` follows these steps:
1. Enable Prometheus metrics:
```yaml
CODER_PROMETHEUS_ENABLE=true
```
1. Enable database metrics:
```yaml
CODER_PROMETHEUS_COLLECT_DB_METRICS=true
```
1. For a high-scale deployment, configure agent stats to avoid high-cardinality metrics, or disable them:
- Configure agent stats:
```yaml
CODER_PROMETHEUS_AGGREGATE_AGENT_STATS_BY=agent_name
```
- Disable agent stats:
```yaml
CODER_PROMETHEUS_COLLECT_AGENT_STATS=false
```
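Taken together, a minimal `values.yaml` fragment might look like the sketch below. It assumes the standard Coder Helm chart, which passes environment variables to the server through the `coder.env` list:

```yaml
coder:
  env:
    - name: CODER_PROMETHEUS_ENABLE
      value: "true"
    - name: CODER_PROMETHEUS_COLLECT_DB_METRICS
      value: "true"
    # For a high-scale deployment, aggregate agent stats by agent name,
    # or disable them with CODER_PROMETHEUS_COLLECT_AGENT_STATS=false.
    - name: CODER_PROMETHEUS_AGGREGATE_AGENT_STATS_BY
      value: "agent_name"
```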
## Coder Server
### Locality
If increased availability of the Coder API is a concern, deploy at least three
instances of Coder Server. Spread the instances across nodes with anti-affinity rules in
Kubernetes or in different availability zones of the same geographic region.
Do not deploy in different geographic regions.
Coder Servers need to be able to communicate with one another directly with low
latency, under 10ms. Note that this is for the availability of the Coder API.
Workspaces are not fault tolerant unless they are explicitly built that way at
the template level.
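In Kubernetes, a hedged sketch of such an anti-affinity rule in the Helm chart's `values.yaml` follows. The pod label in the selector is an assumption; match it to the labels your chart actually applies:

```yaml
coder:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: coder
            # Spread replicas across availability zones within one region.
            topologyKey: topology.kubernetes.io/zone
```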
Deploy Coder Server instances as geographically close to PostgreSQL as possible.
Low-latency communication (under 10ms) with Postgres is essential for Coder
Server's performance.
### Scaling
Coder Server can be scaled both vertically for bigger instances and horizontally
for more instances.
Aim to keep the number of Coder Server instances relatively small, preferably
under ten instances, and opt for vertical scale over horizontal scale after
meeting availability requirements.
Coder's
[validated architectures](../../admin/infrastructure/validated-architectures/index.md)
give specific sizing recommendations for various user scales. These are a useful
starting point, but very few deployments will remain stable at a predetermined
user level over the long term. We recommend monitoring and adjusting resources as needed.
We don't recommend that you autoscale the Coder Servers. Instead, scale the
deployment for peak weekly usage.
Although Coder Server persists no internal state, it operates as a proxy for end
users to their workspaces in two capacities:
1. As an HTTP proxy when they access workspace applications in their browser via
the Coder Dashboard.
1. As a DERP proxy when establishing tunneled connections with CLI tools like
`coder ssh`, `coder port-forward`, and others, and with desktop IDEs.
Stopping a Coder Server instance will (momentarily) disconnect any users
currently connecting through that instance. Adding a new instance is not
disruptive, but you should remove instances and perform upgrades during a
maintenance window to minimize disruption.
## Provisioner daemons
### Locality
We recommend that you run one or more
[provisioner daemon deployments external to Coder Server](../../admin/provisioners.md)
and disable provisioner daemons within your Coder Server.
This allows you to scale them independently of the Coder Server:
```yaml
CODER_PROVISIONER_DAEMONS=0
```
We recommend that you deploy provisioner daemons within the same cluster as the
workspaces they will provision:
- This gives them a low-latency connection to the APIs they will use to
provision workspaces and can speed builds.
- It allows provisioner daemons to use in-cluster mechanisms (for example
Kubernetes service account tokens, AWS IAM Roles, and others) to authenticate with
the infrastructure APIs.
- If you deploy workspaces in multiple clusters, run multiple provisioner daemon
deployments and use template tags to select the correct set of provisioner
daemons (see the sketch after this list).
- Provisioner daemons need to be able to connect to Coder Server, but this does not need
to be a low-latency connection.
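As an illustration, here is a hedged example of starting an external provisioner daemon restricted to a set of tags. The tag keys and values are hypothetical; verify the exact command and flags with `coder provisioner start --help` in your version:

```shell
coder provisioner start \
  --tag environment=on_prem \
  --tag cluster=us-east
```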
Provisioner daemons make no direct connections to the PostgreSQL database, so
there's no need for locality to the Postgres database.
### Scaling
Each provisioner daemon instance can handle a single workspace build job at a
time. Therefore, the maximum number of simultaneous builds your Coder deployment
can handle is equal to the number of provisioner daemon instances within a tagged
deployment.
If users experience unacceptably long queues for workspace builds to start,
consider increasing the number of provisioner daemon instances in the affected
cluster.
You might need to automatically scale the number of provisioner daemon instances
throughout the day to meet demand.
If you stop instances with `SIGHUP`, they will complete their current build job
and exit. `SIGINT` will cancel the current job, which will result in a failed build.
Ensure your autoscaler waits long enough for your build jobs to complete before
it kills the provisioner daemon process.
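In Kubernetes, one way to give build jobs time to finish is a generous termination grace period on the provisioner pod spec. Kubernetes sends `SIGTERM` at shutdown and waits this long before a hard kill, so treat the value below as an assumption to size against your longest expected build:

```yaml
spec:
  # Assumption: the longest workspace build completes within one hour.
  terminationGracePeriodSeconds: 3600
```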
If you deploy in Kubernetes, we recommend a single provisioner daemon per pod.
On a virtual machine (VM), you can deploy multiple provisioner daemons, ensuring
each has a unique `CODER_CACHE_DIRECTORY` value.
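A minimal sketch of this on a VM, with paths and backgrounding chosen for illustration:

```shell
# Each daemon gets its own cache directory to avoid conflicts.
CODER_CACHE_DIRECTORY=/var/cache/coder-provisioner-0 coder provisioner start &
CODER_CACHE_DIRECTORY=/var/cache/coder-provisioner-1 coder provisioner start &
```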
Coder's
[validated architectures](../../admin/infrastructure/validated-architectures/index.md)
give specific sizing recommendations for various user scales. Since the
complexity of builds varies significantly depending on the workspace template,
consider this a starting point. Monitor queue times and build times and adjust
the number and size of your provisioner daemon instances.
## PostgreSQL
PostgreSQL is the primary persistence layer for all of Coder's deployment data.
We also use `LISTEN` and `NOTIFY` to coordinate between different instances of
Coder Server.
### Locality
Coder Server instances must have low-latency connections (under 10ms) to
PostgreSQL. If you use multiple PostgreSQL replicas in a clustered config, these
must also be low-latency with respect to one another.
### Scaling
Prefer scaling PostgreSQL vertically rather than horizontally for best
performance. Coder's
[validated architectures](../../admin/infrastructure/validated-architectures/index.md)
give specific sizing recommendations for various user scales.
## Workspace proxies
Workspace proxies proxy HTTP traffic from end users to workspaces for Coder apps
defined in the templates, and HTTP ports opened by the workspace. By default,
they also include a DERP proxy.
### Locality
We recommend that each geographic cluster of workspaces have an associated
deployment of workspace proxies. This ensures that users always have a
near-optimal proxy path.
### Scaling
Workspace proxy load is determined by the amount of traffic the proxies handle.
Monitor CPU, memory, and network I/O utilization to decide when to adjust
the number or size of proxy instances.
Scale for peak demand and scale down or upgrade during a maintenance window.
We do not recommend autoscaling the workspace proxies because many applications
use long-lived connections such as websockets, which would be disrupted by
stopping the proxy.
## Workspaces
Workspaces represent the vast majority of resources in most Coder deployments.
Because they are defined by templates, there is no one-size-fits-all advice for
scaling workspaces.
### Hard and soft cluster limits
All Infrastructure as a Service (IaaS) clusters have limits to what can be
simultaneously provisioned. These could be hard limits, based on the physical
size of the cluster, especially in the case of a private cloud, or soft limits,
based on configured limits in your public cloud account.
It is important to be aware of these limits and monitor Coder workspace resource
utilization against them, so that a new influx of users doesn't encounter
failed builds. Monitoring these limits is outside the scope of Coder, but we recommend
that you set up dashboards and alerts for each kind of limited resource.
As you approach soft limits, you can request limit increases to keep growing.
As you approach hard limits, consider deploying to additional cluster(s).
### Workspaces per node
Many development workloads are "spiky" in their CPU and memory requirements: for
example, they peak during build/test and then drop while the user edits code.
This creates an opportunity to use compute resources efficiently by packing multiple
workspaces onto a single node, which can mean a better experience (more CPU and
memory available during brief bursts) and lower cost.
There are a number of things you should consider before you decide how many
workspaces you should allow per node:
- "Noisy neighbor" issues: Users share the node's CPU and memory resources and might
be susceptible to a user or process consuming shared resources.
- If the shared nodes are a provisioned resource, for example, Kubernetes nodes
running on VMs in a public cloud, then it can sometimes be a challenge to
effectively autoscale down.
- For example, if half the workspaces are stopped overnight, and there are ten
workspaces per node, it's unlikely that all ten workspaces on the node are
among the stopped ones.
- You can mitigate this by lowering the number of workspaces per node, or
using autostop policies to stop more workspaces during off-peak hours.
- If you do overprovision workspaces onto nodes, keep them in a separate node
pool and schedule Coder control plane (Coder Server, PostgreSQL, workspace
proxies) components on a different node pool to avoid resource spikes
affecting them.
Coder customers have had success with both:
- One workspace per AWS VM
- Lots of workspaces on Kubernetes nodes for efficiency
### Cost control
- Use quotas to discourage users from creating many workspaces they don't need
simultaneously.
- Label workspace cloud resources by user, team, organization, or your own
labeling conventions to track usage at different granularities.
- Use autostop requirements to bring off-peak utilization down.
## Networking
Set up your network so that most users can get direct, peer-to-peer connections
to their workspaces. This drastically reduces the load on Coder Server and
workspace proxy instances.
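You can check whether a given client gets a direct connection with `coder ping`, which reports latency and whether traffic flows peer-to-peer or through a DERP relay. The workspace name here is a placeholder:

```shell
coder ping my-workspace
```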
## Next steps
- [Scale Tests and Utilities](../../admin/infrastructure/scale-utility.md)
- [Scale Testing](../../admin/infrastructure/scale-testing.md)