docs: add new scaling doc to best practices section (#15904)

[preview](https://coder.com/docs/@bp-scaling-coder/tutorials/best-practices/scale-coder)

---------

Co-authored-by: Spike Curtis <spike@coder.com>
Edward Angert
2025-01-21 15:02:02 -05:00
committed by GitHub
parent 0fa6b3df13
commit 02d0650ae8
4 changed files with 393 additions and 39 deletions


@@ -5,7 +5,7 @@ without compromising service. This process encompasses infrastructure setup,
traffic projections, and aggressive testing to identify and mitigate potential
bottlenecks.
-A dedicated Kubernetes cluster for Coder is recommended to configure, host and
+A dedicated Kubernetes cluster for Coder is recommended to configure, host, and
manage Coder workloads. Kubernetes provides container orchestration
capabilities, allowing Coder to efficiently deploy, scale, and manage workspaces
across a distributed infrastructure. This ensures high availability, fault
@@ -13,27 +13,29 @@ tolerance, and scalability for Coder deployments. Coder is deployed on this
cluster using the
[Helm chart](../../install/kubernetes.md#4-install-coder-with-helm).
+For more information about scaling, see our [Coder scaling best practices](../../tutorials/best-practices/scale-coder.md).
## Methodology
Our scale tests include the following stages:
1. Prepare environment: create expected users and provision workspaces.
-2. SSH connections: establish user connections with agents, verifying their
+1. SSH connections: establish user connections with agents, verifying their
ability to echo back received content.
-3. Web Terminal: verify the PTY connection used for communication with Web
+1. Web Terminal: verify the PTY connection used for communication with Web
Terminal.
-4. Workspace application traffic: assess the handling of user connections with
+1. Workspace application traffic: assess the handling of user connections with
specific workspace apps, confirming their capability to echo back received
content effectively.
-5. Dashboard evaluation: verify the responsiveness and stability of Coder
+1. Dashboard evaluation: verify the responsiveness and stability of Coder
dashboards under varying load conditions. This is achieved by simulating user
interactions using instances of headless Chromium browsers.
-6. Cleanup: delete workspaces and users created in step 1.
+1. Cleanup: delete workspaces and users created in step 1.
## Infrastructure and setup requirements
@@ -54,13 +56,16 @@ channel for IDEs with VS Code and JetBrains plugins.
The basic setup of scale tests environment involves:
1. Scale tests runner (32 vCPU, 128 GB RAM)
-2. Coder: 2 replicas (4 vCPU, 16 GB RAM)
-3. Database: 1 instance (2 vCPU, 32 GB RAM)
-4. Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)
+1. Coder: 2 replicas (4 vCPU, 16 GB RAM)
+1. Database: 1 instance (2 vCPU, 32 GB RAM)
+1. Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)
-The test is deemed successful if users did not experience interruptions in their
-workflows, `coderd` did not crash or require restarts, and no other internal
-errors were observed.
+The test is deemed successful if:
+- Users did not experience interruptions in their
+workflows,
+- `coderd` did not crash or require restarts, and
+- No other internal errors were observed.
## Traffic Projections
@@ -90,11 +95,11 @@ Database:
## Available reference architectures
-[Up to 1,000 users](./validated-architectures/1k-users.md)
+- [Up to 1,000 users](./validated-architectures/1k-users.md)
-[Up to 2,000 users](./validated-architectures/2k-users.md)
+- [Up to 2,000 users](./validated-architectures/2k-users.md)
-[Up to 3,000 users](./validated-architectures/3k-users.md)
+- [Up to 3,000 users](./validated-architectures/3k-users.md)
## Hardware recommendation
@@ -107,7 +112,7 @@ guidance on optimal configurations. A reasonable approach involves using scaling
formulas based on factors like CPU, memory, and the number of users.
While the minimum requirements specify 1 CPU core and 2 GB of memory per
-`coderd` replica, it is recommended to allocate additional resources depending
+`coderd` replica, we recommend that you allocate additional resources depending
on the workload size to ensure deployment stability.
#### CPU and memory usage


@@ -1,20 +1,23 @@
# Scale Tests and Utilities
-We scale-test Coder with [a built-in utility](#scale-testing-utility) that can
+We scale-test Coder with a built-in utility that can
be used in your environment for insights into how Coder scales with your
-infrastructure. For scale-testing Kubernetes clusters we recommend to install
+infrastructure. For scale-testing Kubernetes clusters we recommend that you install
and use the dedicated Coder template,
[scaletest-runner](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner).
Learn more about [Coders architecture](./architecture.md) and our
[scale-testing methodology](./scale-testing.md).
+For more information about scaling, see our [Coder scaling best practices](../../tutorials/best-practices/scale-coder.md).
## Recent scale tests
-> Note: the below information is for reference purposes only, and are not
-> intended to be used as guidelines for infrastructure sizing. Review the
-> [Reference Architectures](./validated-architectures/index.md#node-sizing) for
-> hardware sizing recommendations.
+The information in this doc is for reference purposes only, and is not intended
+to be used as guidelines for infrastructure sizing.
+Review the [Reference Architectures](./validated-architectures/index.md#node-sizing) for
+hardware sizing recommendations.
| Environment | Coder CPU | Coder RAM | Coder Replicas | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested |
|------------------|-----------|-----------|----------------|-------------------|-------|-------------------|---------------------------------------|---------------|--------------|
@@ -25,8 +28,7 @@ Learn more about [Coders architecture](./architecture.md) and our
| Kubernetes (GKE) | 4 cores | 16 GB | 2 | db-custom-8-30720 | 2000 | 50 | 2000 simulated | `v2.8.4` | Feb 28, 2024 |
| Kubernetes (GKE) | 2 cores | 4 GB | 2 | db-custom-2-7680 | 1000 | 50 | 1000 simulated | `v2.10.2` | Apr 26, 2024 |
-> Note: a simulated connection reads and writes random data at 40KB/s per
-> connection.
+> Note: A simulated connection reads and writes random data at 40KB/s per connection.
## Scale testing utility
@@ -34,17 +36,24 @@ Since Coder's performance is highly dependent on the templates and workflows you
support, you may wish to use our internal scale testing utility against your own
environments.
-> Note: This utility is experimental. It is not subject to any compatibility
-> guarantees, and may cause interruptions for your users. To avoid potential
-> outages and orphaned resources, we recommend running scale tests on a
-> secondary "staging" environment or a dedicated
-> [Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/terraform).
-> Run it against a production environment at your own risk.
+<blockquote class="admonition important">
+This utility is experimental.
+It is not subject to any compatibility guarantees and may cause interruptions
+for your users.
+To avoid potential outages and orphaned resources, we recommend that you run
+scale tests on a secondary "staging" environment or a dedicated
+[Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/terraform).
+Run it against a production environment at your own risk.
+</blockquote>
### Create workspaces
The following command will provision a number of Coder workspaces using the
-specified template and extra parameters.
+specified template and extra parameters:
```shell
coder exp scaletest create-workspaces \
@@ -56,8 +65,6 @@ coder exp scaletest create-workspaces \
--job-timeout 5h \
--no-cleanup \
--output json:"${SCALETEST_RESULTS_DIR}/create-workspaces.json"
-# Run `coder exp scaletest create-workspaces --help` for all usage
```
The command does the following:
@@ -70,6 +77,12 @@ The command does the following:
1. If you don't want the creation process to be interrupted by any errors, use
the `--retry 5` flag.
+For more built-in `scaletest` options, use the `--help` flag:
+```shell
+coder exp scaletest create-workspaces --help
+```
### Traffic Generation
Given an existing set of workspaces created previously with `create-workspaces`,
@@ -105,7 +118,11 @@ The `workspace-traffic` supports also other modes - SSH traffic, workspace app:
1. For SSH traffic: Use `--ssh` flag to generate SSH traffic instead of Web
Terminal.
1. For workspace app traffic: Use `--app [wsdi|wsec|wsra]` flag to select app
-behavior. (modes: _WebSocket discard_, _WebSocket echo_, _WebSocket read_).
+behavior.
+- `wsdi`: WebSocket discard
+- `wsec`: WebSocket echo
+- `wsra`: WebSocket read
### Cleanup
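The scale testing utility's phases covered in this commit (create workspaces, generate traffic, clean up) are meant to run in sequence against the same set of workspaces. The script below is a minimal sketch of one way to chain them, not an official Coder tool. Several pieces are hypothetical: the `run` helper, the `scaletest_pipeline` function, the `DRY_RUN`, `TEMPLATE`, `COUNT`, and `RESULTS_DIR` variables, and all default values. Some CLI pieces do not appear in the lines shown here: the `--count` and `--template` flags, `--output` on `workspace-traffic`, and the `coder exp scaletest cleanup` subcommand. Treat them as assumptions and confirm them with `--help` for your Coder version.

```shell
#!/bin/sh
# Hypothetical wrapper that chains the scaletest phases in order.
# Commands are only printed unless DRY_RUN=0 is set explicitly.

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

scaletest_pipeline() {
  # Hypothetical defaults; override them through the environment.
  template="${TEMPLATE:-my-scaletest-template}"
  count="${COUNT:-10}"
  results_dir="${RESULTS_DIR:-./scaletest-results}"

  run mkdir -p "$results_dir" || return 1

  # 1. Provision workspaces and keep them for the traffic phase.
  #    --count and --template are assumed flags; check --help.
  run coder exp scaletest create-workspaces \
    --count "$count" \
    --template "$template" \
    --retry 5 \
    --job-timeout 5h \
    --no-cleanup \
    --output "json:${results_dir}/create-workspaces.json" || return 1

  # 2. Generate SSH traffic instead of Web Terminal traffic.
  run coder exp scaletest workspace-traffic \
    --ssh \
    --output "json:${results_dir}/workspace-traffic.json" || return 1

  # 3. Clean up scale-test resources (assumed subcommand name).
  run coder exp scaletest cleanup
}
```

By default, the wrapper only prints each command. It executes them only when `DRY_RUN=0` is set, so a careless run does not touch a production deployment, in line with the warning that the utility is experimental. For example, after sourcing the script, `COUNT=50 TEMPLATE=my-template DRY_RUN=0 scaletest_pipeline` would execute the full sequence. Omit `DRY_RUN=0` to review the commands first.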