Mirror of https://github.com/coder/coder.git, synced 2025-07-15 22:20:27 +00:00
docs: add new scaling doc to best practices section (#15904)
[preview](https://coder.com/docs/@bp-scaling-coder/tutorials/best-practices/scale-coder)

Co-authored-by: Spike Curtis <spike@coder.com>
@@ -5,7 +5,7 @@ without compromising service. This process encompasses infrastructure setup,
 traffic projections, and aggressive testing to identify and mitigate potential
 bottlenecks.

-A dedicated Kubernetes cluster for Coder is recommended to configure, host and
+A dedicated Kubernetes cluster for Coder is recommended to configure, host, and
 manage Coder workloads. Kubernetes provides container orchestration
 capabilities, allowing Coder to efficiently deploy, scale, and manage workspaces
 across a distributed infrastructure. This ensures high availability, fault
@@ -13,27 +13,29 @@ tolerance, and scalability for Coder deployments. Coder is deployed on this
 cluster using the
 [Helm chart](../../install/kubernetes.md#4-install-coder-with-helm).

+For more information about scaling, see our [Coder scaling best practices](../../tutorials/best-practices/scale-coder.md).
+
 ## Methodology

 Our scale tests include the following stages:

 1. Prepare environment: create expected users and provision workspaces.

-2. SSH connections: establish user connections with agents, verifying their
+1. SSH connections: establish user connections with agents, verifying their
 ability to echo back received content.

-3. Web Terminal: verify the PTY connection used for communication with Web
+1. Web Terminal: verify the PTY connection used for communication with Web
 Terminal.

-4. Workspace application traffic: assess the handling of user connections with
+1. Workspace application traffic: assess the handling of user connections with
 specific workspace apps, confirming their capability to echo back received
 content effectively.

-5. Dashboard evaluation: verify the responsiveness and stability of Coder
+1. Dashboard evaluation: verify the responsiveness and stability of Coder
 dashboards under varying load conditions. This is achieved by simulating user
 interactions using instances of headless Chromium browsers.

-6. Cleanup: delete workspaces and users created in step 1.
+1. Cleanup: delete workspaces and users created in step 1.

 ## Infrastructure and setup requirements

@@ -54,13 +56,16 @@ channel for IDEs with VS Code and JetBrains plugins.
 The basic setup of scale tests environment involves:

 1. Scale tests runner (32 vCPU, 128 GB RAM)
-2. Coder: 2 replicas (4 vCPU, 16 GB RAM)
-3. Database: 1 instance (2 vCPU, 32 GB RAM)
-4. Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)
+1. Coder: 2 replicas (4 vCPU, 16 GB RAM)
+1. Database: 1 instance (2 vCPU, 32 GB RAM)
+1. Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)

-The test is deemed successful if users did not experience interruptions in their
-workflows, `coderd` did not crash or require restarts, and no other internal
-errors were observed.
+The test is deemed successful if:
+
+- Users did not experience interruptions in their
+workflows,
+- `coderd` did not crash or require restarts, and
+- No other internal errors were observed.

 ## Traffic Projections

@@ -90,11 +95,11 @@ Database:

 ## Available reference architectures

-[Up to 1,000 users](./validated-architectures/1k-users.md)
+- [Up to 1,000 users](./validated-architectures/1k-users.md)

-[Up to 2,000 users](./validated-architectures/2k-users.md)
+- [Up to 2,000 users](./validated-architectures/2k-users.md)

-[Up to 3,000 users](./validated-architectures/3k-users.md)
+- [Up to 3,000 users](./validated-architectures/3k-users.md)

 ## Hardware recommendation

@@ -107,7 +112,7 @@ guidance on optimal configurations. A reasonable approach involves using scaling
 formulas based on factors like CPU, memory, and the number of users.

 While the minimum requirements specify 1 CPU core and 2 GB of memory per
-`coderd` replica, it is recommended to allocate additional resources depending
+`coderd` replica, we recommend that you allocate additional resources depending
 on the workload size to ensure deployment stability.

 #### CPU and memory usage
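
As a rough illustration of the scale-test environment listed in the setup hunk above, the following sketch tallies the total vCPU and RAM it implies. It assumes a single runner, that the figures in parentheses apply to each instance, and that 512 MB counts as 0.5 GB; the variable and function names are hypothetical and not part of Coder.

```shell
# Assumption: one runner, and the figures in parentheses are per instance.
# Columns: name:count:vCPU each:RAM in GB each (512 MB is treated as 0.5 GB).
SCALETEST_SETUP='runner:1:32:128
coder:2:4:16
database:1:2:32
provisioner:50:0.5:0.5'

# Hypothetical helper: sum vCPU and RAM across every component.
scaletest_totals() {
	printf '%s\n' "$SCALETEST_SETUP" |
		awk -F: '{ cpu += $2 * $3; ram += $2 * $4 }
			END { printf "%g vCPU, %g GB RAM\n", cpu, ram }'
}

scaletest_totals   # prints "67 vCPU, 217 GB RAM"
```

Under these assumptions, the 50 provisioners account for 25 of the 67 vCPUs, about as much as the runner's 32.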
@@ -1,20 +1,23 @@
 # Scale Tests and Utilities

-We scale-test Coder with [a built-in utility](#scale-testing-utility) that can
+We scale-test Coder with a built-in utility that can
 be used in your environment for insights into how Coder scales with your
-infrastructure. For scale-testing Kubernetes clusters we recommend to install
+infrastructure. For scale-testing Kubernetes clusters we recommend that you install
 and use the dedicated Coder template,
 [scaletest-runner](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner).

 Learn more about [Coder’s architecture](./architecture.md) and our
 [scale-testing methodology](./scale-testing.md).

+For more information about scaling, see our [Coder scaling best practices](../../tutorials/best-practices/scale-coder.md).
+
 ## Recent scale tests

-> Note: the below information is for reference purposes only, and are not
-> intended to be used as guidelines for infrastructure sizing. Review the
-> [Reference Architectures](./validated-architectures/index.md#node-sizing) for
-> hardware sizing recommendations.
+The information in this doc is for reference purposes only, and is not intended
+to be used as guidelines for infrastructure sizing.
+
+Review the [Reference Architectures](./validated-architectures/index.md#node-sizing) for
+hardware sizing recommendations.

 | Environment | Coder CPU | Coder RAM | Coder Replicas | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested |
 |------------------|-----------|-----------|----------------|-------------------|-------|-------------------|---------------------------------------|---------------|--------------|
@@ -25,8 +28,7 @@ Learn more about [Coder’s architecture](./architecture.md) and our
 | Kubernetes (GKE) | 4 cores | 16 GB | 2 | db-custom-8-30720 | 2000 | 50 | 2000 simulated | `v2.8.4` | Feb 28, 2024 |
 | Kubernetes (GKE) | 2 cores | 4 GB | 2 | db-custom-2-7680 | 1000 | 50 | 1000 simulated | `v2.10.2` | Apr 26, 2024 |

-> Note: a simulated connection reads and writes random data at 40KB/s per
-> connection.
+> Note: A simulated connection reads and writes random data at 40KB/s per connection.

 ## Scale testing utility

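
To put the 40KB/s note in context, here is a minimal sketch of the aggregate rate implied by the two table rows above. It assumes the 40 KB/s figure covers each connection as a whole (the note does not say whether reads and writes are counted separately); the function name is hypothetical.

```shell
# Hypothetical helper: aggregate rate in KB/s for N simulated connections,
# assuming 40 KB/s applies to each connection as a whole.
simulated_rate_kbs() {
	echo "$(( $1 * 40 ))"
}

simulated_rate_kbs 2000   # the 2000-user row: prints 80000
simulated_rate_kbs 1000   # the 1000-user row: prints 40000
```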
@@ -34,17 +36,24 @@ Since Coder's performance is highly dependent on the templates and workflows you
 support, you may wish to use our internal scale testing utility against your own
 environments.

-> Note: This utility is experimental. It is not subject to any compatibility
-> guarantees, and may cause interruptions for your users. To avoid potential
-> outages and orphaned resources, we recommend running scale tests on a
-> secondary "staging" environment or a dedicated
-> [Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/terraform).
-> Run it against a production environment at your own risk.
+<blockquote class="admonition important">
+
+This utility is experimental.
+
+It is not subject to any compatibility guarantees and may cause interruptions
+for your users.
+To avoid potential outages and orphaned resources, we recommend that you run
+scale tests on a secondary "staging" environment or a dedicated
+[Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/terraform).
+
+Run it against a production environment at your own risk.
+
+</blockquote>

 ### Create workspaces

 The following command will provision a number of Coder workspaces using the
-specified template and extra parameters.
+specified template and extra parameters:

 ```shell
 coder exp scaletest create-workspaces \
@@ -56,8 +65,6 @@ coder exp scaletest create-workspaces \
 --job-timeout 5h \
 --no-cleanup \
 --output json:"${SCALETEST_RESULTS_DIR}/create-workspaces.json"
-
-# Run `coder exp scaletest create-workspaces --help` for all usage
 ```

 The command does the following:
@@ -70,6 +77,12 @@ The command does the following:
 1. If you don't want the creation process to be interrupted by any errors, use
 the `--retry 5` flag.

+For more built-in `scaletest` options, use the `--help` flag:
+
+```shell
+coder exp scaletest create-workspaces --help
+```
+
 ### Traffic Generation

 Given an existing set of workspaces created previously with `create-workspaces`,
@@ -105,7 +118,11 @@ The `workspace-traffic` supports also other modes - SSH traffic, workspace app:
 1. For SSH traffic: Use `--ssh` flag to generate SSH traffic instead of Web
 Terminal.
 1. For workspace app traffic: Use `--app [wsdi|wsec|wsra]` flag to select app
-behavior. (modes: _WebSocket discard_, _WebSocket echo_, _WebSocket read_).
+behavior.
+
+- `wsdi`: WebSocket discard
+- `wsec`: WebSocket echo
+- `wsra`: WebSocket read

 ### Cleanup

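
The traffic modes described in the last hunk can be sketched as a small dry-run helper that maps a mode name to the matching flag. This is only an illustration: the function `traffic_cmd` is hypothetical, it prints the command rather than running it, it assumes Web Terminal is the default when neither flag is given, and it leaves out any other flags a real `workspace-traffic` run needs, since they are not shown in this diff.

```shell
# Hypothetical dry-run helper: prints the mode-selecting part of the command
# instead of running it. Other flags the command may need are omitted.
traffic_cmd() {
	case "$1" in
		terminal) echo "coder exp scaletest workspace-traffic" ;;  # assumed default mode
		ssh) echo "coder exp scaletest workspace-traffic --ssh" ;;
		wsdi | wsec | wsra) echo "coder exp scaletest workspace-traffic --app $1" ;;
		*) echo "unknown mode: $1" >&2; return 1 ;;
	esac
}

traffic_cmd ssh    # SSH traffic instead of Web Terminal
traffic_cmd wsec   # workspace app traffic, WebSocket echo
```

Printing the command first makes it easy to review which mode is selected before running it against a staging environment.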