mirror of
https://github.com/coder/coder.git
synced 2025-07-06 15:41:45 +00:00
364 lines
15 KiB
Markdown
# Coder Validated Architecture

Many customers operate Coder in complex organizational environments, consisting
of multiple business units, agencies, and/or subsidiaries. This can lead to
numerous Coder deployments, due to discrepancies in regulatory compliance, data
sovereignty, and level of funding across groups. The Coder Validated
Architecture (CVA) prescribes a Kubernetes-based deployment approach, enabling
your organization to deploy a stable Coder instance that is easier to maintain
and troubleshoot.

The following sections will detail the components of the Coder Validated
Architecture, provide guidance on how to configure and deploy these components,
and offer insights into how to maintain and troubleshoot your Coder environment.

- [General concepts](#general-concepts)
- [Kubernetes Infrastructure](#kubernetes-infrastructure)
- [PostgreSQL Database](#postgresql-database)
- [Operational readiness](#operational-readiness)

## Who is this document for?

This guide targets the following personas. It assumes a basic understanding of
cloud/on-premise computing, containerization, and the Coder platform.

| Role                      | Description                                                                     |
| ------------------------- | ------------------------------------------------------------------------------- |
| Platform Engineers        | Responsible for deploying and operating the Coder deployment and infrastructure |
| Enterprise Architects     | Responsible for architecting Coder deployments to meet enterprise requirements  |
| Managed Service Providers | Entities that deploy and run Coder software as a service for customers          |

## CVA Guidance

| CVA provides:                                  | CVA does not provide:                                                                    |
| ---------------------------------------------- | ---------------------------------------------------------------------------------------- |
| Single and multi-region K8s deployment options | A prescribed OS, or a choice between cloud and on-premise                                |
| Reference architectures for up to 3,000 users  | An approval of your architecture; the CVA solely provides recommendations and guidelines |
| Best practices for building a Coder deployment | Recommendations for every possible deployment scenario                                   |

> For higher level design principles and architectural best practices, see
> Coder's
> [Well-Architected Framework](https://coder.com/blog/coder-well-architected-framework).

## General concepts

This section outlines core concepts and terminology essential for understanding
Coder's architecture and deployment strategies.

### Administrator

An administrator is a user role within the Coder platform with elevated
privileges. Admins have access to administrative functions such as user
management, template definitions, insights, and deployment configuration.

### Coder control plane

Coder's control plane, also known as _coderd_, is the main service. We recommend
deploying it with multiple replicas to ensure high availability. It provides
an API for managing workspaces and templates, and serves the dashboard UI. In
addition, each _coderd_ replica hosts 3 Terraform [provisioners](#provisioner)
by default.

### User

A [user](../admin/users.md) is an individual who utilizes the Coder platform to
develop, test, and deploy applications using workspaces. Users can select
available templates to provision workspaces. They interact with Coder using the
web interface, the CLI tool, or by calling the API directly.

### Workspace

A [workspace](../workspaces.md) refers to an isolated development environment
where users can write, build, and run code. Workspaces are fully configurable
and can be tailored to specific project requirements, providing developers with
a consistent and efficient development environment. Workspaces can be
autostarted and autostopped, enabling efficient resource management.

Users can connect to workspaces using SSH or via workspace applications like
`code-server`, facilitating collaboration and remote access. Additionally,
workspaces can be parameterized, allowing users to customize settings and
configurations based on their unique needs. Workspaces are instantiated using
Coder templates and deployed on resources created by provisioners.

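As an illustration of workspace parameterization, the sketch below declares a
single template parameter with the `coder_parameter` data source from the Coder
Terraform provider. The parameter name, bounds, and default are hypothetical and
only show the shape of a definition; adapt them to your own templates.

```hcl
terraform {
  required_providers {
    coder = {
      source = "coder/coder"
    }
  }
}

# Hypothetical parameter: lets each user pick a CPU count when creating
# a workspace from this template.
data "coder_parameter" "cpu" {
  name         = "cpu"
  display_name = "CPU cores"
  description  = "Number of CPU cores for the workspace"
  type         = "number"
  default      = "2"
  mutable      = true

  validation {
    min = 1
    max = 8
  }
}

locals {
  # The chosen value can be referenced elsewhere in the template,
  # for example in a container's resource requests.
  cpu_request = data.coder_parameter.cpu.value
}
```
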
### Template

A [template](../templates/index.md) in Coder is a predefined configuration for
creating workspaces. Templates streamline the process of workspace creation by
providing pre-configured settings, tooling, and dependencies. They are built by
template administrators on top of Terraform, allowing for efficient management
of infrastructure resources. Additionally, templates can utilize Coder modules
to leverage existing features shared with other templates, enhancing flexibility
and consistency across deployments. Templates describe provisioning rules for
infrastructure resources offered by Terraform providers.

### Workspace Proxy

A [workspace proxy](../admin/workspace-proxies.md) serves as a relay connection
option for developers connecting to their workspace over SSH, a workspace app,
or through port forwarding. It helps reduce network latency for geo-distributed
teams by minimizing the distance network traffic needs to travel. Notably,
workspace proxies do not handle dashboard connections or API calls.

### Provisioner

Provisioners in Coder execute Terraform during workspace and template builds.
While the platform includes built-in provisioner daemons by default, there are
advantages to employing external provisioners. These external daemons provide
secure build environments and reduce server load, improving performance and
scalability. Each provisioner can handle a single concurrent workspace build,
allowing for efficient resource allocation and workload management.

### Registry

The [Coder Registry](https://registry.coder.com) is a platform where you can
find starter templates and _Modules_ for various cloud services and platforms.

Templates help create self-service development environments using
Terraform-defined infrastructure, while _Modules_ simplify template creation by
providing common features like workspace applications, third-party integrations,
or helper scripts.

Please note that the Registry is a hosted service and isn't available for
offline use.

## Kubernetes Infrastructure

Kubernetes is the recommended and supported platform for deploying Coder in the
enterprise. It is the hosting platform of choice for a large majority of Coder's
Fortune 500 customers, and it is the platform we build and test against at
Coder.

### General recommendations

In general, we recommend deploying Coder into its own cluster, separate from
production applications. Keep in mind that Coder runs development workloads, so
the cluster should be deployed as such, without production-level
configurations.

### Compute

Deploy your Kubernetes cluster with two node groups: one for Coder's control
plane, and another for user workspaces (if you intend to use K8s for end-user
compute).

#### Control plane nodes

The Coder control plane node group must be static, to prevent scale down events
from dropping pods, and thus dropping user connections to the dashboard UI and
their workspaces.

Coder's Helm Chart supports
[defining nodeSelectors, affinities, and tolerations](https://github.com/coder/coder/blob/e96652ebbcdd7554977594286b32015115c3f5b6/helm/coder/values.yaml#L221-L249)
to schedule the control plane pods on the appropriate node group.

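A minimal sketch of pinning the control plane pods to a dedicated node group is
shown below. It assumes the chart is installed through the Terraform `helm`
provider (already configured against your cluster) rather than the Helm CLI, and
the node label and taint names are hypothetical. The `coder.nodeSelector` and
`coder.tolerations` keys follow the chart's values file linked above; verify
them against the chart version you deploy.

```hcl
terraform {
  required_providers {
    helm = {
      source = "hashicorp/helm"
    }
  }
}

resource "helm_release" "coder" {
  name       = "coder"
  namespace  = "coder"
  repository = "https://helm.coder.com/v2"
  chart      = "coder"

  values = [yamlencode({
    coder = {
      # Hypothetical label carried by the static control plane node group
      nodeSelector = {
        "example.com/node-group" = "coder-control-plane"
      }
      # Hypothetical taint that keeps other workloads off those nodes
      tolerations = [{
        key      = "example.com/dedicated"
        operator = "Equal"
        value    = "coder-control-plane"
        effect   = "NoSchedule"
      }]
    }
  })]
}
```
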
#### Workspace nodes

Coder workspaces can be deployed either as Pods or Deployments in Kubernetes.
See our
[example Kubernetes workspace template](https://github.com/coder/coder/tree/main/examples/templates/kubernetes).
Configure the workspace node group to be auto-scaling, to dynamically allocate
compute as users start/stop workspaces at the beginning and end of their day.
Set nodeSelectors, affinities, and tolerations in Coder templates to assign
workspaces to the given node group:

```hcl
resource "kubernetes_deployment" "coder" {
  spec {
    template {
      metadata {
        labels = {
          app = "coder-workspace"
        }
      }

      spec {
        affinity {
          pod_anti_affinity {
            preferred_during_scheduling_ignored_during_execution {
              weight = 1
              pod_affinity_term {
                label_selector {
                  match_expressions {
                    # Must match a label set in metadata.labels above
                    key      = "app"
                    operator = "In"
                    values   = ["coder-workspace"]
                  }
                }
                # Example value only; replace with your node group label
                topology_key = "kubernetes.io/hostname"
              }
            }
          }
        }

        # One `toleration` block per taint on the workspace node group
        toleration {
          # Add your toleration here
        }

        # `node_selector` is a map attribute in the Kubernetes provider
        node_selector = {
          # Add your node selectors here
        }

        container {
          image = "coder-workspace:latest"
          name  = "dev"
        }
      }
    }
  }
}
```

#### Node sizing

For sizing recommendations, see the reference architectures below:

- [Up to 1,000 users](./1k-users.md)

- [Up to 2,000 users](./2k-users.md)

- [Up to 3,000 users](./3k-users.md)

### Networking

It is likely your enterprise deploys Kubernetes clusters with various networking
restrictions. With this in mind, Coder requires the following connectivity:

- Egress from workspace compute to the Coder control plane pods
- Egress from control plane pods to Coder's PostgreSQL database
- Egress from control plane pods to git and package repositories
- Ingress from user devices to the control plane Load Balancer or Ingress
  controller

We recommend configuring your network policies in accordance with the above.
Note that Coder workspaces do not require any ports to be open.

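Because workspaces do not need any inbound ports, one way to express that in a
network policy is to deny all ingress to workspace pods while leaving egress
unrestricted. The sketch below is only an illustration under stated
assumptions: the namespace and pod label are hypothetical, the Terraform
`kubernetes` provider is assumed to be configured, and your CNI plugin must
enforce NetworkPolicy for it to have any effect.

```hcl
resource "kubernetes_network_policy" "workspace_deny_ingress" {
  metadata {
    name      = "workspace-deny-ingress"
    namespace = "coder-workspaces" # hypothetical namespace
  }

  spec {
    # Hypothetical label; use whatever label your template sets on workspace pods
    pod_selector {
      match_labels = {
        app = "coder-workspace"
      }
    }

    # Listing "Ingress" without any ingress rules denies all inbound traffic.
    # Egress is not listed, so outbound traffic to the control plane is unaffected.
    policy_types = ["Ingress"]
  }
}
```

If you also restrict egress, remember to allow DNS and the route that
workspaces use to reach the control plane.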
### Storage

If running Coder workspaces as Kubernetes Pods or Deployments, you will need to
assign persistent storage. We recommend leveraging a
[supported Container Storage Interface (CSI) driver](https://kubernetes-csi.github.io/docs/drivers.html)
in your cluster, with Dynamic Provisioning and read/write, to provide on-demand
storage to end-user workspaces.

The following Kubernetes volume types have been validated by Coder internally,
and/or by our customers:

- [PersistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/volumes/#persistentvolumeclaim)
- [NFS](https://kubernetes.io/docs/concepts/storage/volumes/#nfs)
- [subPath](https://kubernetes.io/docs/concepts/storage/volumes/#using-subpath)
- [cephfs](https://kubernetes.io/docs/concepts/storage/volumes/#cephfs)

Our
[example Kubernetes workspace template](https://github.com/coder/coder/blob/5b9a65e5c137232351381fc337d9784bc9aeecfc/examples/templates/kubernetes/main.tf#L191-L219)
provisions a PersistentVolumeClaim block storage device, attached to the
Deployment.

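A condensed sketch of that pattern follows: a dynamically provisioned
PersistentVolumeClaim mounted as the home directory of a workspace Deployment.
It is not a copy of the linked template. Resource names, the namespace, the
image, and the 10Gi size are hypothetical, and a real template would normally
derive names from the workspace so that each workspace gets its own claim.

```hcl
resource "kubernetes_persistent_volume_claim" "home" {
  metadata {
    name      = "coder-workspace-home" # hypothetical name
    namespace = "coder-workspaces"     # hypothetical namespace
  }
  # Do not block on binding; some storage classes bind on first consumer
  wait_until_bound = false
  spec {
    access_modes = ["ReadWriteOnce"]
    resources {
      requests = {
        storage = "10Gi"
      }
    }
  }
}

resource "kubernetes_deployment" "workspace" {
  metadata {
    name      = "coder-workspace"
    namespace = "coder-workspaces"
  }
  spec {
    replicas = 1
    selector {
      match_labels = {
        app = "coder-workspace"
      }
    }
    template {
      metadata {
        labels = {
          app = "coder-workspace"
        }
      }
      spec {
        container {
          name  = "dev"
          image = "coder-workspace:latest"
          # Persist the user's home directory across workspace restarts
          volume_mount {
            name       = "home"
            mount_path = "/home/coder"
          }
        }
        volume {
          name = "home"
          persistent_volume_claim {
            claim_name = kubernetes_persistent_volume_claim.home.metadata[0].name
          }
        }
      }
    }
  }
}
```
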
We do not recommend mounting volumes from the host node(s) into workspaces, for
security and reliability reasons. The volume types below are _not_ recommended
for use with Coder:

- [Local](https://kubernetes.io/docs/concepts/storage/volumes/#local)
- [hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath)

Note that Coder's control plane filesystem is ephemeral, so no persistent storage
is required.

## PostgreSQL database

Coder requires access to an external PostgreSQL database to store user data,
workspace state, template files, and more. Depending on the scale of the
user-base, workspace activity, and High Availability requirements, the amount of
CPU and memory resources required by Coder's database may differ.

### Disaster recovery

Prepare internal scripts for dumping and restoring your database. We recommend
scheduling regular database backups, especially before upgrading Coder to a new
release. Coder does not support downgrades without first restoring the
database to the prior version.

### Performance efficiency

We highly recommend deploying the PostgreSQL instance in the same region (and if
possible, same availability zone) as the Coder server to optimize for low
latency connections. We recommend keeping latency under 10ms between the Coder
server and database.

When determining scaling requirements, take into account the following
considerations:

- `2 vCPU x 8 GB RAM x 512 GB storage`: A baseline for database requirements for
  a Coder deployment with fewer than 1,000 users and a low activity level (30%
  active users). This capacity should be sufficient to support 100 external
  provisioners.
- Storage size depends on user activity, workspace builds, log verbosity,
  overhead on database encryption, etc.
- Allocate two additional CPU cores to the database instance for every 1,000
  active users.
- Enable High Availability mode for the database engine for large-scale
  deployments.

If you enable [database encryption](../admin/encryption.md) in Coder, consider
allocating an additional CPU core to every `coderd` replica.

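The CPU guidance above can be turned into a rough first estimate. The sketch
below is one possible reading, not an official formula: it assumes the 2 vCPU
baseline covers deployments below 1,000 active users and adds two cores for
each full block of 1,000 active users. The user count is a hypothetical input.

```hcl
locals {
  active_users = 2500 # hypothetical number of active users

  baseline_vcpu              = 2 # baseline from the list above
  extra_vcpu_per_1000_active = 2 # "two additional CPU cores for every 1,000 active users"

  # floor() counts only full blocks of 1,000 active users (an assumption)
  db_vcpu = local.baseline_vcpu + local.extra_vcpu_per_1000_active * floor(local.active_users / 1000)
}

# With the values above: 2 + 2 * floor(2.5) = 6
output "db_vcpu_estimate" {
  value = local.db_vcpu
}
```

Treat the result as a starting point and adjust it based on observed CPU
utilization and query latency.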
#### Resource utilization guidelines

Below are general recommendations for sizing your PostgreSQL instance:

- Increase the number of vCPUs if CPU utilization or database latency is high.
- Allocate extra memory if database performance is poor, CPU utilization is low,
  and memory utilization is high.
- Use faster disk options (higher IOPS), such as SSDs or NVMe drives, to improve
  performance and possibly reduce database load.

## Operational readiness

Operational readiness in Coder is about ensuring that everything is set up
correctly before launching a platform into production. It involves making sure
that the service is reliable, secure, and scales easily according to user-base
needs. Operational readiness is crucial because it helps prevent issues that
could affect the workspace users' experience once the platform is live.

### Helm Chart Configuration

1. Reference our [Helm chart values file](../../helm/coder/values.yaml) and
   identify the required values for deployment.
1. Create a `values.yaml` and add it to your version control system.
1. Determine the necessary environment variables. Here is the
   [full list of supported server environment variables](../cli/server.md).
1. Follow our documented
   [steps for installing Coder via Helm](../install/kubernetes.md).

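For teams that manage infrastructure with Terraform, the steps above could be
captured roughly as follows. This is a sketch under assumptions, not the
documented install path: it uses the Terraform `helm` provider instead of the
Helm CLI, expects a version-controlled `values.yaml` next to the configuration,
and uses a placeholder access URL and a hypothetical Kubernetes Secret named
`coder-db-url` that holds the PostgreSQL connection string.

```hcl
resource "helm_release" "coder" {
  name       = "coder"
  namespace  = "coder"
  repository = "https://helm.coder.com/v2"
  chart      = "coder"

  values = [
    # Base settings kept in version control
    file("${path.module}/values.yaml"),

    # Environment variables for the Coder server, layered on top
    yamlencode({
      coder = {
        env = [
          {
            name  = "CODER_ACCESS_URL"
            value = "https://coder.example.com" # placeholder URL
          },
          {
            name = "CODER_PG_CONNECTION_URL"
            valueFrom = {
              secretKeyRef = {
                name = "coder-db-url" # hypothetical Secret name
                key  = "url"
              }
            }
          },
          {
            name  = "CODER_PROMETHEUS_ENABLE"
            value = "true"
          },
        ]
      }
    }),
  ]
}
```

Later entries in `values` override earlier ones, so the environment block
takes precedence over anything set under `coder.env` in `values.yaml`.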
### Template configuration

1. Establish dedicated accounts for users with the _Template Administrator_
   role.
1. Maintain Coder templates using
   [version control](../templates/change-management.md).
1. Consider implementing a GitOps workflow to automatically push new template
   versions into Coder from git. For example, on GitHub, you can use the
   [Update Coder Template](https://github.com/marketplace/actions/update-coder-template)
   action.
1. Evaluate enabling
   [automatic template updates](../templates/general-settings.md#require-automatic-updates-enterprise)
   upon workspace startup.

### Observability

1. Enable the Prometheus endpoint (environment variable:
   `CODER_PROMETHEUS_ENABLE`).
1. Deploy the
   [Coder Observability bundle](https://github.com/coder/observability) to
   leverage pre-configured dashboards, alerts, and runbooks for monitoring
   Coder. This includes integrations between Prometheus, Grafana, Loki, and
   Alertmanager.
1. Review the [Prometheus response](../admin/prometheus.md) and set up alarms on
   selected metrics.

### User support

1. Incorporate [support links](../admin/appearance.md#support-links) into
   internal documentation accessible from the user context menu. Ensure that
   hyperlinks are valid and lead to up-to-date materials.
1. Encourage the use of `coder support bundle` to allow workspace users to
   generate and provide network-related diagnostic data.