coder

mirror of https://github.com/coder/coder.git synced 2025-07-12 00:14:10 +00:00

Author	SHA1	Message	Date
Mathias Fredriksson	3de0003e4b	feat(agent): send devcontainer CLI logs during recreate (#17845 ) We need a way to surface what's happening to the user, since autostart logs here, it's natural we do so during re-create as well. Updates #16424	2025-05-15 16:06:56 +03:00
Ethan	53ba3613b3	feat(cli): use coder connect in `coder ssh --stdio`, if available (#17572 ) Closes https://github.com/coder/vscode-coder/issues/447 Closes https://github.com/coder/jetbrains-coder/issues/543 Closes https://github.com/coder/coder-jetbrains-toolbox/issues/21 This PR adds Coder Connect support to `coder ssh --stdio`. When connecting to a workspace, if `--force-new-tunnel` is not passed, the CLI will first do a DNS lookup for `<agent>.<workspace>.<owner>.<hostname-suffix>`. If an IP address is returned, and it's within the Coder service prefix, the CLI will not create a new tailnet connection to the workspace, and instead dial the SSH server running on port 22 on the workspace directly over TCP. This allows IDE extensions to use the Coder Connect tunnel, without requiring any modifications to the extensions themselves. Additionally, `using_coder_connect` is added to the `sshNetworkStats` file, which the VS Code extension (and maybe Jetbrains?) will be able to read, and indicate to the user that they are using Coder Connect. One advantage of this approach is that running `coder ssh --stdio` on an offline workspace with Coder Connect enabled will have the CLI wait for the workspace to build, the agent to connect (and optionally, for the startup scripts to finish), before finally connecting using the Coder Connect tunnel. As a result, `coder ssh --stdio` has the overhead of looking up the workspace and agent, and checking if they are running. On my device, this meant `coder ssh --stdio <workspace>` was approximately a second slower than just connecting to the workspace directly using `ssh <workspace>.coder` (I would assume anyone serious about their Coder Connect usage would know to just do the latter anyway). To ensure this doesn't come at a significant performance cost, I've also benchmarked this PR. 
<details> <summary>Benchmark</summary> ## Methodology All tests were completed on `dev.coder.com`, where a Linux workspace running in AWS `us-west1` was created. The machine running Coder Desktop (the 'client') was a Windows VM running in the same AWS region and VPC as the workspace. To test the performance of specifically the SSH connection, a port was forwarded between the client and workspace using: ``` ssh -p 22 -L7001:localhost:7001 <host> ``` where `host` was either an alias for an SSH ProxyCommand that called `coder ssh`, or a Coder Connect hostname. For latency, [`tcping`](https://www.elifulkerson.com/projects/tcping.php) was used against the forwarded port: ``` tcping -n 100 localhost 7001 ``` For throughput, [`iperf3`](https://iperf.fr/iperf-download.php) was used: ``` iperf3 -c localhost -p 7001 ``` where an `iperf3` server was running on the workspace on port 7001. ## Test Cases ### Testcase 1: `coder ssh` `ProxyCommand` that bicopies from Coder Connect This case tests the implementation in this PR, such that we can write a config like: ``` Host codercliconnect ProxyCommand /path/to/coder ssh --stdio workspace ``` With Coder Connect enabled, `ssh -p 22 -L7001:localhost:7001 codercliconnect` will use the Coder Connect tunnel. 
The results were as follows: Throughput, 10 tests, back to back: - Average throughput across all tests: 788.20 Mbits/sec - Minimum average throughput: 731 Mbits/sec - Maximum average throughput: 871 Mbits/sec - Standard Deviation: 38.88 Mbits/sec Latency, 100 RTTs: - Average: 0.369ms - Minimum: 0.290ms - Maximum: 0.473ms ### Testcase 2: `ssh` dialing Coder Connect directly without a `ProxyCommand` This is what we assume to be the 'best' way to use Coder Connect. Throughput, 10 tests, back to back: - Average throughput across all tests: 789.50 Mbits/sec - Minimum average throughput: 708 Mbits/sec - Maximum average throughput: 839 Mbits/sec - Standard Deviation: 39.98 Mbits/sec Latency, 100 RTTs: - Average: 0.369ms - Minimum: 0.267ms - Maximum: 0.440ms ### Testcase 3: `coder ssh` `ProxyCommand` that creates its own Tailnet connection in-process This is what normally happens when you run `coder ssh`: Throughput, 10 tests, back to back: - Average throughput across all tests: 610.20 Mbits/sec - Minimum average throughput: 569 Mbits/sec - Maximum average throughput: 664 Mbits/sec - Standard Deviation: 27.29 Mbits/sec Latency, 100 RTTs: - Average: 0.335ms - Minimum: 0.262ms - Maximum: 0.452ms ## Analysis Performing a two-tailed, unpaired t-test against the throughput of testcases 1 and 2, we find a P value of `0.9450`. This suggests the difference between the data sets is not statistically significant. In other words, if the two configurations truly performed the same, a difference at least this large would be expected about 94.5% of the time. ## Conclusion From the t-test, and by comparison to the status quo (regular `coder ssh`, which uses gvisor, and is noticeably slower), I think it's safe to say any impact on throughput or latency by the `ProxyCommand` performing a bicopy against Coder Connect is negligible. Users are very much unlikely to run into performance issues as a result of using Coder Connect via `coder ssh`, as implemented in this PR. 
Less scientifically, I ran these same tests on my home network with my Sydney workspace, and both throughput and latency were consistent across testcases 1 and 2. </details>	2025-04-30 15:17:10 +10:00
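The t-test in the benchmark analysis above can be sanity-checked from the reported summary statistics alone. The following is a minimal sketch, assuming the quoted standard deviations are sample standard deviations over the 10 runs; `welchT` is a hypothetical helper, not something from the PR, and it computes only the t statistic (not the P value).

```go
package main

import (
	"fmt"
	"math"
)

// welchT returns the unpaired (Welch) t statistic for two samples that are
// described only by their mean, standard deviation, and size.
// Hypothetical helper for illustration.
func welchT(mean1, sd1 float64, n1 int, mean2, sd2 float64, n2 int) float64 {
	// Standard error of the difference between the two means.
	se := math.Sqrt(sd1*sd1/float64(n1) + sd2*sd2/float64(n2))
	return (mean1 - mean2) / se
}

func main() {
	// Throughput summaries for testcases 2 and 1 (Mbits/sec, 10 runs each).
	t := welchT(789.50, 39.98, 10, 788.20, 38.88, 10)
	fmt.Printf("t = %.4f\n", t)
}
```

The statistic comes out at roughly 0.07, far below any conventional significance threshold for 18 degrees of freedom, which is consistent with the large P value reported in the commit message.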
Spike Curtis	3b54254177	feat: add coder connect exists hidden subcommand (#17418 ) Adds a new hidden subcommand `coder connect exists <hostname>` that checks if the name exists via Coder Connect. This will be used in SSH config to match only if Coder Connect is unavailable for the hostname in question; when Coder Connect is available, the match fails and the SSH client dials the workspace directly over the existing Coder Connect tunnel. Also refactors the way we inject a test DNS resolver into the lookup functions so that we can test from outside the `workspacesdk` package.	2025-04-17 11:23:24 +04:00
ケイラ	f670bc31f5	chore: update testutil chan helpers (#17408 )	2025-04-16 10:37:09 -06:00
Danny Kopping	0b18e458f4	fix: reduce excessive logging when database is unreachable (#17363 ) Fixes #17045 --------- Signed-off-by: Danny Kopping <dannykopping@gmail.com>	2025-04-15 10:55:30 +02:00
Spike Curtis	e5ce3824ca	feat: add IsCoderConnectRunning to workspacesdk (#17361 ) Adds `IsCoderConnectRunning()` to the workspacesdk. This will support the `coder` CLI being able to use CoderConnect when it's running. part of #16828	2025-04-14 09:47:46 +04:00
Spike Curtis	2c573dc023	feat: vpn uses WorkspaceHostnameSuffix for DNS names (#17335 ) Use the hostname suffix to set DNS names as programmed into the DNS service and returned by the vpn `Tunnel`. part of: #16828	2025-04-11 13:24:20 +04:00
Spike Curtis	12dc086628	feat: return hostname suffix on AgentConnectionInfo (#17334 ) Adds the Hostname Suffix to `AgentConnectionInfo` --- the VPN provider will use it to control the suffix for DNS hostnames. part of: #16828	2025-04-11 13:09:51 +04:00
Jon Ayers	17ddee05e5	chore: update golang to 1.24.1 (#17035 ) - Update go.mod to use Go 1.24.1 - Update GitHub Actions setup-go action to use Go 1.24.1 - Fix linting issues with golangci-lint by: - Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <claude@anthropic.com>	2025-03-26 01:56:39 -05:00
Cian Johnston	68624092a4	feat(agent/reconnectingpty): allow selecting backend type (#17011 ) agent/reconnectingpty: allow specifying backend type cli: exp rpty: automatically select backend based on command	2025-03-20 13:45:31 +00:00
Thomas Kosiewski	d0e2060692	feat(agent): add second SSH listener on port 22 (#16627 ) Fixes: https://github.com/coder/internal/issues/377 Added an additional SSH listener on port 22, so the agent now listens on both, port one and port 22. --- Change-Id: Ifd986b260f8ac317e37d65111cd4e0bd1dc38af8 Signed-off-by: Thomas Kosiewski <tk@coder.com>	2025-03-03 04:47:42 +01:00
Cian Johnston	172e52317c	feat(agent): wire up agentssh server to allow exec into container (#16638 ) Builds on top of https://github.com/coder/coder/pull/16623/ and wires up the ReconnectingPTY server. This does nothing to wire up the web terminal yet but the added test demonstrates the functionality working. Other changes: * Refactors and moves the `SystemEnvInfo` interface to the `agent/usershell` package to address follow-up from https://github.com/coder/coder/pull/16623#discussion_r1967580249 * Marks `usershellinfo.Get` as deprecated. Consumers should use the `EnvInfoer` interface instead. --------- Co-authored-by: Mathias Fredriksson <mafredri@gmail.com> Co-authored-by: Danny Kopping <danny@coder.com>	2025-02-26 09:03:27 +00:00
Cian Johnston	31b1ff7d3b	feat(agent): add container list handler (#16346 ) Fixes https://github.com/coder/coder/issues/16268 - Adds `/api/v2/workspaceagents/:id/containers` coderd endpoint that allows listing containers visible to the agent. Optional filtering by labels is supported. - Adds go tools to the `coder-dylib` CI step so we can generate mocks if needed	2025-02-10 11:29:30 +00:00
Spike Curtis	2c7f8ac65f	chore: migrate to coder/websocket 1.8.12 (#15898 ) Migrates us to `coder/websocket` v1.8.12 rather than `nhooyr/websocket` on an older version. Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.	2024-12-19 00:51:30 +04:00
Spike Curtis	747f7ce173	feat: add support for WorkspaceUpdates to WebsocketDialer (#15534 ) closes #14730 Adds support for WorkspaceUpdates to the WebsocketDialer. This allows us to dial the new endpoint added in #14847 and connect it up to a `tailnet.Controllers` to connect to all agents over the tailnet. I refactored the fakeWorkspaceUpdatesProvider to a mock and moved it to `tailnettest` so it could be more easily reused. The Mock is a little more full-featured.	2024-11-18 10:54:11 +04:00
Spike Curtis	40802958e9	fix: use explicit api versions for agent and tailnet (#15508 ) Bumps the Tailnet and Agent API to version 2.3, and creates some extra controls and machinery around these versions. What happened is that we accidentally shipped two new API features without bumping the version: `ScriptCompleted` on the Agent API in Coder v2.16 and `RefreshResumeToken` on the Tailnet API in Coder v2.15. Since we can't easily retroactively bump the versions, we'll roll these changes into API version 2.3 along with the new WorkspaceUpdates RPC, which hasn't been released yet. That means there is some ambiguity in Coder v2.15-v2.17 about exactly what methods are supported on the Tailnet and Agent APIs. This isn't great, but hasn't caused us major issues because 1. RefreshResumeToken is considered optional, and clients just log and move on if the RPC isn't supported. 2. Agents basically never get started talking to a Coderd that is older than they are, since the agent binary is normally downloaded from Coderd at workspace start. Still it's good to get things squared away in terms of versions for SDK users and possible edge cases around client and server versions. To mitigate against this thing happening again, this PR also: 1. adds a CODEOWNERS for the API proto packages, so I'll review changes 2. defines interface types for different API versions, and has the agent explicitly use a specific version. That way, if you add a new method, and try to use it in the agent without thinking explicitly about versions, it won't compile. With the protocol controllers stuff, we've sort of already abstracted the Tailnet API such that the interface type strategy won't work, but I'll work on getting the Controller to be version aware, such that it can check the API version it's getting against the controllers it has -- in a later PR.	2024-11-15 11:16:28 +04:00
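The "interface types for different API versions" idea in the commit above can be illustrated with a small sketch. Every type and method name here is hypothetical except `ScriptCompleted`, which the commit message mentions; the real generated dRPC client is not shown. The point is that the agent code receives a version-pinned interface, so calling a method outside that version is a compile error.

```go
package main

import "fmt"

// AgentAPI20 lists the methods assumed to exist in API version 2.0.
// Hypothetical interface for illustration.
type AgentAPI20 interface {
	GetManifest() string
}

// AgentAPI23 extends 2.0 with the methods rolled into version 2.3.
type AgentAPI23 interface {
	AgentAPI20
	ScriptCompleted(name string) string
}

// fullClient stands in for a generated client that has every method,
// including ones newer than the version the agent has opted into.
type fullClient struct{}

func (fullClient) GetManifest() string                { return "manifest" }
func (fullClient) ScriptCompleted(name string) string { return "completed " + name }
func (fullClient) FutureMethod() string               { return "not part of 2.3" }

// runAgent only sees the 2.3 surface. Calling api.FutureMethod() here
// would fail to compile, which forces an explicit version decision.
func runAgent(api AgentAPI23) string {
	return api.GetManifest() + "; " + api.ScriptCompleted("startup")
}

func main() {
	fmt.Println(runAgent(fullClient{}))
}
```

The concrete client still satisfies the narrower interface, so adopting a new method means first adding it to a new versioned interface and switching the agent to that interface.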
Spike Curtis	e5661c2748	feat: add support for multiple tunnel destinations in tailnet (#15409 ) Closes #14729 Expands the Coordination controller used by the CLI client to allow multiple tunnel destinations (agents). Our current client uses just one, but this unifies the logic so that when we add Coder VPN, 1 is just a special case of "many."	2024-11-08 13:32:07 +04:00
Spike Curtis	718722af1b	chore: refactor tailnetAPIConnector to tailnet.Controller (#15361 ) Refactors `workspacesdk.tailnetAPIConnector` as a `tailnet.Controller` to reuse all the reconnection and graceful disconnect logic. chore re: #14729	2024-11-08 10:10:54 +04:00
Spike Curtis	2d061e698d	chore: refactor tailnetAPIConnector to use dialer (#15347 ) refactors `tailnetAPIConnector` to use the `Dialer` interface in `tailnet`, introduced lower in this stack of PRs. This will let us use the same Tailnet API handling code across different things that connect to the Tailnet API (CLI client, coderd, workspace proxies, and soon: Coder VPN). chore re: #14729	2024-11-07 17:24:19 +04:00
Spike Curtis	335e4ab6bf	chore: refactor sending telemetry (#15345 ) Implements a tailnet API Telemetry controller by refactoring from `workspacesdk`. chore re: #14729	2024-11-06 20:23:23 +04:00
Spike Curtis	9126cd78a6	chore: refactor DERP setting loop (#15344 ) Implements a Tailnet API DERP controller by refactoring from `workspacesdk` chore re: #14729	2024-11-06 20:04:05 +04:00
Spike Curtis	886dcbec84	chore: refactor coordination (#15343 ) Refactors the way clients of the Tailnet API (clients of the API, which include both workspace "agents" and "clients") interact with the API. Introduces the idea of abstract "controllers" for each of the RPCs in the API, and implements a Coordination controller by refactoring from `workspacesdk`. chore re: #14729	2024-11-05 13:50:10 +04:00
Ethan	b1298a3c1e	feat: add WorkspaceUpdates tailnet RPC (#14847 ) Closes #14716 Closes #14717 Adds a new user-scoped tailnet API endpoint (`api/v2/tailnet`) with a new RPC stream for receiving updates on workspaces owned by a specific user, as defined in #14716. When a stream is started, the `WorkspaceUpdatesProvider` will begin listening on the user-scoped pubsub events implemented in #14964. When a relevant event type is seen (such as a workspace state transition), the provider will query the DB for all the workspaces (and agents) owned by the user. This gets compared against the result of the previous query to produce a set of workspace updates. Workspace updates can be requested for any user ID, however only workspaces the authorised user is permitted to `ActionRead` will have their updates streamed. Opening a tunnel to an agent requires that the user can perform `ActionSSH` against the workspace containing it.	2024-11-01 14:53:53 +11:00
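The "compare against the result of the previous query" step described in the commit above can be sketched as a snapshot diff. This is an assumption-laden illustration: the snapshot is modeled as a plain map from a workspace ID to a status string, and `diffSnapshots` is a hypothetical helper rather than the `WorkspaceUpdatesProvider` implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// diffSnapshots compares two query results (workspace ID -> status) and
// returns the IDs that were added or changed, and the IDs that disappeared.
// Hypothetical helper for illustration.
func diffSnapshots(prev, cur map[string]string) (upserted, deleted []string) {
	for id, status := range cur {
		if old, ok := prev[id]; !ok || old != status {
			upserted = append(upserted, id)
		}
	}
	for id := range prev {
		if _, ok := cur[id]; !ok {
			deleted = append(deleted, id)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(upserted)
	sort.Strings(deleted)
	return upserted, deleted
}

func main() {
	prev := map[string]string{"ws-a": "running", "ws-b": "stopped"}
	cur := map[string]string{"ws-a": "stopped", "ws-c": "running"}
	upserted, deleted := diffSnapshots(prev, cur)
	fmt.Println("upserted:", upserted, "deleted:", deleted)
}
```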
Jon Ayers	cd890aa3a0	feat: enable key rotation (#15066 ) This PR contains the remaining logic necessary to hook up key rotation to the product.	2024-10-25 17:14:35 +01:00
Spike Curtis	7d9f5ab81d	chore: add Coder service prefix to tailnet (#14943 ) re: #14715 This PR introduces the Coder service prefix: `fd60:627a:a42b::/48` and refactors our existing code as calling the Tailscale service prefix explicitly (rather than implicitly). Removes the unused `Addresses` agent option. All clients today assume they can compute the Agent's IP address based on its UUID, so an agent started with a custom address would break things.	2024-10-04 10:04:10 +04:00
Spike Curtis	2df9a3e554	fix: fix tailnet remoteCoordination to wait for server (#14666 ) Fixes #12560 When gracefully disconnecting from the coordinator, we would send the Disconnect message and then close the dRPC stream. However, closing the dRPC stream can cause the server not to process the Disconnect message, since we use the stream context in a `select` while sending it to the coordinator. This is a product bug uncovered by the flake, and probably results in us failing graceful disconnect some minority of the time. Instead, the `remoteCoordination` (and `inMemoryCoordination` for consistency) should send the Disconnect message and then wait for the coordinator to hang up (on some graceful disconnect timer, in the form of a context).	2024-09-16 09:24:30 +04:00
Spike Curtis	fb3523b37f	chore: remove legacy AgentIP address (#14640 ) Removes the support for the Agent's "legacy IP" which was a hardcoded IP address all agents used to use, before we introduced "single tailnet". Single tailnet went GA in 2.7.0.	2024-09-12 07:40:19 +04:00
Ethan	8c15192433	feat(cli): add p2p diagnostics to ping (#14426 ) First PR to address #14244. Adds common potential reasons as to why a direct connection to the workspace agent couldn't be established to `coder ping`: - If the Coder deployment administrator has blocked direct connections (`CODER_BLOCK_DIRECT`). - If the client has no STUN servers within its DERP map. - If the client or agent appears to be behind a hard NAT, as per Tailscale `netInfo.MappingVariesByDestIP` Also adds a warning if the client or agent has a network interface below the 'safe' MTU for tailnet. This warning is always displayed at the end of a `coder ping`.	2024-08-28 15:39:01 +10:00
Dean Sheather	cf8be4eac5	feat: add resume support to coordinator connections (#14234 )	2024-08-20 17:16:49 +10:00
Kyle Carberry	e2cec454bc	fix: check for io.EOF error in derpmap to resolve flake (#14125 ) See: https://github.com/coder/coder/actions/runs/10218717887/job/28275465405?pr=14045	2024-08-02 17:08:47 +00:00
Ethan	e8db21c89e	chore: add additional network telemetry stats & events (#13800 )	2024-07-10 14:14:35 +10:00
Ethan	a110d18275	chore: add DRPC tailnet & cli network telemetry (#13687 )	2024-07-03 15:23:46 +10:00
Dean Sheather	6c94dd4f23	chore: add DRPC server implementation for network telemetry (#13675 )	2024-07-02 01:50:52 +10:00
Spike Curtis	c94b5188bd	fix: modify workspacesdk to ask for tailnet API 2.0 (#13684 ) #13617 bumped the Agent/Tailnet API minor version because it adds telemetry features. However, we don't actually use the protocol features yet, so it's a bit obnoxious for our CLI client to ask for the newest API version. This is particularly true of the CLI client, since that's distributed separately, so if an end user installs the latest CLI client and their organization hasn't fully upgraded, then it will fail to connect. Since we have a release coming up and the telemetry stuff won't make it, I think we should roll back to version 2.0 until we actually implement the telemetry stuff. That way the newest release (2.13) will work with Coder servers all the way back to 2.9.	2024-06-27 15:38:21 +04:00
Spike Curtis	5b59f2880f	fix: fix workspacesdk to return error on API mismatch (#13683 )	2024-06-27 15:02:43 +04:00
Spike Curtis	1f9bdc36bf	fix: ignore yamux.ErrSessionShutdown on TestTailnetAPIConnector_Disconnects (#13532 )	2024-06-11 11:16:49 +04:00
Spike Curtis	3de737fdc8	fix: start packet capture immediately on speedtest (#13128 ) I initially made this change when hacking wgengine to also capture wireguard packets going into the magicsock, so that we could capture the initial wireguard handshake. I don't think we should ship that additional capture logic, but... it seems generally useful to capture packets from the get go on speedtest, so that you can see disco and pings before the TCP speedtest session starts.	2024-05-02 19:44:32 +04:00
Colin Adler	6b4eb03192	chore: give additional time in tests for `tailnetAPIConnector` graceful disconnect (#12980 ) Failure seen here: https://github.com/coder/coder/actions/runs/8711258577/job/23894964182?pr=12979	2024-04-17 12:38:17 -05:00
Colin Adler	e801e878ba	feat: add agent acks to in-memory coordinator (#12786 ) When an agent receives a node, it responds with an ACK which is relayed to the client. After the client receives the ACK, it's allowed to begin pinging.	2024-04-10 17:15:33 -05:00
Colin Adler	4d5a7b2d56	chore(codersdk): move all tailscale imports out of `codersdk` (#12735 ) Currently, importing `codersdk` just to interact with the API requires importing tailscale, which causes builds to fail unless manually using our fork.	2024-03-26 12:44:31 -05:00

40 Commits