Commit Graph

100 Commits

Author SHA1 Message Date
2cd3f999a6 feat: add one shot commands to the coder ssh command (#17779)
Closes #2154

> [!WARNING]  
> The tests in this PR were co-authored by AI
2025-05-16 10:09:46 -04:00
53ba3613b3 feat(cli): use coder connect in coder ssh --stdio, if available (#17572)
Closes https://github.com/coder/vscode-coder/issues/447
Closes https://github.com/coder/jetbrains-coder/issues/543
Closes https://github.com/coder/coder-jetbrains-toolbox/issues/21

This PR adds Coder Connect support to `coder ssh --stdio`. 

When connecting to a workspace, if `--force-new-tunnel` is not passed, the CLI will first do a DNS lookup for `<agent>.<workspace>.<owner>.<hostname-suffix>`. If an IP address is returned, and it's within the Coder service prefix, the CLI will not create a new tailnet connection to the workspace, and instead dial the SSH server running on port 22 on the workspace directly over TCP.

This allows IDE extensions to use the Coder Connect tunnel, without requiring any modifications to the extensions themselves. 

Additionally, `using_coder_connect` is added to the `sshNetworkStats` file, which the VS Code extension (and maybe Jetbrains?) will be able to read, and indicate to the user that they are using Coder Connect.

One advantage of this approach is that running `coder ssh --stdio` on an offline workspace with Coder Connect enabled will have the CLI wait for the workspace to build, the agent to connect (and optionally, for the startup scripts to finish), before finally connecting using the Coder Connect tunnel.

As a result, `coder ssh --stdio` has the overhead of looking up the workspace and agent, and checking if they are running. On my device, this meant `coder ssh --stdio <workspace>` was approximately a second slower than just connecting to the workspace directly using `ssh <workspace>.coder` (I would assume anyone serious about their Coder Connect usage would know to just do the latter anyway).
 
To ensure this doesn't come at a significant performance cost, I've also benchmarked this PR.

<details>
<summary>Benchmark</summary>

## Methodology
All tests were completed on `dev.coder.com`, where a Linux workspace running in AWS `us-west1` was created.
The machine running Coder Desktop (the 'client') was a Windows VM running in the same AWS region and VPC as the workspace.

To test the performance of specifically the SSH connection, a port was forwarded between the client and workspace using:
```
ssh -p 22 -L7001:localhost:7001 <host>
```
where `host` was either an alias for an SSH ProxyCommand that called `coder ssh`, or a Coder Connect hostname.

For latency, [`tcping`](https://www.elifulkerson.com/projects/tcping.php) was used against the forwarded port:
```
tcping -n 100 localhost 7001
```

For throughput, [`iperf3`](https://iperf.fr/iperf-download.php) was used:
```
iperf3 -c localhost -p 7001
```
where an `iperf3` server was running on the workspace on port 7001.

## Test Cases

### Testcase 1: `coder ssh` `ProxyCommand` that bicopies from Coder Connect
This case tests the implementation in this PR, such that we can write a config like:
```
Host codercliconnect
    ProxyCommand /path/to/coder ssh --stdio workspace
```
With Coder Connect enabled, `ssh -p 22 -L7001:localhost:7001 codercliconnect` will use the Coder Connect tunnel. The results were as follows:

**Throughput, 10 tests, back to back:**
- Average throughput across all tests: 788.20 Mbits/sec
- Minimum average throughput: 731 Mbits/sec
- Maximum average throughput: 871 Mbits/sec
- Standard Deviation: 38.88 Mbits/sec

**Latency, 100 RTTs:**
- Average: 0.369ms
- Minimum: 0.290ms
- Maximum: 0.473ms

### Testcase 2: `ssh` dialing Coder Connect directly without a `ProxyCommand`

This is what we assume to be the 'best' way to use Coder Connect

**Throughput, 10 tests, back to back:**
- Average throughput across all tests: 789.50 Mbits/sec
- Minimum average throughput: 708 Mbits/sec
- Maximum average throughput: 839 Mbits/sec
- Standard Deviation: 39.98 Mbits/sec

**Latency, 100 RTTs:**
- Average: 0.369ms
- Minimum: 0.267ms
- Maximum: 0.440ms

### Testcase 3:  `coder ssh` `ProxyCommand` that creates its own Tailnet connection in-process

This is what normally happens when you run `coder ssh`:

**Throughput, 10 tests, back to back:**
- Average throughput across all tests: 610.20 Mbits/sec
- Minimum average throughput: 569 Mbits/sec
- Maximum average throughput: 664 Mbits/sec
- Standard Deviation: 27.29 Mbits/sec

**Latency, 100 RTTs:**
- Average: 0.335ms
- Minimum: 0.262ms
- Maximum: 0.452ms

## Analysis

Performing a two-tailed, unpaired t-test against the throughput of testcases 1 and 2, we find a P value of `0.9450`. This suggests the difference between the data sets is not statistically significant. In other words, there is a 94.5% chance that the difference between the data sets is due to chance.

## Conclusion

From the t-test, and by comparison to the status quo (regular `coder ssh`, which uses gvisor, and is noticeably slower), I think it's safe to say any impact on throughput or latency by the `ProxyCommand` performing a bicopy against Coder Connect is negligible. Users are very much unlikely to run into performance issues as a result of using Coder Connect via `coder ssh`, as implemented in this PR.

Less scientifically, I ran these same tests on my home network with my Sydney workspace, and both throughput and latency were consistent across testcases 1 and 2.

</details>
2025-04-30 15:17:10 +10:00
1fc74f629e refactor(agent): update agentcontainers api initialization (#17600)
There were too many ways to configure the agentcontainers API resulting
in inconsistent behavior or features not being enabled. This refactor
introduces a control flag for enabling or disabling the containers API.
When disabled, all implementations are no-op and explicit endpoint
behaviors are defined. When enabled, concrete implementations are used
by default but can be overridden by passing options.
2025-04-29 17:53:10 +03:00
f670bc31f5 chore: update testutil chan helpers (#17408) 2025-04-16 10:37:09 -06:00
e7e47537c9 chore: fix gpg forwarding test (#17355) 2025-04-11 03:33:53 +00:00
d312e82a51 feat: support --hostname-suffix flag on coder ssh (#17279)
Adds `hostname-suffix` flag to `coder ssh` command for use in SSH Config ProxyCommands.

Also enforces that Coder server doesn't start the suffix with a dot.

part of: #16828
2025-04-07 21:33:33 +04:00
fc471eb384 fix: handle vscodessh style workspace names in coder ssh (#17154)
Fixes an issue where old ssh configs that use the
`owner--workspace--agent` format will fail to properly use the `coder
ssh` command since we migrated off the `coder vscodessh` command.
2025-04-07 10:06:58 -04:00
f6bf6c6ec4 fix!: use names not IDs for agent SSH key seed (#17258)
Changes the SSH host key seeding to use the owner username, workspace name, and agent name. This prevents SSH from complaining about a mismatched host key if you use Coder Desktop to connect, and delete and recreate your workspace with the same name. Previously this would generate a different key because the workspace ID changed.

We also include the owner's username in anticipation of using Coder Desktop to access shared workspaces (or as a superuser) down the road, so that workspaces with the same name owned by different users will not have the same key.

This change is **BREAKING** in a limited sense that early access users of Coder Desktop will see their SSH clients complain about host keys changing the first time each workspace is rebuilt with this code. It can be resolved by clearing your `.ssh/known_hosts` file of the Coder workspaces you access this way.
2025-04-04 12:51:46 +04:00
a9574fb4b1 chore(cli): increase timeout for TestSSH_Container subtests (#17148)
Closes https://github.com/coder/internal/issues/524
2025-03-28 13:52:13 +00:00
17ddee05e5 chore: update golang to 1.24.1 (#17035)
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
  - Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
2025-03-26 01:56:39 -05:00
3ac844ad3d chore(codersdk): rename WorkspaceAgent(Dev)container structs (#16996)
This is to free up the devcontainer name space for more targeted
structs.

Updates #16423
2025-03-19 10:16:14 +00:00
ca23abcc30 chore(cli): fix test flake in TestSSH_Container/NotFound (#16771)
If you hit the list containers endpoint with no containers running, the
response is different. This uses a mock lister to ensure a consistent
response from the agent endpoint.
2025-03-03 14:15:25 +00:00
ec44f06f5c feat(cli): allow SSH command to connect to running container (#16726)
Fixes https://github.com/coder/coder/issues/16709 and
https://github.com/coder/coder/issues/16420

Adds the capability to`coder ssh` into a running container if `CODER_AGENT_DEVCONTAINERS_ENABLE=true`.

Notes:
* SFTP is currently not supported
* Haven't tested X11 container forwarding
* Haven't tested agent forwarding
2025-02-28 09:38:45 +00:00
660746462e fix(agent/agentssh): use deterministic host key for SSH server (#16626)
Fixes: https://github.com/coder/coder/issues/16490

The Agent's SSH server now initially generates fixed host keys and, once it receives its manifest, generates and replaces that host key with the one derived from the workspace ID, ensuring consistency across agent restarts. This prevents SSH warnings and host key verification errors when connecting to workspaces through Coder Desktop.

While deterministic keys might seem insecure, the underlying Wireguard tunnel already provides encryption and anti-spoofing protection at the network layer, making this approach acceptable for our use case.

---
Change-Id: I8c7e3070324e5d558374fd6891eea9d48660e1e9
Signed-off-by: Thomas Kosiewski <tk@coder.com>
2025-02-21 14:58:41 +01:00
7cf62423ec test(cli): fix TestSSH/RemoteForward_Unix_Signal flake (#16172) 2025-01-17 16:53:09 +02:00
2413106f22 fix: improve shell compatibility of netstat check in test (#16141)
When I wrote the original just the other day, I used `$?`, which is fine
on CI and in most cases, but not when the person running the test has
their system shell set to fish (Fish uses $status) instead. In the
interest of letting this test pass locally, I'll instead just grab the
line count of the grep output. However, `wc` is padded on macos with
spaces, so we need to get rid of those too.
2025-01-15 03:23:53 +00:00
1aa9e32a2b feat: add --ssh-host-prefix flag for "coder ssh" (#16088)
This adds a flag matching `--ssh-host-prefix` from `coder config-ssh` to
`coder ssh`. By trimming a custom prefix from the argument, we can set
up wildcard-based `Host` entries in SSH config for the IDE plugins (and
eventually `coder config-ssh`).

We also replace `--` in the argument with `/`, so ownership can be
specified in wildcard-based SSH hosts like `<owner>--<workspace>`.

Replaces #16087.

Part of https://github.com/coder/coder/issues/14986.

Related to https://github.com/coder/coder/pull/16078 and
https://github.com/coder/coder/pull/16080.
2025-01-13 19:07:21 -06:00
838ee3b244 feat: add --network-info-dir and --network-info-interval flags to coder ssh (#16078)
This is the first in a series of PRs to enable `coder ssh` to replace
`coder vscodessh`.

This change adds `--network-info-dir` and `--network-info-interval`
flags to the `ssh` subcommand. These were formerly only available with
the `vscodessh` subcommand.

Subsequent PRs will add a `--ssh-host-prefix` flag to the ssh
subcommand, and adjust the log file naming to contain the parent PID.
2025-01-13 18:29:31 -06:00
a7fe35af25 fix: use netstat over ss when testing unix socket (#16103)
Closes https://github.com/coder/internal/issues/274.

`TestSSH/RemoteForwardUnixSocket` previously used `ss` for confirming if
a socket was listening. `ss` isn't available on macOS, causing the test
to flake.

The test previously passed on macOS as a 2 could always be read on the
SSH connection, presumably reading it as part of some escape sequence? I
confirmed the test passed on Linux if you comment out the `ss` command,
the pty would always read a sequence ending in `[?2`.
2025-01-13 20:51:55 +11:00
8c44cd3dfd test(cli/ssh): fix ssh start conflict test by faking API response (#16082) 2025-01-10 14:48:11 +00:00
ba6e84dec3 fix(cli/ssh): retry on autostart conflict (#16058) 2025-01-08 15:15:30 +02:00
5861e516b9 chore: add standard test logger ignoring db canceled (#15556)
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests.  This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.

Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.
2024-11-18 14:09:22 +04:00
343f8ec9ab chore: join owner, template, and org in new workspace view (#15116)
Joins in fields like `username`, `avatar_url`, `organization_name`,
`template_name` to `workspaces` via a **view**. 
The view must be maintained moving forward, but this prevents needing to
add RBAC permissions to fetch related workspace fields.
2024-10-22 09:20:54 -05:00
bf4b7abf14 chore(coderd): allow creating workspaces without specifying an organization (#14048) 2024-07-30 10:44:02 -06:00
fed668b432 chore: switch ssh session stats based on experiment (#13637) 2024-06-25 10:58:45 -04:00
8a1216254e feat(cli): add --env flag for coder ssh (#12991)
This allows environment variables to be set on the SSH session.

Example:

   coder ssh myworkspace --env VAR1=val1,VAR2=val2
2024-04-22 13:13:48 +03:00
f418ece9ae test(cli): prevent flake due to outdated build in TestSSH (#12760)
Fixes #12752
2024-03-26 10:46:58 +00:00
af3fdc68c3 chore: refactor agent routines that use the v2 API (#12223)
In anticipation of needing the `LogSender` to run on a context that doesn't get immediately canceled when you `Close()` the agent, I've undertaken a little refactor to manage the goroutines that get run against the Tailnet and Agent API connection.

This handles controlling two contexts, one that gets canceled right away at the start of graceful shutdown, and another that stays up to allow graceful shutdown to complete.
2024-02-23 11:04:23 +04:00
da376549a3 fix: stop waiting for Agent in a goroutine in ssh test (#12268)
Fixes race seen here: https://github.com/coder/coder/runs/21852483781

What happens is that the agent connects, completes the test, and then disconnects before the Eventually condition runs.  The waiter then times out because it's looking for a connected agent.

Then, since it's a `require` in a goroutine, that causes the `tGo` cleanup to hang and the whole test suite to timeout after 10 minutes.

Anyway, `agenttest.New` doesn't block, and we don't actually need to wait for the agent to connect, since a successful SSH session is evidence that it connected.
2024-02-22 17:01:06 +04:00
e659957b65 fix(cli/ssh): prevent reads/writes to stdin/stdout in stdio mode (#12045)
Fixes #11530
2024-02-08 13:09:42 +02:00
77a4792ecd fix(cli): ssh: auto-update workspace (#11773) 2024-01-23 18:01:44 +01:00
200a87e7d4 feat(cli/ssh): allow multiple remote forwards and allow missing local file (#11648) 2024-01-19 15:21:10 +02:00
385d58caf6 fix(agent/agentssh): allow remote forwarding a socket multiple times (#11631)
* fix(agent/agentssh): allow remote forwarding a socket multiple times

Fixes #11198
Fixes https://github.com/coder/customers/issues/407
2024-01-16 21:26:13 +02:00
cb89bc1729 feat: restart stopped workspaces on ssh command (#11050)
* feat: autostart workspaces on ssh & port forward

This is opt out by default. VScode ssh does not have this behavior
2023-12-08 10:01:13 -06:00
46d95cb0f0 fix: wait for dial goroutine to complete (#10959)
Fixes flake seen here: https://github.com/coder/coder/runs/19170327767

The goroutine that attempts to dial the socket didn't complete before the test did.  Here we add an explicit wait for it to complete in each run of the loop.
2023-12-01 11:37:32 +04:00
967db2801b chore: refactor ResolveAutostart tests to use dbfake (#10603) 2023-11-30 19:33:04 -06:00
2dc565d5de chore: remove New----Builder from dbfake function names (#10882)
Drop "New" and "Builder" from the function names, in favor of the top-level resource created.  This shortens tests and gives a nice syntax.  Since everything is a builder, the prefix and suffix don't add much value and just make things harder to read.

I've also chosen to leave `Do()` as the function to insert into the database.  Even though it's a builder pattern, I fear `.Build()` might be confusing with Workspace Builds.  One other idea is `Insert()` but if we later add dbfake functions that update, this might be inconsistent.
2023-11-29 11:06:04 +04:00
78283a7fb9 chore: remove dbfake.WorkspaceWithAgent (#10879)
Replace dbfake.WorkspaceWithAgent() with the builder pattern and remove this function.
2023-11-27 14:30:15 +04:00
b25e5dc90b chore: remove dbfake.WorkspaceBuild in favor of builder pattern (#10814)
I'd like to convert dbfake into a builder pattern to prevent a proliferation of XXXWithYYY methods.  This is one step of the way by removing the Non-builder function.
2023-11-22 13:04:58 +04:00
5d5b5aa074 chore: use dbfake for ssh tests rather than provisionerd (#10812)
Refactors SSH tests to skip provisionerd and instead use dbfake to insert workspaces and builds.  This should make tests faster and more reliable.

dbfake.WorkspaceBuild is refactored to use a "builder" pattern with "fluent" options, as the number of options and variants was starting to get out of hand.
2023-11-21 16:22:08 +04:00
92ef0baff3 fix: remove pty match for TestSSH/RemoteForward (#10789)
Fixes #10578
2023-11-20 20:50:09 +04:00
3dd35e019b fix: close ssh sessions gracefully (#10732)
Re-enables TestSSH/RemoteForward_Unix_Signal and addresses the underlying race: we were not closing the remote forward on context expiry, only the session and connection.

However, there is still a more fundamental issue in that we don't have the ability to ensure that TCP sessions are properly terminated before tearing down the Tailnet conn.  This is due to the assumption in the sockets API, that the underlying IP interface is long 
lived compared with the TCP socket, and thus closing a socket returns immediately and does not wait for the TCP termination handshake --- that is handled async in the tcpip stack.  However, this assumption does not hold for us and tailnet, since on shutdown,
we also tear down the tailnet connection, and this can race with the TCP termination.

Closing the remote forward explicitly should prevent forward state from accumulating, since the Close() function waits for a reply from the remote SSH server.

I've also attempted to workaround the TCP/tailnet issue for `--stdio` by using `CloseWrite()` instead of `Close()`.  By closing the write side of the connection, half-close the TCP connection, and the server detects this and closes the other direction, which then
triggers our read loop to exit only after the server has had a chance to process the close.

TODO in a stacked PR is to implement this logic for `vscodessh` as well.
2023-11-17 12:43:20 +04:00
34c9661f1b fix: disable flaky test TestSSH/RemoteForward_Unix_Signal (#10711) 2023-11-15 11:04:36 +00:00
4894eda711 feat: capture cli logs in tests (#10669)
Adds a Logger to cli Invocation and standardizes CLI commands to use it.  clitest creates a test logger by default so that CLI command logs are captured in the test logs.

CLI commands that do their own log configuration are modified to add sinks to the existing logger, rather than create a new one.  This ensures we still capture logs in CLI tests.
2023-11-14 22:56:27 +04:00
f400d8a0c5 fix: handle SIGHUP from OpenSSH (#10638)
Fixes an issue where remote forwards are not correctly torn down when using OpenSSH with `coder ssh --stdio`.  OpenSSH sends a disconnect signal, but then also sends SIGHUP to `coder`.  Previously, we just exited when we got SIGHUP, and this raced against properly disconnecting.

Fixes https://github.com/coder/customers/issues/327
2023-11-13 15:14:42 +04:00
a1ee4d44aa fix: test: TestSSH_RemoteForward wait for startup script (#10211) 2023-10-11 14:17:04 +02:00
24c80bf532 fix: remove AwaitWorkspaceAgents in goroutines
AwaitWorkspaceAgent calls testify.require which isn't allowed from a goroutine and causes cascading failures in the test suite such as: https://github.com/coder/coder/actions/runs/6458768855/job/17533163316

I don't believe these functions serve a direct purpose since nothing else is "waiting" for the functions to return before doing other things.
2023-10-09 20:37:23 +04:00
c194119689 chore: rename AwaitTemplateVersionJobCompleted and AwaitWorkspaceBuildJobCompleted (#10003) 2023-10-03 11:02:56 -06:00
4966ef02cf feat(cli): add reverse tunnelling SSH support for unix sockets (#9976) 2023-10-03 16:39:39 +10:00
fad02081fc fix: avoid logging env in unit tests (#9885) 2023-09-27 13:34:40 +01:00