Commit Graph

101 Commits

Author SHA1 Message Date
e8db21c89e chore: add additional network telemetry stats & events (#13800) 2024-07-10 14:14:35 +10:00
a110d18275 chore: add DRPC tailnet & cli network telemetry (#13687) 2024-07-03 15:23:46 +10:00
fed668b432 chore: switch ssh session stats based on experiment (#13637) 2024-06-25 10:58:45 -04:00
0268c7a659 chore: refactor autobuild/notify to use clock test (#13566)
Refactor autobuild/notify and tests to use the clock testing library.

I also rewrote some of the comments because I didn't understand them when I was looking at the package.
2024-06-13 16:01:17 +04:00
13dd526f11 fix: prevent stdlib logging from messing up ssh (#13161)
Fixes https://github.com/coder/coder/issues/13144
2024-05-03 22:12:06 +00:00
8a1216254e feat(cli): add --env flag for coder ssh (#12991)
This allows environment variables to be set on the SSH session.

Example:

   coder ssh myworkspace --env VAR1=val1,VAR2=val2
2024-04-22 13:13:48 +03:00
d426569d4a fix: make terminal raw in ssh command on windows (#12990) 2024-04-17 18:01:20 +00:00
d3790bb5be fix: use provided username when fetching workspaces (#12955) 2024-04-13 14:39:57 -04:00
93a233ac10 chore: write auto-update message after success (#12804) 2024-03-28 08:55:15 -05:00
4d5a7b2d56 chore(codersdk): move all tailscale imports out of codersdk (#12735)
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
2024-03-26 12:44:31 -05:00
b4c0fa80d8 chore(cli): rename Cmd to Command (#12616)
I think Command is cleaner and my original decision to use "Cmd"
a mistake.

Plus this creates better parity with cobra.
2024-03-17 09:45:26 -05:00
496232446d chore(cli): replace clibase with external coder/serpent (#12252) 2024-03-15 11:24:38 -05:00
895df54051 fix: separate signals for passive, active, and forced shutdown (#12358)
* fix: separate signals for passive, active, and forced shutdown

`SIGTERM`: Passive shutdown stopping provisioner daemons from accepting new
jobs but waiting for existing jobs to successfully complete.

`SIGINT` (old existing behavior): Notify provisioner daemons to cancel in-flight jobs, wait 5s for jobs to be exited, then force quit.

`SIGKILL`: Untouched from before, will force-quit.

* Revert dramatic signal changes

* Rename

* Fix shutdown behavior for provisioner daemons

* Add test for graceful shutdown
2024-03-15 13:16:36 +00:00
b96f6b48a4 fix: ensure ssh cleanup happens on cmd error
I noticed in my logs that sometimes `coder ssh` doesn't gracefully disconnect from the coordinator.

The cause is the `closerStack` construct we use in that function.  It has two paths to start closing things down:

1. explicit `close()` which we do in `defer`
2. context cancellation, which happens if the cli function returns an error

sometimes the ssh remote command returns an error, and this triggers context cancellation of the `closerStack`.  That is fine in and of itself, but we still want the explicit `close()` to wait until everything is closed before returning, since that's where we do cleanup, including the graceful disconnect.  Prior to this fix the `close()` just immediately exits if another goroutine is closing the stack.  Here we add a wait until everything is done.
2024-03-07 17:26:49 +04:00
4e7beee102 feat: show tailnet peer diagnostics after coder ping (#12314)
Beginnings of a solution to #12297 

Doesn't cover disco or definitively display whether we successfully connected to DERP, but shows some checklist diagnostics for connecting to an agent.

For this first PR, I just added it to `coder ping` to see how we like it, but could be incorporated into `coder ssh` _et al._ after a timeout.

```
$ coder ping dogfood2
p2p connection established in 147ms
pong from dogfood2 p2p via  95.217.xxx.yyy:42631  in 147ms
pong from dogfood2 p2p via  95.217.xxx.yyy:42631  in 140ms
pong from dogfood2 p2p via  95.217.xxx.yyy:42631  in 140ms
✔ preferred DERP region 999 (Council Bluffs, Iowa)
✔ sent local data to Coder networking coodinator
✔ received remote agent data from Coder networking coordinator
    preferred DERP 10013 (Europe Fly.io (Paris))
    endpoints: 95.217.xxx.yyy:42631, 95.217.xxx.yyy:37576, 172.17.0.1:37576, 172.20.0.10:37576
✔ Wireguard handshake 11s ago
```
2024-02-27 22:04:46 +04:00
e659957b65 fix(cli/ssh): prevent reads/writes to stdin/stdout in stdio mode (#12045)
Fixes #11530
2024-02-08 13:09:42 +02:00
77a4792ecd fix(cli): ssh: auto-update workspace (#11773) 2024-01-23 18:01:44 +01:00
200a87e7d4 feat(cli/ssh): allow multiple remote forwards and allow missing local file (#11648) 2024-01-19 15:21:10 +02:00
df3c310379 feat(cli): add coder open vscode (#11191)
Fixes #7667
2024-01-02 20:46:18 +02:00
37f6b38d53 fix: return 403 when rebuilding workspace with require_active_version (#11114) 2023-12-08 23:03:46 -06:00
cb89bc1729 feat: restart stopped workspaces on ssh command (#11050)
* feat: autostart workspaces on ssh & port forward

This is opt out by default. VScode ssh does not have this behavior
2023-12-08 10:01:13 -06:00
61be4dfe5a fix: improve exit codes for agent/agentssh and cli/ssh (#10850) 2023-11-24 14:35:56 +02:00
f20cc66c04 fix: give SSH stdio sessions a chance to close before closing netstack (#10815)
Man, graceful shutdown is hard.  Even after my changes, we were still hitting a graceful shutdown race: https://github.com/coder/coder/runs/18886842123

The problem was that while we attempt a graceful shutdown at the SSH layer by closing the session for writing, we were not giving it a chance to complete before continuing to tear down the stack of closers, including one that closes the netstack, and thus drop the TCP connection before it closes.
2023-11-22 13:11:21 +04:00
3dd35e019b fix: close ssh sessions gracefully (#10732)
Re-enables TestSSH/RemoteForward_Unix_Signal and addresses the underlying race: we were not closing the remote forward on context expiry, only the session and connection.

However, there is still a more fundamental issue in that we don't have the ability to ensure that TCP sessions are properly terminated before tearing down the Tailnet conn.  This is due to the assumption in the sockets API, that the underlying IP interface is long 
lived compared with the TCP socket, and thus closing a socket returns immediately and does not wait for the TCP termination handshake --- that is handled async in the tcpip stack.  However, this assumption does not hold for us and tailnet, since on shutdown,
we also tear down the tailnet connection, and this can race with the TCP termination.

Closing the remote forward explicitly should prevent forward state from accumulating, since the Close() function waits for a reply from the remote SSH server.

I've also attempted to workaround the TCP/tailnet issue for `--stdio` by using `CloseWrite()` instead of `Close()`.  By closing the write side of the connection, half-close the TCP connection, and the server detects this and closes the other direction, which then
triggers our read loop to exit only after the server has had a chance to process the close.

TODO in a stacked PR is to implement this logic for `vscodessh` as well.
2023-11-17 12:43:20 +04:00
4894eda711 feat: capture cli logs in tests (#10669)
Adds a Logger to cli Invocation and standardizes CLI commands to use it.  clitest creates a test logger by default so that CLI command logs are captured in the test logs.

CLI commands that do their own log configuration are modified to add sinks to the existing logger, rather than create a new one.  This ensures we still capture logs in CLI tests.
2023-11-14 22:56:27 +04:00
dc4b1ef406 fix: lock log sink against concurrent write and close (#10668)
fixes #10663
2023-11-14 16:38:34 +04:00
f400d8a0c5 fix: handle SIGHUP from OpenSSH (#10638)
Fixes an issue where remote forwards are not correctly torn down when using OpenSSH with `coder ssh --stdio`.  OpenSSH sends a disconnect signal, but then also sends SIGHUP to `coder`.  Previously, we just exited when we got SIGHUP, and this raced against properly disconnecting.

Fixes https://github.com/coder/customers/issues/327
2023-11-13 15:14:42 +04:00
1262eef2c0 feat: add support for coder_script (#9584)
* Add basic migrations

* Improve schema

* Refactor agent scripts into it's own package

* Support legacy start and stop script format

* Pipe the scripts!

* Finish the piping

* Fix context usage

* It works!

* Fix sql query

* Fix SQL query

* Rename `LogSourceID` -> `SourceID`

* Fix the FE

* fmt

* Rename migrations

* Fix log tests

* Fix lint err

* Fix gen

* Fix story type

* Rename source to script

* Fix schema jank

* Uncomment test

* Rename proto to TimeoutSeconds

* Fix comments

* Fix comments

* Fix legacy endpoint without specified log_source

* Fix non-blocking by default in agent

* Fix resources tests

* Fix dbfake

* Fix resources

* Fix linting I think

* Add fixtures

* fmt

* Fix startup script behavior

* Fix comments

* Fix context

* Fix cancel

* Fix SQL tests

* Fix e2e tests

* Interrupt on Windows

* Fix agent leaking script process

* Fix migrations

* Fix stories

* Fix duplicate logs appearing

* Gen

* Fix log location

* Fix tests

* Fix tests

* Fix log output

* Show display name in output

* Fix print

* Return timeout on start context

* Gen

* Fix fixture

* Fix the agent status

* Fix startup timeout msg

* Fix command using shared context

* Fix timeout draining

* Change signal type

* Add deterministic colors to startup script logs

---------

Co-authored-by: Muhammad Atif Ali <atif@coder.com>
2023-09-25 16:47:17 -05:00
6ba92ef924 ci: enable gocognit (#9359)
And, bring the server under 300:

* Removed the undocumented "disable" STUN address in favor of the
--disable-direct flag.
2023-08-27 14:46:44 -05:00
22e781eced chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```

Migrate generated files to import /v2

* Fix gen
2023-08-18 18:55:43 +00:00
bd944e0d21 chore: rename startup logs to agent logs (#8649)
* chore: rename startup logs to agent logs

This also adds a `source` property to every agent log. It
should allow us to group logs and display them nicer in
the UI as they stream in.

* Fix migration order

* Fix naming

* Rename the frontend

* Fix tests

* Fix down migration

* Match enums for workspace agent logs

* Fix inserting log source

* Fix migration order

* Fix logs tests

* Fix psql insert
2023-07-28 15:57:23 +00:00
9689bca5d2 feat(cli): implement ssh remote forward (#8515) 2023-07-20 12:05:39 +02:00
1c3bfacca3 fix(cli): ensure cliui.Agent doesn't fetch infinitely (#8446) 2023-07-12 10:21:54 -05:00
7fcf319e01 fix(cli)!: protect client Logger and refactor cli scaletest tests (#8317)
- (breaking) Protects Logger and LogBodies fields of codersdk.Client with its mutex. This addresses a data race in cli/scaletest.
- Fillets the existing cli/createworkspaces unit test and moves the testing logic there into the tests under scaletest/createworkspaces.
- Adds testutil.RaceEnabled bool const and conditionaly skips previously-skipped tests under scaletest/ if the race detector is enabled. This is unfortunate and sad, but I would prefer to have these tests at least running without the race detector than not running at all.
- Adds IgnoreErrors option to fake in-memory agent loggers; having the agents fail the test immediately when they encounter any sort of error isn't really helpful.
2023-07-06 09:43:39 +01:00
d3c39b60c9 feat: add agent log streaming and follow provisioner format (#8170) 2023-06-28 10:54:13 +02:00
24b95e16c4 feat: add --disable-direct flag to CLI (#8131) 2023-06-21 20:22:43 +00:00
b1d1b63113 chore: ensure logs consistency across Coder (#8083) 2023-06-20 12:30:45 +02:00
4bc4e63637 fix(cli/ssh): fix lint error (#7974) 2023-06-12 16:17:41 +00:00
5de1084639 feat(cli/ssh): simplify log file flags (#7863)
And, fix a race condition.
2023-06-12 09:18:33 +04:00
94aa9be33a feat(cli/ssh): implement wait options and deprecate no-wait (#7894)
Fixes #7768
Refs #7893
2023-06-08 16:52:44 +03:00
660bbb8d38 refactor: deprecate login_before_ready in favor of startup_script_behavior (#7837)
Fixes #7758
2023-06-06 11:58:07 +03:00
a7366a8b76 feat!: drop support for legacy parameters (#7663) 2023-06-02 11:16:46 +02:00
6a1e7ee1d0 feat: add file logger to coder ssh (#7646)
* coder ssh can log to file

Signed-off-by: Spike Curtis <spike@coder.com>

* Update golden file

Signed-off-by: Spike Curtis <spike@coder.com>

* generate CLI docs

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix imports, typo

Signed-off-by: Spike Curtis <spike@coder.com>

* log more things!

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-05-25 05:07:39 +00:00
b6c8e5be48 fix(cli/ssh): Fetch up-to-date build info to avoid ws has no agents (#7650)
Fixes #5836
2023-05-24 12:37:22 +03:00
c2871e12aa fix(cli/ssh): Avoid connection hang when workspace is stopped (#7201)
* fix(cli/ssh): Avoid connection hang when workspace is stopped

Two issues are addressed here:
1. We were not detecting disconnects due to waiting for Stdin to close
   (disconnect would only propagate after entering input and failing to
   write to the connection).
2. In other scenarios, where the connection drop is not detected, we now
   also watch workspace status and drop the connection when a workspace
   reaches the stopped state.

Fixes: https://github.com/coder/jetbrains-coder/issues/199

Refs: #6180, #6175
2023-04-19 21:32:28 +03:00
0224426e5b refactor(agent): Move SSH server into agentssh package (#7004)
Refs: #6177
2023-04-06 19:39:22 +03:00
b439c3e167 fix: permit SSH by default when startup script fails (#6798) 2023-03-27 14:59:58 +00:00
2bd6d2908e feat: convert entire CLI to clibase (#6491)
I'm sorry.
2023-03-23 17:42:20 -05:00
2c2bbcc019 chore: update tests to support fish (#6023)
* fix: update tests to add fish support

* Track connections for SSH sessions to prevent leaks

* Revert SSH conn handling
2023-02-03 12:25:11 -06:00
8487127f5c chore: skip reconnecting pty scale tests (#5908)
* fix: close reconnecting pty conn when exiting agent

Fixes https://github.com/coder/coder/actions/runs/4038282899/jobs/6942170850

* Fix conpty

* Fix contrib

* Skip runner tests for being flakes

* Fix gpg key test

* Fix golden files

* Fix comments
2023-01-29 14:53:49 -06:00