Commit Graph

61 Commits

Author SHA1 Message Date
2c7f8ac65f chore: migrate to coder/websocket 1.8.12 (#15898)
Migrates us to `coder/websocket` v1.8.12 rather than `nhooyr/websocket` on an older version.

Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.
2024-12-19 00:51:30 +04:00
40802958e9 fix: use explicit api versions for agent and tailnet (#15508)
Bumps the Tailnet and Agent API version 2.3, and creates some extra controls and machinery around these versions.

What happened is that we accidentally shipped two new API features without bumping the version.  `ScriptCompleted` on the Agent API in Coder v2.16 and `RefreshResumeToken` on the Tailnet API in Coder v2.15.

Since we can't easily retroactively bump the versions, we'll roll these changes into API version 2.3 along with the new WorkspaceUpdates RPC, which hasn't been released yet.  That means there is some ambiguity in Coder v2.15-v2.17 about exactly what methods are supported on the Tailnet and Agent APIs.  This isn't great, but hasn't caused us major issues because 

1. RefreshResumeToken is considered optional, and clients just log and move on if the RPC isn't supported. 
2. Agents basically never get started talking to a Coderd that is older than they are, since the agent binary is normally downloaded from Coderd at workspace start.

Still it's good to get things squared away in terms of versions for SDK users and possible edge cases around client and server versions.

To mitigate against this thing happening again, this PR also:

1. adds a CODEOWNERS for the API proto packages, so I'll review changes
2. defines interface types for different API versions, and has the agent explicitly use a specific version.  That way, if you add a new method, and try to use it in the agent without thinking explicitly about versions, it won't compile.

With the protocol controllers stuff, we've sort of already abstracted the Tailnet API such that the interface type strategy won't work, but I'll work on getting the Controller to be version aware, such that it can check the API version it's getting against the controllers it has -- in a later PR.
2024-11-15 11:16:28 +04:00
f6639b788f feat: add ConnectRPC variants for older Agent API versions (#13778) 2024-07-03 16:11:05 +04:00
dd243686e4 chore!: remove deprecated agent v1 routes (#13486) 2024-06-11 12:22:59 +10:00
9d00a26a90 fix: add missing route for codersdk.PostLogSource (#13421) 2024-06-03 12:29:50 -05:00
e76b595052 fix: use a native websocket.NetConn for agent RPC client (#13142)
One cause of #13139 is a peculiar failure mode of `WebsocketNetConn` which causes it to return `context.Canceled` in some circumstances when the underlying websocket fails.  We have special processing for that error in the `agent.run()` routine, which is erroneously being triggered.

Since we don't actually need the returned context from `WebsocketNetConn`, we can simplify and just use the netConn from the `websocket` library directly.
2024-05-06 15:00:34 +04:00
b0afffbafb feat: use v2 API for agent metadata updates (#12281)
Switches the agent to report metadata over the v2 API.

Fixes #10534
2024-02-26 09:50:19 +04:00
aa7a9f5cc4 feat: use v2 API for agent lifecycle updates (#12278)
Agent uses the v2 API to post lifecycle updates.

Part of #10534
2024-02-23 15:24:28 +04:00
4cc132cea0 feat: switch agent to use v2 API for sending logs (#12068)
Changes the agent to use the new v2 API for sending logs, via the logSender component.

We keep the PatchLogs function around, but deprecate it so that we can test the v1 endpoint.
2024-02-23 11:27:15 +04:00
2fc3064653 chore: add tests for app ID copy in app healths (#12088) 2024-02-12 05:49:48 +00:00
1f5a6d59ba chore: consolidate websocketNetConn implementations (#12065)
Consolidates websocketNetConn from multiple packages in favor of a central one in codersdk
2024-02-09 11:39:08 +04:00
151aaadc23 fix: allow startup scripts larger than 32k (#12060)
Fixes #12057 and adds a regression test.
2024-02-07 22:26:42 +04:00
1cf4b62867 feat: change agent to use v2 API for reporting stats (#12024)
Modifies the agent to use the v2 API to report its statistics, using the `statsReporter` subcomponent.
2024-02-07 15:26:41 +04:00
1aa117b9ec chore: rename client Listen to ConnectRPC (#11916)
ConnectRPC seems more appropriate for this function
2024-02-01 14:44:11 +04:00
073d1f7078 chore: remove pingWebSocket since yamux runs keepalives (#11914)
Since we run yamux over the websocket, we don't need to ping at the websocket layer because yamux has a 30 second keepalive mechanism enabled in the default config.
2024-02-01 09:48:58 +04:00
13e214f7f1 feat: add logging to agent yamux session (#11912)
Log yamux errors and warnings in the agent.
2024-02-01 08:18:13 +04:00
0fc177203e feat: use agent v2 API to update app health (#11889)
Use the Agent v2 API to update App Health
2024-01-30 11:35:12 +04:00
2599850e54 feat: use agent v2 API to post startup (#11877)
Uses the v2 Agent API to post startup information.
2024-01-30 11:23:28 +04:00
da8bb1c198 feat: use agent v2 API to fetch manifest (#11832)
Agent uses the v2 API to obtain the manifest, instead of the HTTP API.
2024-01-30 10:11:28 +04:00
13e24f21e4 feat: use Agent v2 API for Service Banner (#11806)
Agent uses the v2 API for the service banner, rather than the v1 HTTP API.

One of several for #10534
2024-01-30 07:44:47 +04:00
059e533544 feat: agent uses Tailnet v2 API for DERPMap updates (#11698)
Switches the Agent to use Tailnet v2 API to get DERPMap updates.

Subsequent PRs will do the same for the CLI (`codersdk`) and `wsproxy`.
2024-01-23 14:42:07 +04:00
3e0e7f8739 feat: check agent API version on connection (#11696)
fixes #10531

Adds a check for `version` on connection to the Agent API websocket endpoint.  This is primarily for future-proofing, so that up-level agents get a sensible error if they connect to a back-level Coderd.

It also refactors the location of the `CurrentVersion` variables, to be part of the `proto` packages, since the versions refer to the APIs defined therein.
2024-01-23 14:27:49 +04:00
f01cab9894 feat: use tailnet v2 API for coordination (#11638)
This one is huge, and I'm sorry.

The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.

There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
2024-01-22 11:07:50 +04:00
df3c310379 feat(cli): add coder open vscode (#11191)
Fixes #7667
2024-01-02 20:46:18 +02:00
baf3bf6b9c feat: add workspace_id, owner_name to agent manifest (#10199)
Co-authored-by: Kyle Carberry <kyle@carberry.com>
Co-authored-by: Atif Ali <atif@coder.com>
2023-12-04 00:41:54 +03:00
51b58cfc98 fix: only update last_used_at when connection count > 0 (#10808) 2023-11-21 18:10:41 -06:00
4857d4bd55 feat(codersdk/agentsdk): use new agent metadata batch endpoint (#10224)
Part of #9782
2023-10-13 17:32:28 +03:00
7eeba15d16 feat(coderd): add support for sending batched agent metadata (#10223)
Part of #9782
2023-10-13 16:37:55 +03:00
9c098b218f feat: allow external auth providers to expose extra metadata (#10157) 2023-10-09 23:02:16 -05:00
35538e1051 feat: add external-auth cli (#10052)
* feat: add `external-auth` cli

* Add subcommands

* Improve descriptions

* Add external-auth subcommand

* Fix docs

* Fix gen

* Fix comment

* Fix golden file
2023-10-09 23:04:35 +00:00
1262eef2c0 feat: add support for coder_script (#9584)
* Add basic migrations

* Improve schema

* Refactor agent scripts into it's own package

* Support legacy start and stop script format

* Pipe the scripts!

* Finish the piping

* Fix context usage

* It works!

* Fix sql query

* Fix SQL query

* Rename `LogSourceID` -> `SourceID`

* Fix the FE

* fmt

* Rename migrations

* Fix log tests

* Fix lint err

* Fix gen

* Fix story type

* Rename source to script

* Fix schema jank

* Uncomment test

* Rename proto to TimeoutSeconds

* Fix comments

* Fix comments

* Fix legacy endpoint without specified log_source

* Fix non-blocking by default in agent

* Fix resources tests

* Fix dbfake

* Fix resources

* Fix linting I think

* Add fixtures

* fmt

* Fix startup script behavior

* Fix comments

* Fix context

* Fix cancel

* Fix SQL tests

* Fix e2e tests

* Interrupt on Windows

* Fix agent leaking script process

* Fix migrations

* Fix stories

* Fix duplicate logs appearing

* Gen

* Fix log location

* Fix tests

* Fix tests

* Fix log output

* Show display name in output

* Fix print

* Return timeout on start context

* Gen

* Fix fixture

* Fix the agent status

* Fix startup timeout msg

* Fix command using shared context

* Fix timeout draining

* Change signal type

* Add deterministic colors to startup script logs

---------

Co-authored-by: Muhammad Atif Ali <atif@coder.com>
2023-09-25 16:47:17 -05:00
64df076328 feat: add server flag to force DERP to use always websockets (#9238) 2023-08-24 17:22:31 +00:00
22e781eced chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```

Migrate generated files to import /v2

* Fix gen
2023-08-18 18:55:43 +00:00
07fd73c4a0 chore: allow multiple agent subsystems, add exectrace (#8933) 2023-08-08 22:10:28 -07:00
b955c5fefc fix: avoid agent runLoop exiting due to ws ping (#8852) 2023-08-02 07:25:07 +00:00
6b69970d7c fix: avoid infinite loop in agent derp-map (#8848) 2023-08-02 13:18:46 +10:00
bd944e0d21 chore: rename startup logs to agent logs (#8649)
* chore: rename startup logs to agent logs

This also adds a `source` property to every agent log. It
should allow us to group logs and display them nicer in
the UI as they stream in.

* Fix migration order

* Fix naming

* Rename the frontend

* Fix tests

* Fix down migration

* Match enums for workspace agent logs

* Fix inserting log source

* Fix migration order

* Fix logs tests

* Fix psql insert
2023-07-28 15:57:23 +00:00
2f0a9996e7 chore: add derpserver to wsproxy, add proxies to derpmap (#7311) 2023-07-27 02:21:04 +10:00
517fb19474 feat: add single tailnet support to moons (#8587) 2023-07-19 11:11:11 -05:00
98164f687e fix!: remove startup logs eof for streaming (#8528)
* fix: remove startup logs eof for streaming

We have external utilities like logstream-kube that may send
logs after an agent shuts down unexpectedly to report additional
information. In a recent change we stopped accepting these logs,
which broke these utilities.

In the future we'll rename startup logs to agent logs or something
more generalized so this is less confusing in the future.

* fix(cli/cliui): handle never ending startup log stream in Agent

---------

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2023-07-18 09:57:29 -06:00
c47b78c44b chore: replace wsconncache with a single tailnet (#8176) 2023-07-12 17:37:31 -05:00
7fcf319e01 fix(cli)!: protect client Logger and refactor cli scaletest tests (#8317)
- (breaking) Protects Logger and LogBodies fields of codersdk.Client with its mutex. This addresses a data race in cli/scaletest.
- Fillets the existing cli/createworkspaces unit test and moves the testing logic there into the tests under scaletest/createworkspaces.
- Adds testutil.RaceEnabled bool const and conditionaly skips previously-skipped tests under scaletest/ if the race detector is enabled. This is unfortunate and sad, but I would prefer to have these tests at least running without the race detector than not running at all.
- Adds IgnoreErrors option to fake in-memory agent loggers; having the agents fail the test immediately when they encounter any sort of error isn't really helpful.
2023-07-06 09:43:39 +01:00
6015319e9d feat: show service banner in SSH/TTY sessions (#8186)
* Allow workspace agents to get appearance
* Poll for service banner every two minutes
* Show service banner before MOTD if not quiet
2023-06-30 10:41:29 -08:00
a28d422c35 feat: add flag to disable all direct connections (#7936) 2023-06-21 22:02:05 +00:00
8dac0356ed refactor: replace startup script logs EOF with starting/ready time (#8082)
This commit reverts some of the changes in #8029 and implements an
alternative method of keeping track of when the startup script has ended
and there will be no more logs.

This is achieved by adding new agent fields for tracking when the agent
enters the "starting" and "ready"/"start_error" lifecycle states. The
timestamps simplify logic since we don't need understand if the current
state is before or after the state we're interested in. They can also be
used to show data like how long the startup script took to execute. This
also allowed us to remove the EOF field from the logs as the
implementation was problematic when we returned the EOF log entry in the
response since requesting _after_ that ID would give no logs and the API
would thus lose track of EOF.
2023-06-20 14:41:55 +03:00
0c5077464b fix: avoid missed logs when streaming startup logs (#8029)
* feat(coderd,agent): send startup log eof at the end

* fix(coderd): fix edge case in startup log pubsub

* fix(coderd): ensure startup logs are closed on lifecycle state change (fallback)

* fix(codersdk): fix startup log channel shared memory bug

* fix(site): remove the EOF log line
2023-06-16 17:14:22 +03:00
14efdadd3c feat: Collect agent SSH metrics (#7584) 2023-05-25 12:52:36 +02:00
00a2413c03 feat: add telemetry support for workspace agent subsystem (#7579) 2023-05-17 22:49:25 -05:00
bb0a38b161 feat: Implement aggregator for agent metrics (#7259) 2023-04-27 12:34:00 +02:00
81e2b2500a feat: add level support for startup logs (#7067)
This allows external services like our devcontainer support to display
errors and warnings with custom styles to indicate failures to users.
2023-04-10 14:29:59 -05:00