coder

mirror of https://github.com/coder/coder.git synced 2025-07-09 11:45:56 +00:00

Author	SHA1	Message	Date
Spike Curtis	8c00ebc6ee	chore: refactor ServerTailnet to use tailnet.Controllers (#15408 ) chore of #14729 Refactors the `ServerTailnet` to use `tailnet.Controller` so that we reuse logic around reconnection and handling control messages, instead of reimplementing. This unifies our "client" use of the tailscale API across CLI, coderd, and wsproxy.	2024-11-08 13:18:56 +04:00
Spike Curtis	886dcbec84	chore: refactor coordination (#15343 ) Refactors the way clients of the Tailnet API (clients of the API, which include both workspace "agents" and "clients") interact with the API. Introduces the idea of abstract "controllers" for each of the RPCs in the API, and implements a Coordination controller by refactoring from `workspacesdk`. chore re: #14729	2024-11-05 13:50:10 +04:00
Ethan	b1298a3c1e	feat: add WorkspaceUpdates tailnet RPC (#14847 ) Closes #14716 Closes #14717 Adds a new user-scoped tailnet API endpoint (`api/v2/tailnet`) with a new RPC stream for receiving updates on workspaces owned by a specific user, as defined in #14716. When a stream is started, the `WorkspaceUpdatesProvider` will begin listening on the user-scoped pubsub events implemented in #14964. When a relevant event type is seen (such as a workspace state transition), the provider will query the DB for all the workspaces (and agents) owned by the user. This gets compared against the result of the previous query to produce a set of workspace updates. Workspace updates can be requested for any user ID, however only workspaces the authorised user is permitted to `ActionRead` will have their updates streamed. Opening a tunnel to an agent requires that the user can perform `ActionSSH` against the workspace containing it.	2024-11-01 14:53:53 +11:00
Spike Curtis	2df9a3e554	fix: fix tailnet remoteCoordination to wait for server (#14666 ) Fixes #12560 When gracefully disconnecting from the coordinator, we would send the Disconnect message and then close the dRPC stream. However, closing the dRPC stream can cause the server not to process the Disconnect message, since we use the stream context in a `select` while sending it to the coordinator. This is a product bug uncovered by the flake, and probably results in us failing graceful disconnect some minority of the time. Instead, the `remoteCoordination` (and `inMemoryCoordination` for consistency) should send the Disconnect message and then wait for the coordinator to hang up (on some graceful disconnect timer, in the form of a context).	2024-09-16 09:24:30 +04:00
Spike Curtis	d6154c4310	chore: remove tailnet v1 API support (#14641 ) Drops support for v1 of the tailnet API, which was the original coordination protocol where we only sent node updates, never marked them lost or disconnected. v2 of the tailnet API went GA for CLI clients in Coder 2.8.0, so clients older than that would stop working.	2024-09-12 07:56:31 +04:00
Spike Curtis	7b39f6b0d4	fix: improves coordination logging (#14556 )	2024-09-04 15:10:43 +04:00
Colin Adler	777dfbe965	feat(enterprise): add ready for handshake support to pgcoord (#12935 )	2024-04-16 15:01:10 -05:00
Colin Adler	e801e878ba	feat: add agent acks to in-memory coordinator (#12786 ) When an agent receives a node, it responds with an ACK which is relayed to the client. After the client receives the ACK, it's allowed to begin pinging.	2024-04-10 17:15:33 -05:00
Colin Adler	e5d911462f	fix(tailnet): enforce valid agent and client addresses (#12197 ) This adds the ability for `TunnelAuth` to also authorize incoming wireguard node IPs, preventing agents from reporting anything other than their static IP generated from the agent ID.	2024-03-01 09:02:33 -06:00
Spike Curtis	af3fdc68c3	chore: refactor agent routines that use the v2 API (#12223 ) In anticipation of needing the `LogSender` to run on a context that doesn't get immediately canceled when you `Close()` the agent, I've undertaken a little refactor to manage the goroutines that get run against the Tailnet and Agent API connection. This handles controlling two contexts, one that gets canceled right away at the start of graceful shutdown, and another that stays up to allow graceful shutdown to complete.	2024-02-23 11:04:23 +04:00
Spike Curtis	e5ba586e30	fix: fix graceful disconnect in DialWorkspaceAgent (#11993 ) I noticed in testing that the CLI wasn't correctly sending the disconnect message when it shuts down, and thus agents are seeing this as a "lost" peer, rather than a "disconnected" one. What was happening is that we just used a single context for everything from the netconn to the RPCs, and when the context was canceled we failed to send the disconnect message due to canceled context. So, this PR splits things into two contexts, with a graceful one set to last up to 1 second longer than the main one.	2024-02-05 14:01:37 +04:00
Spike Curtis	520b12e1a2	fix: close MultiAgentConn when coordinator closes (#11941 ) Fixes an issue where a MultiAgentConn isn't closed properly when the coordinator it is connected to is closed. Since servertailnet checks whether the conn is closed before reinitializing, it is important that we check this, otherwise servertailnet can get stuck if the coordinator closes (e.g. when we switch from AGPL to PGCoordinator after decoding a license).	2024-01-31 00:38:19 +04:00
Spike Curtis	1e8a9c09fe	chore: remove legacy wsconncache (#11816 ) Fixes #8218 Removes `wsconncache` and related "is legacy?" functions and API calls that were used by it. The only leftover is that Agents still use the legacy IP, so that back level clients or workspace proxies can dial them correctly. We should eventually remove this: #11819	2024-01-30 07:56:36 +04:00
Spike Curtis	3d85cdfa11	feat: set peers lost when disconnected from coordinator (#11681 ) Adds support to Coordination to call SetAllPeersLost() when it is closed. This ensure that when we disconnect from a Coordinator, we set all peers lost. This covers CoderSDK (CLI client) and Agent. Next PR will cover MultiAgent (notably, `wsproxy`).	2024-01-22 15:26:20 +04:00
Spike Curtis	f01cab9894	feat: use tailnet v2 API for coordination (#11638 ) This one is huge, and I'm sorry. The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet. There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!	2024-01-22 11:07:50 +04:00
Dean Sheather	e46431078c	feat: add AgentAPI using DRPC (#10811 ) Co-authored-by: Spike Curtis <spike@coder.com>	2023-12-18 22:53:28 +10:00
Spike Curtis	ad3fed72bc	chore: rename Coordinator to CoordinatorV1 (#11222 ) Renames the tailnet.Coordinator to represent both v1 and v2 APIs, so that we can use this interface for the main atomic pointer. Part of #10532	2023-12-15 11:38:12 +04:00
Spike Curtis	bf3b35b1e2	fix: stop logging context Canceled as error (#11177 ) fixes #11166 and a related log that could have the same problem	2023-12-13 13:08:30 +04:00
Cian Johnston	197cd935cf	chore(Makefile): use linter version from dogfood Dockerfile (#11147 ) * chore(Makefile): use golangci-lint version from dogfood Dockerfile * chore(dogfood/Dockerfile): update golangci-lint to latest version * chore(coderd): address linter complaints	2023-12-12 10:02:32 +00:00
Spike Curtis	2c86d0bed0	feat: support v2 Tailnet API in AGPL coordinator (#11010 ) Fixes #10529	2023-12-06 15:04:28 +04:00
Spike Curtis	5c48cb4447	feat: modify PG Coordinator to work with new v2 Tailnet API (#10573 ) re: #10528 Refactors PG Coordinator to work with the Tailnet v2 API, including wrappers for the existing v1 API. The debug endpoint functions, but doesn't return sensible data, that will be in another stacked PR.	2023-11-20 14:31:04 +04:00
Colin Adler	c900b5f8df	feat: add single tailnet support to pgcoord (#9351 )	2023-09-21 14:30:48 -05:00
Kyle Carberry	22e781eced	chore: add /v2 to import module path (#9072 ) * chore: add /v2 to import module path go mod requires semantic versioning with versions greater than 1.x This was a mechanical update by running: ``` go install github.com/marwan-at-work/mod/cmd/mod@latest mod upgrade ``` Migrate generated files to import /v2 * Fix gen	2023-08-18 18:55:43 +00:00
Colin Adler	bc862fa493	chore: upgrade tailscale to v1.46.1 (#8913 )	2023-08-09 19:50:26 +00:00
Colin Adler	0b4f333a6f	chore: add http debug support to pgcoord (#8795 )	2023-07-28 17:59:31 -05:00
Colin Adler	dd2f79995b	chore(tailnet): rewrite coordinator debug using `html/template` (#8752 )	2023-07-26 22:54:21 +00:00
Colin Adler	6b92abebb9	fix(tailnet): track agent names for http debug (#8744 )	2023-07-26 18:44:10 +00:00
Colin Adler	517fb19474	feat: add single tailnet support to moons (#8587 )	2023-07-19 11:11:11 -05:00
Colin Adler	c47b78c44b	chore: replace wsconncache with a single tailnet (#8176 )	2023-07-12 17:37:31 -05:00
Spike Curtis	c0a01ec81c	fix: fix TestPGCoordinatorDual_Mainline flake (#8228 ) * fix TestPGCoordinatorDual_Mainline flake Signed-off-by: Spike Curtis <spike@coder.com> * use slices.Contains instead of local function Signed-off-by: Spike Curtis <spike@coder.com> --------- Signed-off-by: Spike Curtis <spike@coder.com>	2023-06-28 11:37:45 +04:00
Spike Curtis	cc17d2feea	refactor: add postgres tailnet coordinator (#8044 ) * postgres tailnet coordinator Signed-off-by: Spike Curtis <spike@coder.com> * Fix db migration; tests Signed-off-by: Spike Curtis <spike@coder.com> * Add fixture, regenerate Signed-off-by: Spike Curtis <spike@coder.com> * Fix fixtures Signed-off-by: Spike Curtis <spike@coder.com> * review comments, run clean gen Signed-off-by: Spike Curtis <spike@coder.com> * Rename waitForConn -> cleanupConn Signed-off-by: Spike Curtis <spike@coder.com> * code review updates Signed-off-by: Spike Curtis <spike@coder.com> * db migration order Signed-off-by: Spike Curtis <spike@coder.com> * fix log field name last_heartbeat Signed-off-by: Spike Curtis <spike@coder.com> * fix heartbeat_from log field Signed-off-by: Spike Curtis <spike@coder.com> * fix slog fields for linting Signed-off-by: Spike Curtis <spike@coder.com> --------- Signed-off-by: Spike Curtis <spike@coder.com>	2023-06-21 16:20:58 +04:00
Marcin Tojek	b1d1b63113	chore: ensure logs consistency across Coder (#8083 )	2023-06-20 12:30:45 +02:00
Spike Curtis	dc3d39baf8	fix: agent disconnects from coordinator (#7430 ) * work around websocket deadline bug Signed-off-by: Spike Curtis <spike@coder.com> * Use test context to hold websocket open Signed-off-by: Spike Curtis <spike@coder.com> * Fix race creating test websocket Signed-off-by: Spike Curtis <spike@coder.com> * set write deadline to time.Time zero Signed-off-by: Spike Curtis <spike@coder.com> --------- Signed-off-by: Spike Curtis <spike@coder.com>	2023-05-05 20:29:03 +04:00
Spike Curtis	bd630113b2	fix: coordinator node update race (#7345 ) * fix: coordinator node update race Signed-off-by: Spike Curtis <spike@coder.com> * Lint fixes, make core private Signed-off-by: Spike Curtis <spike@coder.com> * Don't log broken connections as errors Signed-off-by: Spike Curtis <spike@coder.com> --------- Signed-off-by: Spike Curtis <spike@coder.com>	2023-05-02 20:58:21 +04:00
Spike Curtis	b6666cf1cf	chore: tailnet debug logging (#7260 ) * Enable discovery (disco) debug Signed-off-by: Spike Curtis <spike@coder.com> * Better debug on reconnectingPTY Signed-off-by: Spike Curtis <spike@coder.com> * Agent logging in appstest Signed-off-by: Spike Curtis <spike@coder.com> * More reconnectingPTY logging Signed-off-by: Spike Curtis <spike@coder.com> * Add logging to coordinator Signed-off-by: Spike Curtis <spike@coder.com> * Update agent/agent.go Co-authored-by: Mathias Fredriksson <mafredri@gmail.com> * Update agent/agent.go Co-authored-by: Mathias Fredriksson <mafredri@gmail.com> * Update agent/agent.go Co-authored-by: Mathias Fredriksson <mafredri@gmail.com> * Update agent/agent.go Co-authored-by: Mathias Fredriksson <mafredri@gmail.com> * Clarify logs; remove unrelated changes Signed-off-by: Spike Curtis <spike@coder.com> --------- Signed-off-by: Spike Curtis <spike@coder.com> Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>	2023-04-27 13:59:01 +04:00
Kyle Carberry	1724cbf872	feat: automatically use websockets if DERP upgrade is unavailable (#6381 ) * feat: automatically use websockets if DERP upgrade is unavailable This might be our biggest hangup for deployments at the moment... Load balancers by default do not support the DERP protocol, so many of our prospects and customers run into failing workspace connections. This automatically swaps to use WebSockets, and reports the reason to coderd. In a future contribution, a warning will appear by the agent if it was forced to use WebSockets instead of DERP. * Fix nil pointer type in Tailscale dep * Fix requested changes	2023-03-01 22:18:14 +00:00
Kyle Carberry	026b1cd2a4	chore: update to go 1.20 (#5968 ) Co-authored-by: Colin Adler <colin1adler@gmail.com>	2023-02-02 12:36:27 -06:00
Colin Adler	cc694a55bc	feat: add debug info to HA coordinator (#5883 )	2023-01-26 16:32:38 -06:00
Colin Adler	dd8eab5675	fix: cache disconnected agent names in tailnet coordinator debug (#5870 )	2023-01-25 21:23:14 -06:00
Colin Adler	233492b75d	fix: ensure coordinator debug output is always sorted (#5867 )	2023-01-26 00:29:51 +00:00
Colin Adler	1cd5f38cb0	feat: add debug server for tailnet coordinators (#5861 ) Implements a Tailscale-like debug server for our in-memory coordinator. This should provide some visibility into why connections could be failing. Resolves: https://github.com/coder/coder/issues/5845 ![image](https://user-images.githubusercontent.com/6332295/214680832-2724d633-2d54-44d6-a7ce-5841e5824ee5.png)	2023-01-25 21:27:36 +00:00
Colin Adler	c3731a1be0	fix: ensure agent websocket only removes its own conn (#5828 )	2023-01-23 17:22:34 -06:00
Kyle Carberry	7851fb1c99	Fix unlock of unlocked mutex in tailnet coordinator	2022-10-17 23:51:41 +00:00
Kyle Carberry	c5afaffa7e	fix: Tidy up closes for nicer output (#4605 ) * fix: Tidy up closes for nicer output There was a context canceled message that would appear because of traces, and this was using the wrong close. I don't think it was causing any specific problems, but it could make a replica warning appear on restart. * Fix migration and experimental	2022-10-17 18:36:23 -05:00
Jon Ayers	9b4ab82044	fix: potential deadlock in coordinator (#4598 )	2022-10-17 20:46:19 +00:00
Colin Adler	9b5d627a55	fix(tailnet): data race in `coordinator.Close()` (#4589 )	2022-10-17 11:47:45 -05:00
Kyle Carberry	2ba4a62a0d	feat: Add high availability for multiple replicas (#4555 ) * feat: HA tailnet coordinator * fixup! feat: HA tailnet coordinator * fixup! feat: HA tailnet coordinator * remove printlns * close all connections on coordinator * impelement high availability feature * fixup! impelement high availability feature * fixup! impelement high availability feature * fixup! impelement high availability feature * fixup! impelement high availability feature * Add replicas * Add DERP meshing to arbitrary addresses * Move packages to highavailability folder * Move coordinator to high availability package * Add flags for HA * Rename to replicasync * Denest packages for replicas * Add test for multiple replicas * Fix coordination test * Add HA to the helm chart * Rename function pointer * Add warnings for HA * Add the ability to block endpoints * Add flag to disable P2P connections * Wow, I made the tests pass * Add replicas endpoint * Ensure close kills replica * Update sql * Add database latency to high availability * Pipe TLS to DERP mesh * Fix DERP mesh with TLS * Add tests for TLS * Fix replica sync TLS * Fix RootCA for replica meshing * Remove ID from replicasync * Fix getting certificates for meshing * Remove excessive locking * Fix linting * Store mesh key in the database * Fix replica key for tests * Fix types gen * Fix unlocking unlocked * Fix race in tests * Update enterprise/derpmesh/derpmesh.go Co-authored-by: Colin Adler <colin1adler@gmail.com> * Rename to syncReplicas * Reuse http client * Delete old replicas on a CRON * Fix race condition in connection tests * Fix linting * Fix nil type * Move pubsub to in-memory for twenty test * Add comment for configuration tweaking * Fix leak with transport * Fix close leak in derpmesh * Fix race when creating server * Remove handler update * Skip test on Windows * Fix DERP mesh test * Wrap HTTP handler replacement in mutex * Fix error message for relay * Fix API handler for normal tests * Fix speedtest * Fix replica resend * Fix derpmesh send * Ping async * Increase wait time of template version jobd * Fix race when closing replica sync * Add name to client * Log the derpmap being used * Don't connect if DERP is empty * Improve agent coordinator logging * Fix lock in coordinator * Fix relay addr * Fix race when updating durations * Fix client publish race * Run pubsub loop in a queue * Store agent nodes in order * Fix coordinator locking * Check for closed pipe Co-authored-by: Colin Adler <colin1adler@gmail.com>	2022-10-17 13:43:30 +00:00
Kyle Carberry	6c83012082	chore: Add comments to indicate what each field on a network node means (#4241 ) * chore: Add comments to indicate what each field on a network node means * Update tailnet/coordinator.go Co-authored-by: Colin Adler <colin1adler@gmail.com> * Update tailnet/coordinator.go Co-authored-by: Colin Adler <colin1adler@gmail.com> * Update tailnet/coordinator.go Co-authored-by: Colin Adler <colin1adler@gmail.com> Co-authored-by: Colin Adler <colin1adler@gmail.com>	2022-09-28 16:04:10 +00:00
Kyle Carberry	b8ec5c786d	fix: Ensure tailnet coordinations are sent orderly (#4198 )	2022-09-26 10:16:04 -05:00
Kyle Carberry	6826b976d7	fix: Add latency-check for DERP over HTTP(s) (#3788 ) * fix: Add latency-check for DERP over HTTP(s) This fixes scenarios where latency wasn't being reported if a connection had UDP entirely blocked. * Add inactivity ping * Improve coordinator error reporting consistency	2022-09-01 16:41:47 +00:00

1 2

51 Commits