coder

Mirror of https://github.com/coder/coder.git, synced 2025-07-09 11:45:56 +00:00

Author	SHA1	Message	Date
Jon Ayers	ce573b9faa	fix: add agent exec abstraction (#15717)	2024-12-04 23:30:25 +02:00
Jon Ayers	1f238fed59	feat: integrate new agentexec pkg (#15609) - Integrates the `agentexec` pkg into the agent and removes the legacy system of iterating over the process tree. It adds some linting rules to hopefully catch future improper uses of `exec.Command` in the package.	2024-11-27 20:12:15 +02:00
Spike Curtis	103824f726	fix: fix panic while tearing down reconnecting PTY (#15615) fixes https://github.com/coder/internal/issues/221 Fixes an issue where two goroutines were sharing the `err` variable, leading to a data race where we'd fail to process the error and then nil-pointer panic. I ended up refactoring reconnecting PTY stuff into the `reconnectingpty` package, instead of having it on the agent. That `createTailnet` routine had waaay too many deeply nested goroutines, which is I'm sure a big contributor to the bug appearing in the first place.	2024-11-22 09:46:25 +04:00
Spike Curtis	40802958e9	fix: use explicit api versions for agent and tailnet (#15508) Bumps the Tailnet and Agent API version 2.3, and creates some extra controls and machinery around these versions. What happened is that we accidentally shipped two new API features without bumping the version. `ScriptCompleted` on the Agent API in Coder v2.16 and `RefreshResumeToken` on the Tailnet API in Coder v2.15. Since we can't easily retroactively bump the versions, we'll roll these changes into API version 2.3 along with the new WorkspaceUpdates RPC, which hasn't been released yet. That means there is some ambiguity in Coder v2.15-v2.17 about exactly what methods are supported on the Tailnet and Agent APIs. This isn't great, but hasn't caused us major issues because 1. RefreshResumeToken is considered optional, and clients just log and move on if the RPC isn't supported. 2. Agents basically never get started talking to a Coderd that is older than they are, since the agent binary is normally downloaded from Coderd at workspace start. Still it's good to get things squared away in terms of versions for SDK users and possible edge cases around client and server versions. To mitigate against this thing happening again, this PR also: 1. adds a CODEOWNERS for the API proto packages, so I'll review changes 2. defines interface types for different API versions, and has the agent explicitly use a specific version. That way, if you add a new method, and try to use it in the agent without thinking explicitly about versions, it won't compile. With the protocol controllers stuff, we've sort of already abstracted the Tailnet API such that the interface type strategy won't work, but I'll work on getting the Controller to be version aware, such that it can check the API version it's getting against the controllers it has -- in a later PR.	2024-11-15 11:16:28 +04:00
Spike Curtis	886dcbec84	chore: refactor coordination (#15343) Refactors the way clients of the Tailnet API (clients of the API, which include both workspace "agents" and "clients") interact with the API. Introduces the idea of abstract "controllers" for each of the RPCs in the API, and implements a Coordination controller by refactoring from `workspacesdk`. chore re: #14729	2024-11-05 13:50:10 +04:00
Ethan	c5a4095610	fix: include custom agent headers in tailnet to support DERP connections (#15145) Fixes #15131.	2024-10-21 20:59:21 +11:00
Jon Ayers	7da231bc92	fix: fix error handling to prevent spam in proc prio management (#15071)	2024-10-15 02:17:10 +00:00
Spike Curtis	8785a51b09	feat: include Coder service prefix on agents (#14944) fixes #14715 Configures agents to use an address both in the Tailscale service prefix and the new Coder service prefix. Also modifies the Coordinator auth to allow the new prefix. Updates `coder/tailscale` to include https://github.com/coder/tailscale/pull/62 which fixes a bug around forwarding TCP connections to localhost. This functionality is tested in the modifications to `TestAgent_Dial`.	2024-10-04 10:16:33 +04:00
Spike Curtis	7d9f5ab81d	chore: add Coder service prefix to tailnet (#14943) re: #14715 This PR introduces the Coder service prefix: `fd60:627a:a42b::/48` and refactors our existing code as calling the Tailscale service prefix explicitly (rather than implicitly). Removes the unused `Addresses` agent option. All clients today assume they can compute the Agent's IP address based on its UUID, so an agent started with a custom address would break things.	2024-10-04 10:04:10 +04:00
Danielle Maywood	ae522c558d	feat: add agent timings (#14713) * feat: begin impl of agent script timings * feat: add job_id and display_name to script timings * fix: increment migration number * fix: rename migrations from 251 to 254 * test: get tests compiling * fix: appease the linter * fix: get tests passing again * fix: drop column from correct table * test: add fixture for agent script timings * fix: typo * fix: use job id used in provisioner job timings * fix: increment migration number * test: behaviour of script runner * test: rewrite test * test: does exit 1 script break things? * test: rewrite test again * fix: revert change Not sure how this came to be, I do not recall manually changing these files. * fix: let code breathe * fix: wrap errors * fix: justify nolint * fix: swap require.Equal argument order * fix: add mutex operations * feat: add 'ran_on_start' and 'blocked_login' fields * fix: update testdata fixture * fix: refer to agent_id instead of job_id in timings * fix: JobID -> AgentID in dbauthz_test * fix: add 'id' to scripts, make timing refer to script id * fix: fix broken tests and convert bug * fix: update testdata fixtures * fix: update testdata fixtures again * feat: capture stage and if script timed out * fix: update migration number * test: add test for script api * fix: fake db query * fix: use UTC time * fix: ensure r.scriptComplete is not nil * fix: move err check to right after call * fix: uppercase sql * fix: use dbtime.Now() * fix: debug log on r.scriptCompleted being nil * fix: ensure correct rbac permissions * chore: remove DisplayName * fix: get tests passing * fix: remove space in sql up * docs: document ExecuteOption * fix: drop 'RETURNING' from sql * chore: remove 'display_name' from timing table * fix: testdata fixture * fix: put r.scriptCompleted call in goroutine * fix: track goroutine for test + use separate context for reporting * fix: appease linter, handle trackCommandGoroutine error * fix: resolve race condition * feat: replace timed_out column with status column * test: update testdata fixture * fix: apply suggestions from review * revert: linter changes	2024-09-24 10:51:49 +01:00
Spike Curtis	2df9a3e554	fix: fix tailnet remoteCoordination to wait for server (#14666) Fixes #12560 When gracefully disconnecting from the coordinator, we would send the Disconnect message and then close the dRPC stream. However, closing the dRPC stream can cause the server not to process the Disconnect message, since we use the stream context in a `select` while sending it to the coordinator. This is a product bug uncovered by the flake, and probably results in us failing graceful disconnect some minority of the time. Instead, the `remoteCoordination` (and `inMemoryCoordination` for consistency) should send the Disconnect message and then wait for the coordinator to hang up (on some graceful disconnect timer, in the form of a context).	2024-09-16 09:24:30 +04:00
Jon Ayers	bfdc29f466	fix: suppress benign errors when listing processes (#14660)	2024-09-12 23:00:04 +01:00
Jon Ayers	9f4972901c	fix: avoid logging no such process errors for process priority (#14655)	2024-09-12 18:00:42 +00:00
Spike Curtis	fb3523b37f	chore: remove legacy AgentIP address (#14640) Removes the support for the Agent's "legacy IP" which was a hardcoded IP address all agents used to use, before we introduced "single tailnet". Single tailnet went GA in 2.7.0.	2024-09-12 07:40:19 +04:00
Ethan	c8580a415a	feat: expose current agent connections by type via prometheus (#14612)	2024-09-11 14:13:30 +10:00
Danielle Maywood	839918c5e7	chore(docs): document agent api debug endpoints (#14454) * chore(docs): add agent api debug docs * chore(docs): add sections to agent api readme * chore(docs): link debug manifest to agentsdk.Manifest schema * chore(docs): add high level overview of agent api debug docs * chore(docs): link to agent api docs from reference * chore(docs): fix invalid paths * chore(docs): use env variable for coder agent debug address	2024-08-28 09:47:14 +01:00
Mathias Fredriksson	8c0565177e	chore(agent): remove `err=<nil>` log for batch update metadata complete (#14179) Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>	2024-08-07 11:31:47 +00:00
Marcin Tojek	e96652ebbc	feat: block file transfers for security (#13501)	2024-06-10 12:12:23 +00:00
Kayla Washburn-Love	b248f125e1	chore: rename notification banners to announcement banners (#13419)	2024-05-31 10:59:28 -06:00
Kayla Washburn-Love	d8e0be6ee6	feat: add support for multiple banners (#13081)	2024-05-08 15:40:43 -06:00
Spike Curtis	d51c6912a7	fix: make handleManifest always signal dependents (#13141) Fixes #13139 Using a bare channel to signal dependent goroutines means that we can only signal success, not failure, which leads to deadlock if we fail in a way that doesn't cause the whole `apiConnRoutineManager` to tear down routines. Instead, we use a new object called a `checkpoint` that signals success or failure, so that dependent routines get unblocked if the routine they depend on fails.	2024-05-06 14:47:41 +04:00
Spike Curtis	2efb46a10e	chore: remove superfluous context.Canceled handling (#13140) Removes a check for `context.Canceled` inside the `handleManifest` routine. This checking is handled in the `apiConnRoutineManager`, so checking inside the handler is redundant.	2024-05-06 14:33:16 +04:00
Cian Johnston	99dda4a43a	fix(agent): keep track of lastReportIndex between invocations of reportLifecycle() (#13075)	2024-04-25 16:54:51 +01:00
Jon Ayers	426e9f2b96	feat: support adjusting child proc oom scores (#12655)	2024-04-03 09:42:03 -05:00
Colin Adler	4d5a7b2d56	chore(codersdk): move all tailscale imports out of `codersdk` (#12735) Currently, importing `codersdk` just to interact with the API requires importing tailscale, which causes builds to fail unless manually using our fork.	2024-03-26 12:44:31 -05:00
Cian Johnston	b0c4e7504c	feat(support): add client magicsock and agent prometheus metrics to support bundle (#12604) * feat(codersdk): add ability to fetch prometheus metrics directly from agent * feat(support): add client magicsock and agent prometheus metrics to support bundle * refactor(support): simplify AgentInfo control flow Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>	2024-03-15 15:33:49 +00:00
Cian Johnston	653ddccd8e	fix(agent): remove unused token debug handler (#12602)	2024-03-15 09:43:36 +00:00
Cian Johnston	63696d762f	feat(codersdk): add debug handlers for logs, manifest, and token to agent (#12593) * feat(codersdk): add debug handlers for logs, manifest, and token to agent * add more logging * use io.LimitReader instead of seeking	2024-03-14 15:36:12 +00:00
Cian Johnston	3b406878e0	feat(agent): expose HTTP debug server over tailnet API (#12582)	2024-03-14 10:02:01 +00:00
Spike Curtis	b0afffbafb	feat: use v2 API for agent metadata updates (#12281) Switches the agent to report metadata over the v2 API. Fixes #10534	2024-02-26 09:50:19 +04:00
Spike Curtis	aa7a9f5cc4	feat: use v2 API for agent lifecycle updates (#12278) Agent uses the v2 API to post lifecycle updates. Part of #10534	2024-02-23 15:24:28 +04:00
Spike Curtis	4cc132cea0	feat: switch agent to use v2 API for sending logs (#12068) Changes the agent to use the new v2 API for sending logs, via the logSender component. We keep the PatchLogs function around, but deprecate it so that we can test the v1 endpoint.	2024-02-23 11:27:15 +04:00
Spike Curtis	af3fdc68c3	chore: refactor agent routines that use the v2 API (#12223) In anticipation of needing the `LogSender` to run on a context that doesn't get immediately canceled when you `Close()` the agent, I've undertaken a little refactor to manage the goroutines that get run against the Tailnet and Agent API connection. This handles controlling two contexts, one that gets canceled right away at the start of graceful shutdown, and another that stays up to allow graceful shutdown to complete.	2024-02-23 11:04:23 +04:00
Mathias Fredriksson	b1c0b39d88	feat(agent): add script data dir for binaries and files (#12205) The agent is extended with a `--script-data-dir` flag, defaulting to the OS temp dir. This dir is used for storing `coder-script-data/bin` and `coder-script/[script uuid]`. The former is a place for all scripts to place executable binaries that will be available by other scripts, SSH sessions, etc. The latter is a place for the script to store files. Since we default to OS temp dir, files are ephemeral by default. In the future, we may consider adding new env vars or changing the default storage location. Workspace startup speed could potentially benefit from scripts being able to skip steps that require downloading software. We may also extend this with more env variables (e.g. persistent storage in HOME). Fixes #11131	2024-02-20 13:26:18 +02:00
Mathias Fredriksson	c63f569174	refactor(agent/agentssh): move envs to agent and add agentssh config struct (#12204) This commit refactors where custom environment variables are set in the workspace and decouples agent specific configs from the `agentssh.Server`. To reproduce all functionality, `agentssh.Config` is introduced. The custom environment variables are now configured in `agent/agent.go` and the agent retains control of the final state. This will allow for easier extension in the future and keep other modules decoupled.	2024-02-19 16:30:00 +02:00
Spike Curtis	1cf4b62867	feat: change agent to use v2 API for reporting stats (#12024) Modifies the agent to use the v2 API to report its statistics, using the `statsReporter` subcomponent.	2024-02-07 15:26:41 +04:00
Spike Curtis	1aa117b9ec	chore: rename client Listen to ConnectRPC (#11916) ConnectRPC seems more appropriate for this function	2024-02-01 14:44:11 +04:00
Spike Curtis	0fc177203e	feat: use agent v2 API to update app health (#11889) Use the Agent v2 API to update App Health	2024-01-30 11:35:12 +04:00
Spike Curtis	2599850e54	feat: use agent v2 API to post startup (#11877) Uses the v2 Agent API to post startup information.	2024-01-30 11:23:28 +04:00
Spike Curtis	da8bb1c198	feat: use agent v2 API to fetch manifest (#11832) Agent uses the v2 API to obtain the manifest, instead of the HTTP API.	2024-01-30 10:11:28 +04:00
Spike Curtis	0eff646c31	chore: move proto to sdk conversion to agentsdk (#11831) `agentsdk` depends on `agent/proto` because it needs to get the version to dial. Therefore, the conversion routines need to live in `agentsdk` so that we can convert to and from the Manifest. I briefly considered refactoring the agent to only reference `proto.Manifest`, but decided against it because we might have multiple protocol versions in the future; it's useful to have a protocol-independent data structure.	2024-01-30 09:04:56 +04:00
Spike Curtis	13e24f21e4	feat: use Agent v2 API for Service Banner (#11806) Agent uses the v2 API for the service banner, rather than the v1 HTTP API. One of several for #10534	2024-01-30 07:44:47 +04:00
Spike Curtis	059e533544	feat: agent uses Tailnet v2 API for DERPMap updates (#11698) Switches the Agent to use Tailnet v2 API to get DERPMap updates. Subsequent PRs will do the same for the CLI (`codersdk`) and `wsproxy`.	2024-01-23 14:42:07 +04:00
Spike Curtis	f01cab9894	feat: use tailnet v2 API for coordination (#11638) This one is huge, and I'm sorry. The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet. There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!	2024-01-22 11:07:50 +04:00
Spike Curtis	4071f1713b	feat: add logging to agent stats and JetBrains tracking (#11364) Adds logging so we can hope to diagnose #11363	2024-01-02 13:34:49 +04:00
Steven Masley	b7bdb17460	feat: add metrics to workspace agent scripts (#11132) * push startup script metrics to agent	2023-12-13 11:45:43 -06:00
Dean Sheather	a9c0c01629	chore: fix flake in listening ports test (#10833)	2023-11-22 09:30:51 +00:00
Mathias Fredriksson	7fecd39e23	fix(agent/agentscripts): display informative error for ErrWaitDelay (#10407) Fixes #10400	2023-10-27 19:07:26 +03:00
Mathias Fredriksson	1a2aea3a6b	fix(agent): prevent metadata from being discarded if report is slow (#10386)	2023-10-23 17:02:54 +00:00
Mathias Fredriksson	76c65b1e1b	fix(agent): send metadata in batches (#10225) Fixes #9782 --- I recommend reviewing with ignore whitespace.	2023-10-13 17:48:25 +03:00


200 Commits