Commit Graph

166 Commits

Author SHA1 Message Date
7d4b3c8634 feat(agent): add devcontainer autostart support (#17076)
This change adds support for devcontainer autostart in workspaces. The
preconditions for utilizing this feature are:

1. The `coder_devcontainer` resource must be defined in Terraform
2. By the time the startup scripts have completed,
	- The `@devcontainers/cli` tool must be installed
	- The given workspace folder must contain a devcontainer configuration

Example Terraform:

```tf
resource "coder_devcontainer" "coder" {
  agent_id         = coder_agent.main.id
  workspace_folder = "/home/coder/coder"
  config_path      = ".devcontainer/devcontainer.json" # (optional)
}
```

Closes #16423
2025-03-27 12:31:30 +02:00
3005cb4594 feat(agent): set additional login vars, LOGNAME and SHELL (#16874)
This change stes additional env vars. This is useful for programs that
assume their presence (for instance, Zed remote relies on SHELL).

See `man login`.
2025-03-11 10:18:57 +00:00
dfcd93b26e feat: enable agent connection reports by default, remove flag (#16778)
This change enables agent connection reports by default and removes the
experimental flag for enabling them.

Updates #15139
2025-03-03 18:37:28 +02:00
04c33968cf refactor: replace golang.org/x/exp/slices with slices (#16772)
The experimental functions in `golang.org/x/exp/slices` are now
available in the standard library since Go 1.21.

Reference: https://go.dev/doc/go1.21#slices

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2025-03-04 00:46:49 +11:00
d0e2060692 feat(agent): add second SSH listener on port 22 (#16627)
Some checks are pending
ci / changes (push) Waiting to run
ci / lint (push) Blocked by required conditions
ci / gen (push) Waiting to run
ci / fmt (push) Blocked by required conditions
ci / test-go (macos-latest) (push) Blocked by required conditions
ci / test-go (ubuntu-latest) (push) Blocked by required conditions
ci / test-go (windows-2022) (push) Blocked by required conditions
ci / test-cli (macos-latest) (push) Blocked by required conditions
ci / test-cli (windows-2022) (push) Blocked by required conditions
ci / test-go-pg (ubuntu-latest) (push) Blocked by required conditions
ci / test-go-pg-16 (push) Blocked by required conditions
ci / test-go-race (push) Blocked by required conditions
ci / test-go-race-pg (push) Blocked by required conditions
ci / test-go-tailnet-integration (push) Blocked by required conditions
ci / test-js (push) Blocked by required conditions
ci / test-e2e (push) Blocked by required conditions
ci / test-e2e-premium (push) Blocked by required conditions
ci / chromatic (push) Blocked by required conditions
ci / offlinedocs (push) Blocked by required conditions
ci / required (push) Blocked by required conditions
ci / build-dylib (push) Blocked by required conditions
ci / build (push) Blocked by required conditions
ci / deploy (push) Blocked by required conditions
ci / deploy-wsproxies (push) Blocked by required conditions
ci / sqlc-vet (push) Blocked by required conditions
ci / notify-slack-on-failure (push) Blocked by required conditions
OpenSSF Scorecard / Scorecard analysis (push) Waiting to run
Fixes: https://github.com/coder/internal/issues/377

Added an additional SSH listener on port 22, so the agent now listens on both, port one and port 22.

---
Change-Id: Ifd986b260f8ac317e37d65111cd4e0bd1dc38af8
Signed-off-by: Thomas Kosiewski <tk@coder.com>
2025-03-03 04:47:42 +01:00
ec44f06f5c feat(cli): allow SSH command to connect to running container (#16726)
Fixes https://github.com/coder/coder/issues/16709 and
https://github.com/coder/coder/issues/16420

Adds the capability to`coder ssh` into a running container if `CODER_AGENT_DEVCONTAINERS_ENABLE=true`.

Notes:
* SFTP is currently not supported
* Haven't tested X11 container forwarding
* Haven't tested agent forwarding
2025-02-28 09:38:45 +00:00
4ba5a8a2ba feat(agent): add connection reporting for SSH and reconnecting PTY (#16652)
Updates #15139
2025-02-27 10:45:45 +00:00
172e52317c feat(agent): wire up agentssh server to allow exec into container (#16638)
Builds on top of https://github.com/coder/coder/pull/16623/ and wires up
the ReconnectingPTY server. This does nothing to wire up the web
terminal yet but the added test demonstrates the functionality working.

Other changes:
* Refactors and moves the `SystemEnvInfo` interface to the
`agent/usershell` package to address follow-up from
https://github.com/coder/coder/pull/16623#discussion_r1967580249
* Marks `usershellinfo.Get` as deprecated. Consumers should use the
`EnvInfoer` interface instead.

---------

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Co-authored-by: Danny Kopping <danny@coder.com>
2025-02-26 09:03:27 +00:00
9f5ad23644 refactor(agent/agentssh): move parsing of magic session and create type (#16630)
This change refactors the parsing of MagicSessionEnvs in the agentssh
package and moves the logic to an earlier stage. Also intoduces enums
for MagicSessionType.

Refs #15139
2025-02-19 22:18:31 +02:00
9520da338e fix: conform to stricter printf usage in Go 1.24 (#16330) 2025-01-29 18:06:22 +02:00
c069563af1 test: fix use of t.Logf where t.Log would suffice (#16328) 2025-01-29 14:35:04 +00:00
7b88776403 chore(testutil): add testutil.GoleakOptions (#16070)
- Adds `testutil.GoleakOptions` and consolidates existing options to
this location
- Pre-emptively adds required ignore for this Dependabot PR to pass CI
https://github.com/coder/coder/pull/16066
2025-01-08 15:38:37 +00:00
1f238fed59 feat: integrate new agentexec pkg (#15609)
- Integrates the `agentexec` pkg into the agent and removes the
legacy system of iterating over the process tree. It adds some linting
rules to hopefully catch future improper uses of `exec.Command` in the package.
2024-11-27 20:12:15 +02:00
5861e516b9 chore: add standard test logger ignoring db canceled (#15556)
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests.  This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.

Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.
2024-11-18 14:09:22 +04:00
e5661c2748 feat: add support for multiple tunnel destinations in tailnet (#15409)
Closes #14729

Expands the Coordination controller used by the CLI client to allow multiple tunnel destinations (agents).  Our current client uses just one, but this unifies the logic so that when we add Coder VPN, 1 is just a special case of "many."
2024-11-08 13:32:07 +04:00
8c00ebc6ee chore: refactor ServerTailnet to use tailnet.Controllers (#15408)
chore of #14729

Refactors the `ServerTailnet` to use `tailnet.Controller` so that we reuse logic around reconnection and handling control messages, instead of reimplementing.  This unifies our "client" use of the tailscale API across CLI, coderd, and wsproxy.
2024-11-08 13:18:56 +04:00
886dcbec84 chore: refactor coordination (#15343)
Refactors the way clients of the Tailnet API (clients of the API, which include both workspace "agents" and "clients") interact with the API.  Introduces the idea of abstract "controllers" for each of the RPCs in the API, and implements a Coordination controller by refactoring from `workspacesdk`.

chore re: #14729
2024-11-05 13:50:10 +04:00
8785a51b09 feat: include Coder service prefix on agents (#14944)
fixes #14715

Configures agents to use an address both in the Tailscale service prefix and the new Coder service prefix. Also modifies the Coordinator auth to allow the new prefix.

Updates `coder/tailscale` to include https://github.com/coder/tailscale/pull/62 which fixes a bug around forwarding TCP connections to localhost.  This functionality is tested in the modifications to `TestAgent_Dial`.
2024-10-04 10:16:33 +04:00
7d9f5ab81d chore: add Coder service prefix to tailnet (#14943)
re: #14715

This PR introduces the Coder service prefix: `fd60:627a:a42b::/48` and refactors our existing code as calling the Tailscale service prefix explicitly (rather than implicitly).

Removes the unused `Addresses` agent option. All clients today assume they can compute the Agent's IP address based on its UUID, so an agent started with a custom address would break things.
2024-10-04 10:04:10 +04:00
ae522c558d feat: add agent timings (#14713)
* feat: begin impl of agent script timings

* feat: add job_id and display_name to script timings

* fix: increment migration number

* fix: rename migrations from 251 to 254

* test: get tests compiling

* fix: appease the linter

* fix: get tests passing again

* fix: drop column from correct table

* test: add fixture for agent script timings

* fix: typo

* fix: use job id used in provisioner job timings

* fix: increment migration number

* test: behaviour of script runner

* test: rewrite test

* test: does exit 1 script break things?

* test: rewrite test again

* fix: revert change

Not sure how this came to be, I do not recall manually changing
these files.

* fix: let code breathe

* fix: wrap errors

* fix: justify nolint

* fix: swap require.Equal argument order

* fix: add mutex operations

* feat: add 'ran_on_start' and 'blocked_login' fields

* fix: update testdata fixture

* fix: refer to agent_id instead of job_id in timings

* fix: JobID -> AgentID in dbauthz_test

* fix: add 'id' to scripts, make timing refer to script id

* fix: fix broken tests and convert bug

* fix: update testdata fixtures

* fix: update testdata fixtures again

* feat: capture stage and if script timed out

* fix: update migration number

* test: add test for script api

* fix: fake db query

* fix: use UTC time

* fix: ensure r.scriptComplete is not nil

* fix: move err check to right after call

* fix: uppercase sql

* fix: use dbtime.Now()

* fix: debug log on r.scriptCompleted being nil

* fix: ensure correct rbac permissions

* chore: remove DisplayName

* fix: get tests passing

* fix: remove space in sql up

* docs: document ExecuteOption

* fix: drop 'RETURNING' from sql

* chore: remove 'display_name' from timing table

* fix: testdata fixture

* fix: put r.scriptCompleted call in goroutine

* fix: track goroutine for test + use separate context for reporting

* fix: appease linter, handle trackCommandGoroutine error

* fix: resolve race condition

* feat: replace timed_out column with status column

* test: update testdata fixture

* fix: apply suggestions from review

* revert: linter changes
2024-09-24 10:51:49 +01:00
2df9a3e554 fix: fix tailnet remoteCoordination to wait for server (#14666)
Fixes #12560

When gracefully disconnecting from the coordinator, we would send the Disconnect message and then close the dRPC stream.  However, closing the dRPC stream can cause the server not to process the Disconnect message, since we use the stream context in a `select` while sending it to the coordinator.

This is a product bug uncovered by the flake, and probably results in us failing graceful disconnect some minority of the time.

Instead, the `remoteCoordination` (and `inMemoryCoordination` for consistency) should send the Disconnect message and then wait for the coordinator to hang up (on some graceful disconnect timer, in the form of a context).
2024-09-16 09:24:30 +04:00
c8580a415a feat: expose current agent connections by type via prometheus (#14612) 2024-09-11 14:13:30 +10:00
e96652ebbc feat: block file transfers for security (#13501) 2024-06-10 12:12:23 +00:00
b248f125e1 chore: rename notification banners to announcement banners (#13419) 2024-05-31 10:59:28 -06:00
d8e0be6ee6 feat: add support for multiple banners (#13081) 2024-05-08 15:40:43 -06:00
426e9f2b96 feat: support adjusting child proc oom scores (#12655) 2024-04-03 09:42:03 -05:00
4d5a7b2d56 chore(codersdk): move all tailscale imports out of codersdk (#12735)
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
2024-03-26 12:44:31 -05:00
653ddccd8e fix(agent): remove unused token debug handler (#12602) 2024-03-15 09:43:36 +00:00
63696d762f feat(codersdk): add debug handlers for logs, manifest, and token to agent (#12593)
* feat(codersdk): add debug handlers for logs, manifest, and token to agent

* add more logging

* use io.LimitReader instead of seeking
2024-03-14 15:36:12 +00:00
eba8cd7c07 chore: consolidate various randomPort() implementations (#12362)
Consolidates our existing randomPort() implementations to package testutil
2024-02-29 12:51:44 +00:00
4cc132cea0 feat: switch agent to use v2 API for sending logs (#12068)
Changes the agent to use the new v2 API for sending logs, via the logSender component.

We keep the PatchLogs function around, but deprecate it so that we can test the v1 endpoint.
2024-02-23 11:27:15 +04:00
b1c0b39d88 feat(agent): add script data dir for binaries and files (#12205)
The agent is extended with a `--script-data-dir` flag, defaulting to the
OS temp dir. This dir is used for storing `coder-script-data/bin` and
`coder-script/[script uuid]`. The former is a place for all scripts to
place executable binaries that will be available by other scripts, SSH
sessions, etc. The latter is a place for the script to store files.

Since we default to OS temp dir, files are ephemeral by default. In the
future, we may consider adding new env vars or changing the default
storage location. Workspace startup speed could potentially benefit from
scripts being able to skip steps that require downloading software. We
may also extend this with more env variables (e.g. persistent storage in
HOME).

Fixes #11131
2024-02-20 13:26:18 +02:00
c63f569174 refactor(agent/agentssh): move envs to agent and add agentssh config struct (#12204)
This commit refactors where custom environment variables are set in the
workspace and decouples agent specific configs from the `agentssh.Server`.
To reproduce all functionality, `agentssh.Config` is introduced.

The custom environment variables are now configured in `agent/agent.go`
and the agent retains control of the final state. This will allow for
easier extension in the future and keep other modules decoupled.
2024-02-19 16:30:00 +02:00
1cf4b62867 feat: change agent to use v2 API for reporting stats (#12024)
Modifies the agent to use the v2 API to report its statistics, using the `statsReporter` subcomponent.
2024-02-07 15:26:41 +04:00
2599850e54 feat: use agent v2 API to post startup (#11877)
Uses the v2 Agent API to post startup information.
2024-01-30 11:23:28 +04:00
da8bb1c198 feat: use agent v2 API to fetch manifest (#11832)
Agent uses the v2 API to obtain the manifest, instead of the HTTP API.
2024-01-30 10:11:28 +04:00
9cf4e7f15a fix: prevent agent_test.go from failing on error logs (#11909)
We're failing tests on error logs like this: https://github.com/coder/coder/actions/runs/7706053882/job/21000984583

Unfortunately, the error we hit, when the underlying connection is closed, is unexported, so we can't specifically ignore it.

Part of the issue is that agent.Close() doesn't wait for these goroutines to complete before returning, so the test harness proceeds to close the connection. This looks to our product code like the network connection failing.  It would be possible to fix this, but just doesn't seem worth it for the extra insurance of catching other error logs in these tests.
2024-01-30 10:04:01 +04:00
13e24f21e4 feat: use Agent v2 API for Service Banner (#11806)
Agent uses the v2 API for the service banner, rather than the v1 HTTP API.

One of several for #10534
2024-01-30 07:44:47 +04:00
059e533544 feat: agent uses Tailnet v2 API for DERPMap updates (#11698)
Switches the Agent to use Tailnet v2 API to get DERPMap updates.

Subsequent PRs will do the same for the CLI (`codersdk`) and `wsproxy`.
2024-01-23 14:42:07 +04:00
f01cab9894 feat: use tailnet v2 API for coordination (#11638)
This one is huge, and I'm sorry.

The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.

There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
2024-01-22 11:07:50 +04:00
b173195e0d Revert "fix: detect JetBrains running on local ipv6 (#11653)" (#11664)
This reverts commit 2d61d5332a.
2024-01-17 15:38:39 +04:00
2d61d5332a fix: detect JetBrains running on local ipv6 (#11653) 2024-01-16 15:53:41 -09:00
dd05a6b13a chore: mockgen archived, moved to new location (#11415)
* chore: mockgen archived, moved to new location
2024-01-04 18:35:56 -06:00
df3c310379 feat(cli): add coder open vscode (#11191)
Fixes #7667
2024-01-02 20:46:18 +02:00
520c3a8ff7 fix: use TSMP for pings and checking reachability (#11306)
We're seeing some flaky tests related to agent connectivity - https://github.com/coder/coder/actions/runs/7286675441/job/19856270998

I'm pretty sure what happened in this one is that the client opened a connection while the wgengine was in the process of reconfiguring the wireguard device, so the fact that the peer became "active" as a result of traffic being sent was not noticed.

The test calls `AwaitReachable()` but this only tests the disco layer, so it doesn't wait for wireguard to come up.

I think we should be using TSMP for pinging and reachability, since this operates at the IP layer, and therefore requires that wireguard comes up before being successful.

This should also help with the problems we have seen where a TCP connection starts before wireguard is up and the initial round trip has to wait for the 5 second wireguard handshake retry.

fixes: #11294
2024-01-02 15:53:52 +04:00
db9104c02e fix: avoid panic on nil connection (#11305)
Related to https://github.com/coder/coder/actions/runs/7286675441/job/19855871305

Fixes a panic if the listener returns an error, which can obfuscate the underlying problem and cause unrelated tests to be marked failed.
2023-12-21 14:26:11 +04:00
b7bdb17460 feat: add metrics to workspace agent scripts (#11132)
* push startup script metrics to agent
2023-12-13 11:45:43 -06:00
dbbf8acc26 fix: track JetBrains connections (#10968)
* feat: implement jetbrains agentssh tracking

Based on tcp forwarding instead of ssh connections

* Add JetBrains tracking to bottom bar
2023-12-07 12:15:54 -09:00
70cede8f7a test(agent): improve TestAgent_Dial tests (#11013)
Refs #11008
2023-12-04 13:11:30 +02:00
6c67add2d9 fix: detect and retry reverse port forward on used port (#10844)
Fixes #10799

The flake happens when we try to remote forward, but the port we've chosen is not free.  In the flaked example, it's actually the SSH listener that occupies the port we try to remote forward, leading to confusing reads (c.f. the linked issue).

This fix simplies the tests considerably by using the Go ssh client, rather than shelling out to OpenSSH.  This avoids using a pseudoterminal, avoids the need for starting any local OS listeners to communicate the forwarding (go SSH just returns in-process listeners), and avoids an OS listener to wire OpenSSH up to the agentConn.

With the simplied logic, we can immediately tell if a remote forward on a random port fails, so we can do this in a loop until success or timeout.

I've also simplified and fixed up the other forwarding tests. Since we set up forwarding in-process with Go ssh, we can remove a lot of the `require.Eventually` logic.
2023-11-27 09:42:45 +04:00