Commit Graph

114 Commits

Author SHA1 Message Date
f2d229eed3 fix!: use devcontainer ID when rebuilding a devcontainer (#18604)
This PR replaces the use of the **container** ID with the
**devcontainer** ID. This is a breaking change. This allows rebuilding a
devcontainer when there is no valid container ID.
2025-06-26 11:41:57 +01:00
97474bb28b feat: support devcontainer agents in ui and unify backend (#18332)
This commit consolidates two container endpoints on the backend and improves the
frontend devcontainer support by showing names and displaying apps as
appropriate.

With this change, the frontend now has knowledge of the subagent and we can also
display things like port forwards.

The frontend was updated to show dev container labels on the border as well as
subagent connection status. The recreation flow was also adjusted a bit to show
placeholder app icons when relevant.

Support for apps was also added, although these are still WIP on the backend.
And the port forwarding utility was added in since the sub agents now provide
the necessary info.

Fixes coder/internal#666
2025-06-17 16:06:47 +03:00
a18eb9d08f feat(site): allow recreating devcontainers and showing dirty status (#18049)
This change allows showing the devcontainer dirty status in the UI as
well as a recreate button to update the devcontainer.

Closes #16424
2025-05-27 19:42:24 +03:00
0731304905 feat(agent/agentcontainers): recreate devcontainers concurrently (#18042)
This change introduces a refactor of the devcontainers recreation logic
which is now handled asynchronously rather than being request scoped.
The response was consequently changed from "No Content" to "Accepted" to
reflect this.

A new `Status` field was introduced to the devcontainer struct which
replaces `Running` (bool). This reflects that the devcontainer can now
be in various states (starting, running, stopped or errored).

The status field also protects against multiple concurrent recrations,
as long as they are initiated via the API.

Updates #16424
2025-05-26 18:30:52 +03:00
98e2ec4417 feat: show devcontainer dirty status and allow recreate (#17880)
Updates #16424
2025-05-19 12:56:10 +03:00
0b5f27f566 feat: add parent_id column to workspace_agents table (#17758)
Adds a new nullable column `parent_id` to `workspace_agents` table. This
lays the groundwork for having child agents.
2025-05-13 00:01:31 +01:00
268a50c193 feat(agent/agentcontainers): add file watcher and dirty status (#17573)
Fixes coder/internal#479
Fixes coder/internal#480
2025-04-29 11:53:58 +03:00
00b5f56734 feat(agent/agentcontainers): add devcontainers list endpoint (#17389)
This change allows listing both predefined and runtime-detected
devcontainers, as well as showing whether or not the devcontainer is
running and which container represents it.

Fixes coder/internal#478
2025-04-15 17:53:37 +03:00
25fb34cabe feat(agent): implement recreate for devcontainers (#17308)
This change implements an interface for running `@devcontainers/cli up`
and an API endpoint on the agent for triggering recreate for a running
devcontainer.

A couple of limitations:

1. Synchronous HTTP request, meaning the browser might choose to time it
out before it's done => no result/error (and devcontainer cli command
probably gets killed via ctx cancel).
2. Logs are only written to agent logs via slog, not as a "script" in
the UI.

Both 1 and 2 will be improved in future refactors.

Fixes coder/internal#481
Fixes coder/internal#482
2025-04-10 16:16:16 +03:00
5c8cac9fb7 feat: add name to workspace agent devcontainers (#17089)
In the presence of multiple devcontainers, it would be nice to
differentiate them by name. This change inherits the resource name from
terraform.

Refs #17076
2025-03-25 12:59:20 +00:00
69ba27e347 feat: allow specifying devcontainer on agent in terraform (#16997)
This change allows specifying devcontainers in terraform and plumbs it
through to the agent via agent manifest.

This will be used for autostarting devcontainers in a workspace.

Depends on coder/terraform-provider-coder#368
Updates #16423
2025-03-20 19:09:39 +02:00
3ac844ad3d chore(codersdk): rename WorkspaceAgent(Dev)container structs (#16996)
This is to free up the devcontainer name space for more targeted
structs.

Updates #16423
2025-03-19 10:16:14 +00:00
75b27e8f19 fix(agent/agentcontainers): improve testing of convertDockerInspect, return correct host port (#16887)
* Improves separation of concerns between `runDockerInspect` and
`convertDockerInspect`: `runDockerInspect` now just runs the command and
returns the output, while `convertDockerInspect` now does all of the
conversion and parsing logic.
* Improves testing of `convertDockerInspect` using real test fixtures.
* Fixes issue where the container port is returned instead of the host
port.
* Updates UI to link to correct host port. Container port is still
displayed in the button text, but the HostIP:HostPort is shown in a
popover.
* Adds stories for workspace agent UI
2025-03-18 14:37:45 +00:00
31b1ff7d3b feat(agent): add container list handler (#16346)
Fixes https://github.com/coder/coder/issues/16268

- Adds `/api/v2/workspaceagents/:id/containers` coderd endpoint that allows listing containers
visible to the agent. Optional filtering by labels is supported.
- Adds go tools to the `coder-dylib` CI step so we can generate mocks if needed
2025-02-10 11:29:30 +00:00
2c7f8ac65f chore: migrate to coder/websocket 1.8.12 (#15898)
Migrates us to `coder/websocket` v1.8.12 rather than `nhooyr/websocket` on an older version.

Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.
2024-12-19 00:51:30 +04:00
148a5a3593 fix: fix goroutine leak in log streaming over websocket (#15709)
fixes #14881

Our handlers for streaming logs don't read from the websocket. We don't allow the client to send us any data, but the websocket library we use requires reading from the websocket to properly handle pings and closing. Not doing so can [can cause the websocket to hang on write](https://github.com/coder/websocket/issues/405), leaking go routines which were noticed in #14881.

This fixes the issue, and in process refactors our log streaming to a encoder/decoder package which provides generic types for sending JSON over websocket.

I'd also like for us to upgrade to the latest https://github.com/coder/websocket but we should also upgrade our tailscale fork before doing so to avoid including two copies of the websocket library.
2024-12-03 10:12:30 +04:00
ae522c558d feat: add agent timings (#14713)
* feat: begin impl of agent script timings

* feat: add job_id and display_name to script timings

* fix: increment migration number

* fix: rename migrations from 251 to 254

* test: get tests compiling

* fix: appease the linter

* fix: get tests passing again

* fix: drop column from correct table

* test: add fixture for agent script timings

* fix: typo

* fix: use job id used in provisioner job timings

* fix: increment migration number

* test: behaviour of script runner

* test: rewrite test

* test: does exit 1 script break things?

* test: rewrite test again

* fix: revert change

Not sure how this came to be, I do not recall manually changing
these files.

* fix: let code breathe

* fix: wrap errors

* fix: justify nolint

* fix: swap require.Equal argument order

* fix: add mutex operations

* feat: add 'ran_on_start' and 'blocked_login' fields

* fix: update testdata fixture

* fix: refer to agent_id instead of job_id in timings

* fix: JobID -> AgentID in dbauthz_test

* fix: add 'id' to scripts, make timing refer to script id

* fix: fix broken tests and convert bug

* fix: update testdata fixtures

* fix: update testdata fixtures again

* feat: capture stage and if script timed out

* fix: update migration number

* test: add test for script api

* fix: fake db query

* fix: use UTC time

* fix: ensure r.scriptComplete is not nil

* fix: move err check to right after call

* fix: uppercase sql

* fix: use dbtime.Now()

* fix: debug log on r.scriptCompleted being nil

* fix: ensure correct rbac permissions

* chore: remove DisplayName

* fix: get tests passing

* fix: remove space in sql up

* docs: document ExecuteOption

* fix: drop 'RETURNING' from sql

* chore: remove 'display_name' from timing table

* fix: testdata fixture

* fix: put r.scriptCompleted call in goroutine

* fix: track goroutine for test + use separate context for reporting

* fix: appease linter, handle trackCommandGoroutine error

* fix: resolve race condition

* feat: replace timed_out column with status column

* test: update testdata fixture

* fix: apply suggestions from review

* revert: linter changes
2024-09-24 10:51:49 +01:00
86f68b220e feat: add 'display_name' column to 'workspace_agent_scripts' (#14747)
* feat: add 'display_name' column to 'workspace_agent_scripts'

* fix: backfill from workspace_agent_log_sources

* fix: run 'make gen'
2024-09-20 14:26:13 +01:00
4d5a7b2d56 chore(codersdk): move all tailscale imports out of codersdk (#12735)
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
2024-03-26 12:44:31 -05:00
af3fdc68c3 chore: refactor agent routines that use the v2 API (#12223)
In anticipation of needing the `LogSender` to run on a context that doesn't get immediately canceled when you `Close()` the agent, I've undertaken a little refactor to manage the goroutines that get run against the Tailnet and Agent API connection.

This handles controlling two contexts, one that gets canceled right away at the start of graceful shutdown, and another that stays up to allow graceful shutdown to complete.
2024-02-23 11:04:23 +04:00
4c3d44658d fix(codersdk): correctly log coordination error (#12176) 2024-02-15 20:47:12 +00:00
1f5a6d59ba chore: consolidate websocketNetConn implementations (#12065)
Consolidates websocketNetConn from multiple packages in favor of a central one in codersdk
2024-02-09 11:39:08 +04:00
e5ba586e30 fix: fix graceful disconnect in DialWorkspaceAgent (#11993)
I noticed in testing that the CLI wasn't correctly sending the disconnect message when it shuts down, and thus agents are seeing this as a "lost" peer, rather than a "disconnected" one. 

What was happening is that we just used a single context for everything from the netconn to the RPCs, and when the context was canceled we failed to send the disconnect message due to canceled context.

So, this PR splits things into two contexts, with a graceful one set to last up to 1 second longer than the main one.
2024-02-05 14:01:37 +04:00
d3983e4dba feat: add logging to client tailnet yamux (#11908)
Adds logging to yamux when used for tailnet client connections, e.g. CLI and wsproxy.  This could be useful for debugging connection issues with tailnet v2 API.
2024-01-30 09:58:59 +04:00
1e8a9c09fe chore: remove legacy wsconncache (#11816)
Fixes #8218

Removes `wsconncache` and related "is legacy?" functions and API calls that were used by it.

The only leftover is that Agents still use the legacy IP, so that back level clients or workspace proxies can dial them correctly.

We should eventually remove this: #11819
2024-01-30 07:56:36 +04:00
f9fdd44510 feat: change codersdk to use tailnet v2 for DERPMap updates (#11736)
fixes #10533


refactors `codersdk` workspace agent dialer to use a single websocket connection to the tailnet v2 API for both coordination and DERPMap updates, rather than separate websockets (and the v1 API for DERPMaps).
2024-01-29 11:26:50 +04:00
3e0e7f8739 feat: check agent API version on connection (#11696)
fixes #10531

Adds a check for `version` on connection to the Agent API websocket endpoint.  This is primarily for future-proofing, so that up-level agents get a sensible error if they connect to a back-level Coderd.

It also refactors the location of the `CurrentVersion` variables, to be part of the `proto` packages, since the versions refer to the APIs defined therein.
2024-01-23 14:27:49 +04:00
f01cab9894 feat: use tailnet v2 API for coordination (#11638)
This one is huge, and I'm sorry.

The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.

There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
2024-01-22 11:07:50 +04:00
695f57f7ff fix: use header flags in wsproxy server (#10985) 2023-12-05 14:13:42 +04:00
f441ad66e1 fix(codersdk): keep workspace agent connection open after dial context (#10863) 2023-11-27 14:29:57 +02:00
198b56c137 fix(coderd): fix memory leak in watchWorkspaceAgentMetadata (#10685)
Fixes #10550
2023-11-16 17:03:53 +02:00
a7c671ca07 feat: add workspace agent APIVersion (#10419)
Fixes #10339
2023-10-31 10:08:43 +04:00
45b53c285f feat: allow external services to be authable (#9996)
* feat: allow external services to be authable

* Refactor external auth config structure for defaults

* Add support for new config properties

* Change the name of external auth

* Move externalauth -> external-auth

* Run gen

* Fix tests

* Fix MW tests

* Fix git auth redirect

* Fix lint

* Fix name

* Allow any ID

* Fix invalid type test

* Fix e2e tests

* Fix comments

* Fix colors

* Allow accepting any type as string

* Run gen

* Fix href
2023-10-03 14:04:39 +00:00
8abca9bea7 chore: rename git_auth to external_auth in our schema (#9935)
* chore: rename `git_auth` to `external_auth` in our schema

We're changing Git auth to be external auth. It will support
any OAuth2 or OIDC provider.

To split up the larger change I want to contribute the schema
changes first, and I'll add the feature itself in another PR.

* Fix names

* Fix outdated view

* Rename some additional places

* Fix sort order

* Fix template versions auth route

* Fix types

* Fix dbauthz
2023-09-29 19:13:20 +00:00
1262eef2c0 feat: add support for coder_script (#9584)
* Add basic migrations

* Improve schema

* Refactor agent scripts into it's own package

* Support legacy start and stop script format

* Pipe the scripts!

* Finish the piping

* Fix context usage

* It works!

* Fix sql query

* Fix SQL query

* Rename `LogSourceID` -> `SourceID`

* Fix the FE

* fmt

* Rename migrations

* Fix log tests

* Fix lint err

* Fix gen

* Fix story type

* Rename source to script

* Fix schema jank

* Uncomment test

* Rename proto to TimeoutSeconds

* Fix comments

* Fix comments

* Fix legacy endpoint without specified log_source

* Fix non-blocking by default in agent

* Fix resources tests

* Fix dbfake

* Fix resources

* Fix linting I think

* Add fixtures

* fmt

* Fix startup script behavior

* Fix comments

* Fix context

* Fix cancel

* Fix SQL tests

* Fix e2e tests

* Interrupt on Windows

* Fix agent leaking script process

* Fix migrations

* Fix stories

* Fix duplicate logs appearing

* Gen

* Fix log location

* Fix tests

* Fix tests

* Fix log output

* Show display name in output

* Fix print

* Return timeout on start context

* Gen

* Fix fixture

* Fix the agent status

* Fix startup timeout msg

* Fix command using shared context

* Fix timeout draining

* Change signal type

* Add deterministic colors to startup script logs

---------

Co-authored-by: Muhammad Atif Ali <atif@coder.com>
2023-09-25 16:47:17 -05:00
530dd9d247 fix(coderd): subscribe to workspace when streaming agent logs to detect outdated build (#9729)
Fixes #9721
2023-09-19 20:02:27 +03:00
ee24260614 feat: allow configuring display apps from template (#9100) 2023-08-30 14:53:42 -05:00
64df076328 feat: add server flag to force DERP to use always websockets (#9238) 2023-08-24 17:22:31 +00:00
22e781eced chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```

Migrate generated files to import /v2

* Fix gen
2023-08-18 18:55:43 +00:00
07fd73c4a0 chore: allow multiple agent subsystems, add exectrace (#8933) 2023-08-08 22:10:28 -07:00
bd944e0d21 chore: rename startup logs to agent logs (#8649)
* chore: rename startup logs to agent logs

This also adds a `source` property to every agent log. It
should allow us to group logs and display them nicer in
the UI as they stream in.

* Fix migration order

* Fix naming

* Rename the frontend

* Fix tests

* Fix down migration

* Match enums for workspace agent logs

* Fix inserting log source

* Fix migration order

* Fix logs tests

* Fix psql insert
2023-07-28 15:57:23 +00:00
74c4553a3f fix(codersdk): always dial agents with WorkspaceAgentIP (#8760) 2023-07-27 03:44:44 +00:00
2f0a9996e7 chore: add derpserver to wsproxy, add proxies to derpmap (#7311) 2023-07-27 02:21:04 +10:00
c47b78c44b chore: replace wsconncache with a single tailnet (#8176) 2023-07-12 17:37:31 -05:00
b73f9d8e86 feat: add computed workspace and agent health fields to the api (#8280) 2023-07-10 12:40:11 +03:00
d3c39b60c9 feat: add agent log streaming and follow provisioner format (#8170) 2023-06-28 10:54:13 +02:00
a28d422c35 feat: add flag to disable all direct connections (#7936) 2023-06-21 22:02:05 +00:00
24b95e16c4 feat: add --disable-direct flag to CLI (#8131) 2023-06-21 20:22:43 +00:00
bc739bdfce feat(cli): add hidden netcheck command (#8136) 2023-06-21 14:33:19 -05:00
8dac0356ed refactor: replace startup script logs EOF with starting/ready time (#8082)
This commit reverts some of the changes in #8029 and implements an
alternative method of keeping track of when the startup script has ended
and there will be no more logs.

This is achieved by adding new agent fields for tracking when the agent
enters the "starting" and "ready"/"start_error" lifecycle states. The
timestamps simplify logic since we don't need understand if the current
state is before or after the state we're interested in. They can also be
used to show data like how long the startup script took to execute. This
also allowed us to remove the EOF field from the logs as the
implementation was problematic when we returned the EOF log entry in the
response since requesting _after_ that ID would give no logs and the API
would thus lose track of EOF.
2023-06-20 14:41:55 +03:00