Commit Graph

240 Commits

Author SHA1 Message Date
8ea956fc11 feat: add app status tracking to the backend (#17163)
This does ~95% of the backend work required to integrate the AI work.

Most left to integrate from the tasks branch is just frontend, which
will be a lot smaller I believe.

The real difference between this branch and that one is the abstraction
-- this now attaches statuses to apps, and returns the latest status
reported as part of a workspace.

This change enables us to have a similar UX to in the tasks branch, but
for agents other than Claude Code as well. Any app can report status
now.
2025-03-31 10:55:44 -04:00
9bc727e977 chore: add support for one-way websockets to backend (#16853)
Closes https://github.com/coder/coder/issues/16775

## Changes made
- Added `OneWayWebSocket` function that establishes WebSocket
connections that don't allow client-to-server communication
- Added tests for the new function
- Updated API endpoints to make new WS-based endpoints, and mark
previous SSE-based endpoints as deprecated
- Updated existing SSE handlers to use the same core logic as the new WS
handlers

## Notes
- Frontend changes handled via #16855
2025-03-28 17:13:20 -04:00
17ddee05e5 chore: update golang to 1.24.1 (#17035)
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
  - Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
2025-03-26 01:56:39 -05:00
117e4c2fe7 feat: adds device_id, device_os, and coder_desktop_version to telemetry (#17086)
Records the Device ID, Device OS and Coder Desktop version to telemetry.

These values are provided by the Coder Desktop client in the StartRequest method of the VPN protocol. We render them as an HTTP header to transmit to Coderd, where they are decoded and added to telemetry.
2025-03-25 15:26:05 +04:00
e0ecc28638 feat: add telemetry to user-scoped tailnet API call (#17065)
Adds support for sending telemetry on calls to the User-scoped tailnet RPC endpoint. This is currently used only by Coder Desktop.

Later PRs will fill in the version, OS information, and device ID via HTTP headers.
2025-03-24 16:02:33 +04:00
3ac844ad3d chore(codersdk): rename WorkspaceAgent(Dev)container structs (#16996)
This is to free up the devcontainer name space for more targeted
structs.

Updates #16423
2025-03-19 10:16:14 +00:00
04c33968cf refactor: replace golang.org/x/exp/slices with slices (#16772)
The experimental functions in `golang.org/x/exp/slices` are now
available in the standard library since Go 1.21.

Reference: https://go.dev/doc/go1.21#slices

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2025-03-04 00:46:49 +11:00
d50e846747 fix: block vpn tailnet endpoint when --browser-only is set (#16647)
The work on CoderVPN required a new user-scoped `/tailnet` endpoint for
coordinating with multiple workspace agents, and receiving workspace
updates. Much like the `/coordinate` endpoint, this needs to respect the
`CODER_BROWSER_ONLY`/`--browser-only` deployment config value.
2025-02-21 12:21:20 +11:00
31b1ff7d3b feat(agent): add container list handler (#16346)
Fixes https://github.com/coder/coder/issues/16268

- Adds `/api/v2/workspaceagents/:id/containers` coderd endpoint that allows listing containers
visible to the agent. Optional filtering by labels is supported.
- Adds go tools to the `coder-dylib` CI step so we can generate mocks if needed
2025-02-10 11:29:30 +00:00
2c7f8ac65f chore: migrate to coder/websocket 1.8.12 (#15898)
Migrates us to `coder/websocket` v1.8.12 rather than `nhooyr/websocket` on an older version.

Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.
2024-12-19 00:51:30 +04:00
148a5a3593 fix: fix goroutine leak in log streaming over websocket (#15709)
fixes #14881

Our handlers for streaming logs don't read from the websocket. We don't allow the client to send us any data, but the websocket library we use requires reading from the websocket to properly handle pings and closing. Not doing so can [can cause the websocket to hang on write](https://github.com/coder/websocket/issues/405), leaking go routines which were noticed in #14881.

This fixes the issue, and in process refactors our log streaming to a encoder/decoder package which provides generic types for sending JSON over websocket.

I'd also like for us to upgrade to the latest https://github.com/coder/websocket but we should also upgrade our tailscale fork before doing so to avoid including two copies of the websocket library.
2024-12-03 10:12:30 +04:00
b1298a3c1e feat: add WorkspaceUpdates tailnet RPC (#14847)
Closes #14716
Closes #14717

Adds a new user-scoped tailnet API endpoint (`api/v2/tailnet`) with a new RPC stream for receiving updates on workspaces owned by a specific user, as defined in #14716. 

When a stream is started, the `WorkspaceUpdatesProvider` will begin listening on the user-scoped pubsub events implemented in #14964. When a relevant event type is seen (such as a workspace state transition), the provider will query the DB for all the workspaces (and agents) owned by the user. This gets compared against the result of the previous query to produce a set of workspace updates. 

Workspace updates can be requested for any user ID, however only workspaces the authorised user is permitted to `ActionRead` will have their updates streamed.
Opening a tunnel to an agent requires that the user can perform `ActionSSH` against the workspace containing it.
2024-11-01 14:53:53 +11:00
31506e694b chore: send workspace pubsub events by owner id (#14964)
We currently send empty payloads to pubsub channels of the form `workspace:<workspace_id>` to notify listeners of updates to workspaces (such as for refreshing the workspace dashboard).

To support https://github.com/coder/coder/issues/14716, we'll instead send `WorkspaceEvent` payloads to pubsub channels of the form `workspace_owner:<owner_id>`. This enables a listener to receive events for all workspaces owned by a user.
This PR replaces the usage of the old channels without modifying any existing behaviors.

```
type WorkspaceEvent struct {
	Kind        WorkspaceEventKind `json:"kind"`
	WorkspaceID uuid.UUID          `json:"workspace_id" format:"uuid"`
	// AgentID is only set for WorkspaceEventKindAgent* events
	// (excluding AgentTimeout)
	AgentID *uuid.UUID `json:"agent_id,omitempty" format:"uuid"`
}
```

We've defined `WorkspaceEventKind`s based on how the old channel was used, but it's not yet necessary to inspect the types of any of the events, as the existing listeners are designed to fire off any of them.

```
WorkspaceEventKindStateChange     WorkspaceEventKind = "state_change"
WorkspaceEventKindStatsUpdate     WorkspaceEventKind = "stats_update"
WorkspaceEventKindMetadataUpdate  WorkspaceEventKind = "mtd_update"
WorkspaceEventKindAppHealthUpdate WorkspaceEventKind = "app_health"

WorkspaceEventKindAgentLifecycleUpdate  WorkspaceEventKind = "agt_lifecycle_update"
WorkspaceEventKindAgentLogsUpdate       WorkspaceEventKind = "agt_logs_update"
WorkspaceEventKindAgentConnectionUpdate WorkspaceEventKind = "agt_connection_update"
WorkspaceEventKindAgentLogsOverflow     WorkspaceEventKind = "agt_logs_overflow"
WorkspaceEventKindAgentTimeout          WorkspaceEventKind = "agt_timeout"
```
2024-11-01 14:17:05 +11:00
cd890aa3a0 feat: enable key rotation (#15066)
This PR contains the remaining logic necessary to hook up key rotation
to the product.
2024-10-25 17:14:35 +01:00
343f8ec9ab chore: join owner, template, and org in new workspace view (#15116)
Joins in fields like `username`, `avatar_url`, `organization_name`,
`template_name` to `workspaces` via a **view**. 
The view must be maintained moving forward, but this prevents needing to
add RBAC permissions to fetch related workspace fields.
2024-10-22 09:20:54 -05:00
ae522c558d feat: add agent timings (#14713)
* feat: begin impl of agent script timings

* feat: add job_id and display_name to script timings

* fix: increment migration number

* fix: rename migrations from 251 to 254

* test: get tests compiling

* fix: appease the linter

* fix: get tests passing again

* fix: drop column from correct table

* test: add fixture for agent script timings

* fix: typo

* fix: use job id used in provisioner job timings

* fix: increment migration number

* test: behaviour of script runner

* test: rewrite test

* test: does exit 1 script break things?

* test: rewrite test again

* fix: revert change

Not sure how this came to be, I do not recall manually changing
these files.

* fix: let code breathe

* fix: wrap errors

* fix: justify nolint

* fix: swap require.Equal argument order

* fix: add mutex operations

* feat: add 'ran_on_start' and 'blocked_login' fields

* fix: update testdata fixture

* fix: refer to agent_id instead of job_id in timings

* fix: JobID -> AgentID in dbauthz_test

* fix: add 'id' to scripts, make timing refer to script id

* fix: fix broken tests and convert bug

* fix: update testdata fixtures

* fix: update testdata fixtures again

* feat: capture stage and if script timed out

* fix: update migration number

* test: add test for script api

* fix: fake db query

* fix: use UTC time

* fix: ensure r.scriptComplete is not nil

* fix: move err check to right after call

* fix: uppercase sql

* fix: use dbtime.Now()

* fix: debug log on r.scriptCompleted being nil

* fix: ensure correct rbac permissions

* chore: remove DisplayName

* fix: get tests passing

* fix: remove space in sql up

* docs: document ExecuteOption

* fix: drop 'RETURNING' from sql

* chore: remove 'display_name' from timing table

* fix: testdata fixture

* fix: put r.scriptCompleted call in goroutine

* fix: track goroutine for test + use separate context for reporting

* fix: appease linter, handle trackCommandGoroutine error

* fix: resolve race condition

* feat: replace timed_out column with status column

* test: update testdata fixture

* fix: apply suggestions from review

* revert: linter changes
2024-09-24 10:51:49 +01:00
86f68b220e feat: add 'display_name' column to 'workspace_agent_scripts' (#14747)
* feat: add 'display_name' column to 'workspace_agent_scripts'

* fix: backfill from workspace_agent_log_sources

* fix: run 'make gen'
2024-09-20 14:26:13 +01:00
5bd19f8ba3 fix: fix flake in TestWorkspaceAgentClientCoordinate_ResumeToken (#14642)
fixes #14365

I bet what's going on is that in `connectToCoordinatorAndFetchResumeToken()` we call `Coordinate()`, send a message on the `Coordinate` client and then close it in rapid succession. We don't wait around for a response from the coordinator, so dRPC is likely aborting the call `Coordinate()` in the backend because the stream is closed before it even gets a chance.

Instead of using the Coordinator to record the peer ID assigned on the API call, we can wrap the resume token provider, since we call that API _and_ wait for a response. This also affords the opportunity to directly assert we get called with the right token.
2024-09-11 16:32:47 +04:00
cf8be4eac5 feat: add resume support to coordinator connections (#14234) 2024-08-20 17:16:49 +10:00
dd243686e4 chore!: remove deprecated agent v1 routes (#13486) 2024-06-11 12:22:59 +10:00
9d00a26a90 fix: add missing route for codersdk.PostLogSource (#13421) 2024-06-03 12:29:50 -05:00
24ba81930b chore: return failed refresh errors on external auth as string (was boolean) (#13402)
* chore: return failed refresh errors on external auth

Failed refreshes should return errors. These errors are captured
as validate errors.
2024-06-03 09:33:49 -05:00
5789ea5397 chore: move stat reporting into workspacestats package (#13386) 2024-05-29 11:49:08 -04:00
1f5788feff chore: remove rbac psuedo resources, add custom verbs (#13276)
Removes our pseudo rbac resources like `WorkspaceApplicationConnect` in favor of additional verbs like `ssh`. This is to make more intuitive permissions for building custom roles.

The source of truth is now `policy.go`
2024-05-15 11:09:42 -05:00
cb6b5e8fbd chore: push rbac actions to policy package (#13274)
Just moved `rbac.Action` -> `policy.Action`. This is for the stacked PR to not have circular dependencies when doing autogen. Without this, the autogen can produce broken golang code, which prevents the autogen from compiling.

So just avoiding circular dependencies. Doing this in it's own PR to reduce LoC diffs in the primary PR, since this has 0 functional changes.
2024-05-15 09:46:35 -05:00
721ab2a1b4 chore: add workspace activity linter (#13273) 2024-05-14 12:31:31 -04:00
845407fe7a chore: cover deadline crossing autostart border on start (#13115)
When starting a workspace, if the deadline crosses an autostart boundary, the deadline is set to autostart + TTL. 
This copies the behavior in `ActivityBumpWorkspace`, but does not require activity.
2024-05-01 10:43:04 -05:00
189b8626d0 chore: deprecate agent report-stats endpoint (#12880)
* chore: deprecate agent report-stats endpoint

Agent API is now used instead.

* Update coderd/workspaceagents.go

Co-authored-by: Spike Curtis <spike@coder.com>

---------

Co-authored-by: Spike Curtis <spike@coder.com>
2024-04-09 09:38:26 -05:00
4d5a7b2d56 chore(codersdk): move all tailscale imports out of codersdk (#12735)
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
2024-03-26 12:44:31 -05:00
f34592f45d fix(coderd): skip logging error for cancelled query in agent report stats (#12730) 2024-03-25 12:20:16 +02:00
0723dd3abf fix: ensure agent token is from latest build in middleware (#12443) 2024-03-14 12:27:32 -04:00
1f5a6d59ba chore: consolidate websocketNetConn implementations (#12065)
Consolidates websocketNetConn from multiple packages in favor of a central one in codersdk
2024-02-09 11:39:08 +04:00
c0e169ebf9 feat: support custom order of agent metadata (#12066) 2024-02-08 17:29:34 +01:00
27f3b7a814 fix: add timeout to listening ports request (#11935)
This can potentially hang for 15m if the agent is unreachable.
2024-01-30 13:53:52 -06:00
0eff646c31 chore: move proto to sdk conversion to agentsdk (#11831)
`agentsdk` depends on `agent/proto` because it needs to get the version to dial.

Therefore, the conversion routines need to live in `agentsdk` so that we can convert to and from the Manifest.

I briefly considered refactoring the agent to only reference `proto.Manifest`, but decided against it because we might have multiple protocol versions in the future, its useful to have a protocol-independent data structure.
2024-01-30 09:04:56 +04:00
1e8a9c09fe chore: remove legacy wsconncache (#11816)
Fixes #8218

Removes `wsconncache` and related "is legacy?" functions and API calls that were used by it.

The only leftover is that Agents still use the legacy IP, so that back level clients or workspace proxies can dial them correctly.

We should eventually remove this: #11819
2024-01-30 07:56:36 +04:00
d66e6e78ee fix: always attempt external auth refresh when fetching (#11762) (#11830)
* fix: always attempt external auth refresh when fetching
* refactor validate to check expiry when considering "valid"
2024-01-29 08:55:15 -06:00
29707099d7 chore: add agentapi tests (#11269) 2024-01-26 07:04:19 +00:00
79568bf628 Revert "fix: always attempt external auth refresh when fetching (#11762)"
This reverts commit 0befc0826a.
2024-01-25 14:22:47 -06:00
0befc0826a fix: always attempt external auth refresh when fetching (#11762)
* fix: always attempt external auth refresh when fetching
* refactor validate to check expiry when considering "valid"
2024-01-25 10:54:56 -06:00
5cbb76b47a fix: stop spamming DERP map updates for equivalent maps (#11792)
Fixes 2 related issues:

1. wsconncache had incorrect logic to test whether to send DERPMap updates, sending if the maps were equivalent, instead of if they were _not equivalent_.
2. configmaps used a bugged check to test equality between DERPMaps, since it contains a map and the map entries are serialized in random order. Instead, we avoid comparing the protobufs and instead depend on the existing function that compares `tailcfg.DERPMap`. This also has the effect of reducing the number of times we convert to and from protobuf.
2024-01-24 16:27:15 +04:00
3e0e7f8739 feat: check agent API version on connection (#11696)
fixes #10531

Adds a check for `version` on connection to the Agent API websocket endpoint.  This is primarily for future-proofing, so that up-level agents get a sensible error if they connect to a back-level Coderd.

It also refactors the location of the `CurrentVersion` variables, to be part of the `proto` packages, since the versions refer to the APIs defined therein.
2024-01-23 14:27:49 +04:00
f01cab9894 feat: use tailnet v2 API for coordination (#11638)
This one is huge, and I'm sorry.

The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.

There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
2024-01-22 11:07:50 +04:00
03ee63931c chore: remove duplicate validate calls on same oauth token (#11598)
* chore: remove duplicate validate calls on same oauth token
2024-01-12 14:27:22 -06:00
4d2fe2685a chore(coderd): extract api version validation to util package (#11407) 2024-01-05 10:22:07 +00:00
df3c310379 feat(cli): add coder open vscode (#11191)
Fixes #7667
2024-01-02 20:46:18 +02:00
c9b7d61769 chore: refactor agent connection updates (#11301)
Refactors the code that handles monitoring an agent websocket with pings and updating the connection times in the DB.

Consolidates v1 and v2 agent APIs under the same code for this.

One substantive change (not _just_ a refactor) is that I've made it so that we actually disconnect if the agent fails to respond to our pings, rather than the old behavior where we would update the database, but not actually tear down the websocket.
2024-01-02 16:04:37 +04:00
fe867d02e0 fix: correct perms for forbidden error in TemplateScheduleStore.Load (#11286)
* chore: TemplateScheduleStore.Load() throwing forbidden error
* fix: workspace agent scope to include template
2023-12-20 11:38:49 -06:00
e46431078c feat: add AgentAPI using DRPC (#10811)
Co-authored-by: Spike Curtis <spike@coder.com>
2023-12-18 22:53:28 +10:00
211e59bf65 feat: add tailnet v2 API support to coordinate endpoint (#11228)
closes #10532

Adds v2 support to the /coordinate endpoint via a query parameter.

v1 already has test cases, and we haven't implemented v2 at the client yet, so the only new test case is an unsupported version.
2023-12-15 14:10:24 +04:00