Commit Graph

234 Commits

Author SHA1 Message Date
04c33968cf refactor: replace golang.org/x/exp/slices with slices (#16772)
The experimental functions in `golang.org/x/exp/slices` are now
available in the standard library since Go 1.21.

Reference: https://go.dev/doc/go1.21#slices

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2025-03-04 00:46:49 +11:00
d50e846747 fix: block vpn tailnet endpoint when --browser-only is set (#16647)
The work on CoderVPN required a new user-scoped `/tailnet` endpoint for
coordinating with multiple workspace agents, and receiving workspace
updates. Much like the `/coordinate` endpoint, this needs to respect the
`CODER_BROWSER_ONLY`/`--browser-only` deployment config value.
2025-02-21 12:21:20 +11:00
31b1ff7d3b feat(agent): add container list handler (#16346)
Fixes https://github.com/coder/coder/issues/16268

- Adds `/api/v2/workspaceagents/:id/containers` coderd endpoint that allows listing containers
visible to the agent. Optional filtering by labels is supported.
- Adds go tools to the `coder-dylib` CI step so we can generate mocks if needed
2025-02-10 11:29:30 +00:00
2c7f8ac65f chore: migrate to coder/websocket 1.8.12 (#15898)
Migrates us to `coder/websocket` v1.8.12 rather than `nhooyr/websocket` on an older version.

Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.
2024-12-19 00:51:30 +04:00
148a5a3593 fix: fix goroutine leak in log streaming over websocket (#15709)
fixes #14881

Our handlers for streaming logs don't read from the websocket. We don't allow the client to send us any data, but the websocket library we use requires reading from the websocket to properly handle pings and closing. Not doing so can [can cause the websocket to hang on write](https://github.com/coder/websocket/issues/405), leaking go routines which were noticed in #14881.

This fixes the issue, and in process refactors our log streaming to a encoder/decoder package which provides generic types for sending JSON over websocket.

I'd also like for us to upgrade to the latest https://github.com/coder/websocket but we should also upgrade our tailscale fork before doing so to avoid including two copies of the websocket library.
2024-12-03 10:12:30 +04:00
b1298a3c1e feat: add WorkspaceUpdates tailnet RPC (#14847)
Closes #14716
Closes #14717

Adds a new user-scoped tailnet API endpoint (`api/v2/tailnet`) with a new RPC stream for receiving updates on workspaces owned by a specific user, as defined in #14716. 

When a stream is started, the `WorkspaceUpdatesProvider` will begin listening on the user-scoped pubsub events implemented in #14964. When a relevant event type is seen (such as a workspace state transition), the provider will query the DB for all the workspaces (and agents) owned by the user. This gets compared against the result of the previous query to produce a set of workspace updates. 

Workspace updates can be requested for any user ID, however only workspaces the authorised user is permitted to `ActionRead` will have their updates streamed.
Opening a tunnel to an agent requires that the user can perform `ActionSSH` against the workspace containing it.
2024-11-01 14:53:53 +11:00
31506e694b chore: send workspace pubsub events by owner id (#14964)
We currently send empty payloads to pubsub channels of the form `workspace:<workspace_id>` to notify listeners of updates to workspaces (such as for refreshing the workspace dashboard).

To support https://github.com/coder/coder/issues/14716, we'll instead send `WorkspaceEvent` payloads to pubsub channels of the form `workspace_owner:<owner_id>`. This enables a listener to receive events for all workspaces owned by a user.
This PR replaces the usage of the old channels without modifying any existing behaviors.

```
type WorkspaceEvent struct {
	Kind        WorkspaceEventKind `json:"kind"`
	WorkspaceID uuid.UUID          `json:"workspace_id" format:"uuid"`
	// AgentID is only set for WorkspaceEventKindAgent* events
	// (excluding AgentTimeout)
	AgentID *uuid.UUID `json:"agent_id,omitempty" format:"uuid"`
}
```

We've defined `WorkspaceEventKind`s based on how the old channel was used, but it's not yet necessary to inspect the types of any of the events, as the existing listeners are designed to fire off any of them.

```
WorkspaceEventKindStateChange     WorkspaceEventKind = "state_change"
WorkspaceEventKindStatsUpdate     WorkspaceEventKind = "stats_update"
WorkspaceEventKindMetadataUpdate  WorkspaceEventKind = "mtd_update"
WorkspaceEventKindAppHealthUpdate WorkspaceEventKind = "app_health"

WorkspaceEventKindAgentLifecycleUpdate  WorkspaceEventKind = "agt_lifecycle_update"
WorkspaceEventKindAgentLogsUpdate       WorkspaceEventKind = "agt_logs_update"
WorkspaceEventKindAgentConnectionUpdate WorkspaceEventKind = "agt_connection_update"
WorkspaceEventKindAgentLogsOverflow     WorkspaceEventKind = "agt_logs_overflow"
WorkspaceEventKindAgentTimeout          WorkspaceEventKind = "agt_timeout"
```
2024-11-01 14:17:05 +11:00
cd890aa3a0 feat: enable key rotation (#15066)
This PR contains the remaining logic necessary to hook up key rotation
to the product.
2024-10-25 17:14:35 +01:00
343f8ec9ab chore: join owner, template, and org in new workspace view (#15116)
Joins in fields like `username`, `avatar_url`, `organization_name`,
`template_name` to `workspaces` via a **view**. 
The view must be maintained moving forward, but this prevents needing to
add RBAC permissions to fetch related workspace fields.
2024-10-22 09:20:54 -05:00
ae522c558d feat: add agent timings (#14713)
* feat: begin impl of agent script timings

* feat: add job_id and display_name to script timings

* fix: increment migration number

* fix: rename migrations from 251 to 254

* test: get tests compiling

* fix: appease the linter

* fix: get tests passing again

* fix: drop column from correct table

* test: add fixture for agent script timings

* fix: typo

* fix: use job id used in provisioner job timings

* fix: increment migration number

* test: behaviour of script runner

* test: rewrite test

* test: does exit 1 script break things?

* test: rewrite test again

* fix: revert change

Not sure how this came to be, I do not recall manually changing
these files.

* fix: let code breathe

* fix: wrap errors

* fix: justify nolint

* fix: swap require.Equal argument order

* fix: add mutex operations

* feat: add 'ran_on_start' and 'blocked_login' fields

* fix: update testdata fixture

* fix: refer to agent_id instead of job_id in timings

* fix: JobID -> AgentID in dbauthz_test

* fix: add 'id' to scripts, make timing refer to script id

* fix: fix broken tests and convert bug

* fix: update testdata fixtures

* fix: update testdata fixtures again

* feat: capture stage and if script timed out

* fix: update migration number

* test: add test for script api

* fix: fake db query

* fix: use UTC time

* fix: ensure r.scriptComplete is not nil

* fix: move err check to right after call

* fix: uppercase sql

* fix: use dbtime.Now()

* fix: debug log on r.scriptCompleted being nil

* fix: ensure correct rbac permissions

* chore: remove DisplayName

* fix: get tests passing

* fix: remove space in sql up

* docs: document ExecuteOption

* fix: drop 'RETURNING' from sql

* chore: remove 'display_name' from timing table

* fix: testdata fixture

* fix: put r.scriptCompleted call in goroutine

* fix: track goroutine for test + use separate context for reporting

* fix: appease linter, handle trackCommandGoroutine error

* fix: resolve race condition

* feat: replace timed_out column with status column

* test: update testdata fixture

* fix: apply suggestions from review

* revert: linter changes
2024-09-24 10:51:49 +01:00
86f68b220e feat: add 'display_name' column to 'workspace_agent_scripts' (#14747)
* feat: add 'display_name' column to 'workspace_agent_scripts'

* fix: backfill from workspace_agent_log_sources

* fix: run 'make gen'
2024-09-20 14:26:13 +01:00
5bd19f8ba3 fix: fix flake in TestWorkspaceAgentClientCoordinate_ResumeToken (#14642)
fixes #14365

I bet what's going on is that in `connectToCoordinatorAndFetchResumeToken()` we call `Coordinate()`, send a message on the `Coordinate` client and then close it in rapid succession. We don't wait around for a response from the coordinator, so dRPC is likely aborting the call `Coordinate()` in the backend because the stream is closed before it even gets a chance.

Instead of using the Coordinator to record the peer ID assigned on the API call, we can wrap the resume token provider, since we call that API _and_ wait for a response. This also affords the opportunity to directly assert we get called with the right token.
2024-09-11 16:32:47 +04:00
cf8be4eac5 feat: add resume support to coordinator connections (#14234) 2024-08-20 17:16:49 +10:00
dd243686e4 chore!: remove deprecated agent v1 routes (#13486) 2024-06-11 12:22:59 +10:00
9d00a26a90 fix: add missing route for codersdk.PostLogSource (#13421) 2024-06-03 12:29:50 -05:00
24ba81930b chore: return failed refresh errors on external auth as string (was boolean) (#13402)
* chore: return failed refresh errors on external auth

Failed refreshes should return errors. These errors are captured
as validate errors.
2024-06-03 09:33:49 -05:00
5789ea5397 chore: move stat reporting into workspacestats package (#13386) 2024-05-29 11:49:08 -04:00
1f5788feff chore: remove rbac psuedo resources, add custom verbs (#13276)
Removes our pseudo rbac resources like `WorkspaceApplicationConnect` in favor of additional verbs like `ssh`. This is to make more intuitive permissions for building custom roles.

The source of truth is now `policy.go`
2024-05-15 11:09:42 -05:00
cb6b5e8fbd chore: push rbac actions to policy package (#13274)
Just moved `rbac.Action` -> `policy.Action`. This is for the stacked PR to not have circular dependencies when doing autogen. Without this, the autogen can produce broken golang code, which prevents the autogen from compiling.

So just avoiding circular dependencies. Doing this in it's own PR to reduce LoC diffs in the primary PR, since this has 0 functional changes.
2024-05-15 09:46:35 -05:00
721ab2a1b4 chore: add workspace activity linter (#13273) 2024-05-14 12:31:31 -04:00
845407fe7a chore: cover deadline crossing autostart border on start (#13115)
When starting a workspace, if the deadline crosses an autostart boundary, the deadline is set to autostart + TTL. 
This copies the behavior in `ActivityBumpWorkspace`, but does not require activity.
2024-05-01 10:43:04 -05:00
189b8626d0 chore: deprecate agent report-stats endpoint (#12880)
* chore: deprecate agent report-stats endpoint

Agent API is now used instead.

* Update coderd/workspaceagents.go

Co-authored-by: Spike Curtis <spike@coder.com>

---------

Co-authored-by: Spike Curtis <spike@coder.com>
2024-04-09 09:38:26 -05:00
4d5a7b2d56 chore(codersdk): move all tailscale imports out of codersdk (#12735)
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
2024-03-26 12:44:31 -05:00
f34592f45d fix(coderd): skip logging error for cancelled query in agent report stats (#12730) 2024-03-25 12:20:16 +02:00
0723dd3abf fix: ensure agent token is from latest build in middleware (#12443) 2024-03-14 12:27:32 -04:00
1f5a6d59ba chore: consolidate websocketNetConn implementations (#12065)
Consolidates websocketNetConn from multiple packages in favor of a central one in codersdk
2024-02-09 11:39:08 +04:00
c0e169ebf9 feat: support custom order of agent metadata (#12066) 2024-02-08 17:29:34 +01:00
27f3b7a814 fix: add timeout to listening ports request (#11935)
This can potentially hang for 15m if the agent is unreachable.
2024-01-30 13:53:52 -06:00
0eff646c31 chore: move proto to sdk conversion to agentsdk (#11831)
`agentsdk` depends on `agent/proto` because it needs to get the version to dial.

Therefore, the conversion routines need to live in `agentsdk` so that we can convert to and from the Manifest.

I briefly considered refactoring the agent to only reference `proto.Manifest`, but decided against it because we might have multiple protocol versions in the future, its useful to have a protocol-independent data structure.
2024-01-30 09:04:56 +04:00
1e8a9c09fe chore: remove legacy wsconncache (#11816)
Fixes #8218

Removes `wsconncache` and related "is legacy?" functions and API calls that were used by it.

The only leftover is that Agents still use the legacy IP, so that back level clients or workspace proxies can dial them correctly.

We should eventually remove this: #11819
2024-01-30 07:56:36 +04:00
d66e6e78ee fix: always attempt external auth refresh when fetching (#11762) (#11830)
* fix: always attempt external auth refresh when fetching
* refactor validate to check expiry when considering "valid"
2024-01-29 08:55:15 -06:00
29707099d7 chore: add agentapi tests (#11269) 2024-01-26 07:04:19 +00:00
79568bf628 Revert "fix: always attempt external auth refresh when fetching (#11762)"
This reverts commit 0befc0826a.
2024-01-25 14:22:47 -06:00
0befc0826a fix: always attempt external auth refresh when fetching (#11762)
* fix: always attempt external auth refresh when fetching
* refactor validate to check expiry when considering "valid"
2024-01-25 10:54:56 -06:00
5cbb76b47a fix: stop spamming DERP map updates for equivalent maps (#11792)
Fixes 2 related issues:

1. wsconncache had incorrect logic to test whether to send DERPMap updates, sending if the maps were equivalent, instead of if they were _not equivalent_.
2. configmaps used a bugged check to test equality between DERPMaps, since it contains a map and the map entries are serialized in random order. Instead, we avoid comparing the protobufs and instead depend on the existing function that compares `tailcfg.DERPMap`. This also has the effect of reducing the number of times we convert to and from protobuf.
2024-01-24 16:27:15 +04:00
3e0e7f8739 feat: check agent API version on connection (#11696)
fixes #10531

Adds a check for `version` on connection to the Agent API websocket endpoint.  This is primarily for future-proofing, so that up-level agents get a sensible error if they connect to a back-level Coderd.

It also refactors the location of the `CurrentVersion` variables, to be part of the `proto` packages, since the versions refer to the APIs defined therein.
2024-01-23 14:27:49 +04:00
f01cab9894 feat: use tailnet v2 API for coordination (#11638)
This one is huge, and I'm sorry.

The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.

There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
2024-01-22 11:07:50 +04:00
03ee63931c chore: remove duplicate validate calls on same oauth token (#11598)
* chore: remove duplicate validate calls on same oauth token
2024-01-12 14:27:22 -06:00
4d2fe2685a chore(coderd): extract api version validation to util package (#11407) 2024-01-05 10:22:07 +00:00
df3c310379 feat(cli): add coder open vscode (#11191)
Fixes #7667
2024-01-02 20:46:18 +02:00
c9b7d61769 chore: refactor agent connection updates (#11301)
Refactors the code that handles monitoring an agent websocket with pings and updating the connection times in the DB.

Consolidates v1 and v2 agent APIs under the same code for this.

One substantive change (not _just_ a refactor) is that I've made it so that we actually disconnect if the agent fails to respond to our pings, rather than the old behavior where we would update the database, but not actually tear down the websocket.
2024-01-02 16:04:37 +04:00
fe867d02e0 fix: correct perms for forbidden error in TemplateScheduleStore.Load (#11286)
* chore: TemplateScheduleStore.Load() throwing forbidden error
* fix: workspace agent scope to include template
2023-12-20 11:38:49 -06:00
e46431078c feat: add AgentAPI using DRPC (#10811)
Co-authored-by: Spike Curtis <spike@coder.com>
2023-12-18 22:53:28 +10:00
211e59bf65 feat: add tailnet v2 API support to coordinate endpoint (#11228)
closes #10532

Adds v2 support to the /coordinate endpoint via a query parameter.

v1 already has test cases, and we haven't implemented v2 at the client yet, so the only new test case is an unsupported version.
2023-12-15 14:10:24 +04:00
ad3fed72bc chore: rename Coordinator to CoordinatorV1 (#11222)
Renames the tailnet.Coordinator to represent both v1 and v2 APIs, so that we can use this interface for the main atomic pointer.

Part of #10532
2023-12-15 11:38:12 +04:00
b7bdb17460 feat: add metrics to workspace agent scripts (#11132)
* push startup script metrics to agent
2023-12-13 11:45:43 -06:00
dba0dfa859 chore: correct 500 -> 404 on workspace agent mw (#11129)
* chore: correct 500 -> 404
2023-12-12 15:14:32 -06:00
baf3bf6b9c feat: add workspace_id, owner_name to agent manifest (#10199)
Co-authored-by: Kyle Carberry <kyle@carberry.com>
Co-authored-by: Atif Ali <atif@coder.com>
2023-12-04 00:41:54 +03:00
6ecba0fda7 fix(coderd): prevent logging error for query cancellation in watchWorkspaceAgentMetadata (#10843) 2023-11-22 15:32:31 +00:00
51b58cfc98 fix: only update last_used_at when connection count > 0 (#10808) 2023-11-21 18:10:41 -06:00