The experimental functions in `golang.org/x/exp/slices` have been
available in the standard library since Go 1.21.
Reference: https://go.dev/doc/go1.21#slices
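For illustration, a minimal sketch of what the migration looks like in calling code (the example itself is not taken from the diff); only the import path changes, since the standard library `slices` package carries the same functions:
```
package main

import (
	"fmt"
	"slices" // standard library as of Go 1.21; previously "golang.org/x/exp/slices"
)

func main() {
	names := []string{"charlie", "alpha", "bravo"}
	slices.Sort(names)
	fmt.Println(slices.Contains(names, "alpha")) // true
}
```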
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
The work on CoderVPN required a new user-scoped `/tailnet` endpoint for
coordinating with multiple workspace agents and receiving workspace
updates. Much like the `/coordinate` endpoint, this needs to respect the
`CODER_BROWSER_ONLY`/`--browser-only` deployment config value.
Fixes https://github.com/coder/coder/issues/16268
- Adds an `/api/v2/workspaceagents/:id/containers` coderd endpoint that allows listing containers visible to the agent. Optional filtering by labels is supported (see the sketch after this list).
- Adds go tools to the `coder-dylib` CI step so we can generate mocks if needed
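A minimal sketch of calling the new endpoint with plain `net/http`; the `label` query parameter syntax and the session-token header here are illustrative assumptions, not confirmed API details:
```
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	base := "https://coder.example.com"                // placeholder deployment URL
	agentID := "7f0c1b2e-0000-0000-0000-000000000000"  // placeholder agent ID

	q := url.Values{}
	q.Set("label", "com.example.role=db") // hypothetical label filter

	req, err := http.NewRequest(http.MethodGet,
		base+"/api/v2/workspaceagents/"+agentID+"/containers?"+q.Encode(), nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Coder-Session-Token", "<token>") // assumed session-token auth

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```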
Migrates us from an older version of `nhooyr/websocket` to `coder/websocket` v1.8.12.
Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.
Fixes #14881
Our handlers for streaming logs don't read from the websocket. We don't allow the client to send us any data, but the websocket library we use requires reading from the websocket to properly handle pings and closing. Not doing so [can cause the websocket to hang on write](https://github.com/coder/websocket/issues/405), leaking goroutines, which were noticed in #14881.
This fixes the issue and, in the process, refactors our log streaming into an encoder/decoder package that provides generic types for sending JSON over a websocket.
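For context, the usual pattern for a write-only handler looks roughly like this (a sketch using `coder/websocket`'s `CloseRead`, which reads and discards incoming frames so pings and close frames still get handled; the handler and payload names are illustrative, not the actual refactor):
```
package main

import (
	"net/http"
	"time"

	"github.com/coder/websocket"
	"github.com/coder/websocket/wsjson"
)

func streamLogs(w http.ResponseWriter, r *http.Request) {
	conn, err := websocket.Accept(w, r, nil)
	if err != nil {
		return
	}
	defer conn.Close(websocket.StatusNormalClosure, "")

	// We never expect data from the client, but we must keep reading so the
	// library can process pings and close frames. CloseRead does exactly that
	// and returns a context that is canceled when the connection closes.
	ctx := conn.CloseRead(r.Context())

	for {
		select {
		case <-ctx.Done():
			return
		case <-time.After(time.Second):
			// Illustrative payload; the real handler streams log entries.
			if err := wsjson.Write(ctx, conn, map[string]string{"log": "hello"}); err != nil {
				return
			}
		}
	}
}

func main() {
	http.HandleFunc("/logs", streamLogs)
	_ = http.ListenAndServe("127.0.0.1:8080", nil)
}
```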
I'd also like for us to upgrade to the latest https://github.com/coder/websocket but we should also upgrade our tailscale fork before doing so to avoid including two copies of the websocket library.
Closes #14716
Closes #14717
Adds a new user-scoped tailnet API endpoint (`api/v2/tailnet`) with a new RPC stream for receiving updates on workspaces owned by a specific user, as defined in #14716.
When a stream is started, the `WorkspaceUpdatesProvider` will begin listening on the user-scoped pubsub events implemented in #14964. When a relevant event type is seen (such as a workspace state transition), the provider will query the DB for all the workspaces (and agents) owned by the user. This gets compared against the result of the previous query to produce a set of workspace updates.
Workspace updates can be requested for any user ID; however, only workspaces the authorized user is permitted to `ActionRead` will have their updates streamed.
Opening a tunnel to an agent requires that the user can perform `ActionSSH` against the workspace containing it.
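A rough sketch of the snapshot-diffing idea described above (the types and names are invented for illustration; the real provider works on DB rows and produces protobuf update messages):
```
package main

import "fmt"

// workspaceSnapshot is a hypothetical, simplified view of one DB query result:
// workspace ID -> workspace name.
type workspaceSnapshot map[string]string

// diff compares the previous and current snapshots and reports which
// workspaces appeared and which disappeared, mirroring how a set of
// workspace updates can be produced from two consecutive queries.
func diff(prev, curr workspaceSnapshot) (added, deleted []string) {
	for id, name := range curr {
		if _, ok := prev[id]; !ok {
			added = append(added, name)
		}
	}
	for id, name := range prev {
		if _, ok := curr[id]; !ok {
			deleted = append(deleted, name)
		}
	}
	return added, deleted
}

func main() {
	prev := workspaceSnapshot{"w1": "dev", "w2": "staging"}
	curr := workspaceSnapshot{"w2": "staging", "w3": "experiments"}
	added, deleted := diff(prev, curr)
	fmt.Println("added:", added, "deleted:", deleted) // added: [experiments] deleted: [dev]
}
```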
We currently send empty payloads to pubsub channels of the form `workspace:<workspace_id>` to notify listeners of updates to workspaces (such as for refreshing the workspace dashboard).
To support https://github.com/coder/coder/issues/14716, we'll instead send `WorkspaceEvent` payloads to pubsub channels of the form `workspace_owner:<owner_id>`. This enables a listener to receive events for all workspaces owned by a user.
This PR replaces the usage of the old channels without modifying any existing behaviors.
```
type WorkspaceEvent struct {
	Kind        WorkspaceEventKind `json:"kind"`
	WorkspaceID uuid.UUID          `json:"workspace_id" format:"uuid"`
	// AgentID is only set for WorkspaceEventKindAgent* events
	// (excluding AgentTimeout)
	AgentID *uuid.UUID `json:"agent_id,omitempty" format:"uuid"`
}
```
We've defined `WorkspaceEventKind`s based on how the old channel was used, but it's not yet necessary to inspect the kind of any event, as the existing listeners are designed to fire on any of them.
```
const (
	WorkspaceEventKindStateChange           WorkspaceEventKind = "state_change"
	WorkspaceEventKindStatsUpdate           WorkspaceEventKind = "stats_update"
	WorkspaceEventKindMetadataUpdate        WorkspaceEventKind = "mtd_update"
	WorkspaceEventKindAppHealthUpdate       WorkspaceEventKind = "app_health"
	WorkspaceEventKindAgentLifecycleUpdate  WorkspaceEventKind = "agt_lifecycle_update"
	WorkspaceEventKindAgentLogsUpdate       WorkspaceEventKind = "agt_logs_update"
	WorkspaceEventKindAgentConnectionUpdate WorkspaceEventKind = "agt_connection_update"
	WorkspaceEventKindAgentLogsOverflow     WorkspaceEventKind = "agt_logs_overflow"
	WorkspaceEventKindAgentTimeout          WorkspaceEventKind = "agt_timeout"
)
```
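To make the shape of the new channel concrete, here is a minimal sketch of publishing one of these events; the `Pubsub` interface below is a stand-in defined inline, not the actual `coderd/database/pubsub` API:
```
package main

import (
	"encoding/json"
	"fmt"

	"github.com/google/uuid"
)

type WorkspaceEventKind string

type WorkspaceEvent struct {
	Kind        WorkspaceEventKind `json:"kind"`
	WorkspaceID uuid.UUID          `json:"workspace_id" format:"uuid"`
	AgentID     *uuid.UUID         `json:"agent_id,omitempty" format:"uuid"`
}

// Pubsub is a stand-in for the real pubsub interface.
type Pubsub interface {
	Publish(channel string, message []byte) error
}

// publishStateChange sends a state_change event on the owner-scoped channel,
// so a single listener receives events for all workspaces owned by ownerID.
func publishStateChange(ps Pubsub, ownerID, workspaceID uuid.UUID) error {
	payload, err := json.Marshal(WorkspaceEvent{
		Kind:        WorkspaceEventKind("state_change"),
		WorkspaceID: workspaceID,
	})
	if err != nil {
		return err
	}
	return ps.Publish(fmt.Sprintf("workspace_owner:%s", ownerID), payload)
}

// printPubsub is a trivial in-memory Pubsub for demonstration.
type printPubsub struct{}

func (printPubsub) Publish(channel string, message []byte) error {
	fmt.Println(channel, string(message))
	return nil
}

func main() {
	_ = publishStateChange(printPubsub{}, uuid.New(), uuid.New())
}
```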
Joins in fields like `username`, `avatar_url`, `organization_name`,
`template_name` to `workspaces` via a **view**.
The view must be maintained moving forward, but this prevents needing to
add RBAC permissions to fetch related workspace fields.
* feat: begin impl of agent script timings
* feat: add job_id and display_name to script timings
* fix: increment migration number
* fix: rename migrations from 251 to 254
* test: get tests compiling
* fix: appease the linter
* fix: get tests passing again
* fix: drop column from correct table
* test: add fixture for agent script timings
* fix: typo
* fix: use job id used in provisioner job timings
* fix: increment migration number
* test: behaviour of script runner
* test: rewrite test
* test: does exit 1 script break things?
* test: rewrite test again
* fix: revert change
Not sure how this came to be; I do not recall manually changing
these files.
* fix: let code breathe
* fix: wrap errors
* fix: justify nolint
* fix: swap require.Equal argument order
* fix: add mutex operations
* feat: add 'ran_on_start' and 'blocked_login' fields
* fix: update testdata fixture
* fix: refer to agent_id instead of job_id in timings
* fix: JobID -> AgentID in dbauthz_test
* fix: add 'id' to scripts, make timing refer to script id
* fix: fix broken tests and conversion bug
* fix: update testdata fixtures
* fix: update testdata fixtures again
* feat: capture stage and if script timed out
* fix: update migration number
* test: add test for script api
* fix: fake db query
* fix: use UTC time
* fix: ensure r.scriptComplete is not nil
* fix: move err check to right after call
* fix: uppercase sql
* fix: use dbtime.Now()
* fix: debug log on r.scriptCompleted being nil
* fix: ensure correct rbac permissions
* chore: remove DisplayName
* fix: get tests passing
* fix: remove space in sql up
* docs: document ExecuteOption
* fix: drop 'RETURNING' from sql
* chore: remove 'display_name' from timing table
* fix: testdata fixture
* fix: put r.scriptCompleted call in goroutine
* fix: track goroutine for test + use separate context for reporting
* fix: appease linter, handle trackCommandGoroutine error
* fix: resolve race condition
* feat: replace timed_out column with status column
* test: update testdata fixture
* fix: apply suggestions from review
* revert: linter changes
Fixes #14365
I bet what's going on is that in `connectToCoordinatorAndFetchResumeToken()` we call `Coordinate()`, send a message on the `Coordinate` client, and then close it in rapid succession. We don't wait around for a response from the coordinator, so dRPC is likely aborting the `Coordinate()` call in the backend because the stream is closed before it even gets a chance.
Instead of using the Coordinator to record the peer ID assigned on the API call, we can wrap the resume token provider, since we call that API _and_ wait for a response. This also affords the opportunity to directly assert we get called with the right token.
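The wrapping trick is essentially a decorator; a hypothetical sketch follows (the interface, method signature, and field names are made up for illustration and are not the real tailnet types):
```
package main

import (
	"context"
	"fmt"
	"sync"

	"github.com/google/uuid"
)

// resumeTokenProvider is a stand-in for the real provider interface.
type resumeTokenProvider interface {
	GenerateResumeToken(ctx context.Context, peerID uuid.UUID) (string, error)
}

// recordingTokenProvider wraps another provider and records the peer ID it
// was called with, so a test can assert the right ID was used. Unlike the
// coordinator path, this API call waits for a response before returning.
type recordingTokenProvider struct {
	inner resumeTokenProvider

	mu         sync.Mutex
	lastPeerID uuid.UUID
}

func (r *recordingTokenProvider) GenerateResumeToken(ctx context.Context, peerID uuid.UUID) (string, error) {
	r.mu.Lock()
	r.lastPeerID = peerID
	r.mu.Unlock()
	return r.inner.GenerateResumeToken(ctx, peerID)
}

// staticProvider is a trivial inner provider for demonstration.
type staticProvider struct{}

func (staticProvider) GenerateResumeToken(_ context.Context, _ uuid.UUID) (string, error) {
	return "token", nil
}

func main() {
	rec := &recordingTokenProvider{inner: staticProvider{}}
	tok, _ := rec.GenerateResumeToken(context.Background(), uuid.New())
	fmt.Println(tok, rec.lastPeerID)
}
```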
Removes our pseudo-RBAC resources like `WorkspaceApplicationConnect` in favor of additional verbs like `ssh`. This makes permissions more intuitive for building custom roles.
The source of truth is now `policy.go`
Just moved `rbac.Action` -> `policy.Action`. This is so the stacked PR doesn't have circular dependencies when doing autogen. Without this, the autogen can produce broken Go code, which then fails to compile.
So this is just avoiding circular dependencies. Doing it in its own PR to reduce the LoC diff in the primary PR, since this has 0 functional changes.
When starting a workspace, if the deadline crosses an autostart boundary, the deadline is set to autostart + TTL.
This copies the behavior in `ActivityBumpWorkspace`, but does not require activity.
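A worked sketch of the deadline rule, using invented helper and variable names (the real logic lives alongside `ActivityBumpWorkspace`):
```
package main

import (
	"fmt"
	"time"
)

// computeDeadline returns the build deadline. If the naive deadline
// (start + ttl) would cross the next autostart time, the deadline is pushed
// to nextAutostart + ttl instead, matching the activity-bump behavior.
func computeDeadline(start, nextAutostart time.Time, ttl time.Duration) time.Time {
	deadline := start.Add(ttl)
	if !nextAutostart.IsZero() && nextAutostart.After(start) && deadline.After(nextAutostart) {
		return nextAutostart.Add(ttl)
	}
	return deadline
}

func main() {
	start := time.Date(2024, 1, 1, 22, 0, 0, 0, time.UTC)    // workspace started at 22:00
	autostart := time.Date(2024, 1, 2, 9, 0, 0, 0, time.UTC) // next autostart at 09:00
	ttl := 12 * time.Hour

	// 22:00 + 12h = 10:00 the next day, which crosses the 09:00 autostart,
	// so the deadline becomes 09:00 + 12h = 21:00.
	fmt.Println(computeDeadline(start, autostart, ttl))
}
```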
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
`agentsdk` depends on `agent/proto` because it needs to get the version to dial.
Therefore, the conversion routines need to live in `agentsdk` so that we can convert to and from the Manifest.
I briefly considered refactoring the agent to only reference `proto.Manifest`, but decided against it because we might have multiple protocol versions in the future, and it's useful to have a protocol-independent data structure.
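The shape of those conversion routines, as a hedged sketch with made-up stand-in types and fields (the real `Manifest` and `proto.Manifest` carry many more):
```
package main

import "fmt"

// Manifest is a stand-in for the protocol-independent agentsdk manifest.
type Manifest struct {
	AgentID   string
	OwnerName string
}

// protoManifest is a stand-in for the generated agent/proto message.
type protoManifest struct {
	AgentId   string
	OwnerName string
}

// ProtoFromManifest converts the SDK type to the wire type.
func ProtoFromManifest(m Manifest) protoManifest {
	return protoManifest{AgentId: m.AgentID, OwnerName: m.OwnerName}
}

// ManifestFromProto converts the wire type back to the SDK type.
func ManifestFromProto(p protoManifest) Manifest {
	return Manifest{AgentID: p.AgentId, OwnerName: p.OwnerName}
}

func main() {
	m := Manifest{AgentID: "a-1", OwnerName: "alice"}
	fmt.Println(ManifestFromProto(ProtoFromManifest(m)))
}
```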
Fixes #8218
Removes `wsconncache` and related "is legacy?" functions and API calls that were used by it.
The only leftover is that Agents still use the legacy IP, so that back-level clients or workspace proxies can dial them correctly.
We should eventually remove this: #11819
Fixes 2 related issues:
1. wsconncache had incorrect logic to test whether to send DERPMap updates, sending if the maps were equivalent, instead of if they were _not equivalent_.
2. configmaps used a bugged check to test equality between DERPMaps, since a DERPMap contains a map and the map entries are serialized in random order. We now avoid comparing the protobufs and depend on the existing function that compares `tailcfg.DERPMap`. This also has the effect of reducing the number of times we convert to and from protobuf.
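The comparison fix in miniature (a simplified sketch; the real code compares `tailcfg.DERPMap` values rather than this stand-in type):
```
package main

import (
	"fmt"
	"reflect"
)

// derpMap is a simplified stand-in for tailcfg.DERPMap; the important part is
// that it contains a map, so any equality check based on a serialized form
// that does not guarantee map ordering (such as protobuf) is unreliable.
type derpMap struct {
	Regions map[int]string
}

// derpMapsEqual compares the structured values, which handles map entries
// correctly regardless of iteration or serialization order.
func derpMapsEqual(a, b derpMap) bool {
	return reflect.DeepEqual(a, b)
}

func main() {
	a := derpMap{Regions: map[int]string{1: "nyc", 2: "fra", 3: "syd"}}
	b := derpMap{Regions: map[int]string{3: "syd", 2: "fra", 1: "nyc"}}
	fmt.Println(derpMapsEqual(a, b)) // true: same entries, insertion order is irrelevant
}
```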
Fixes #10531
Adds a check for `version` on connection to the Agent API websocket endpoint. This is primarily for future-proofing, so that up-level agents get a sensible error if they connect to a back-level Coderd.
It also refactors the location of the `CurrentVersion` variables, to be part of the `proto` packages, since the versions refer to the APIs defined therein.
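A hedged sketch of what a version gate on such an endpoint can look like; the `version` query parameter comes from the description above, but the parsing, the compatibility rule (same major, minor no newer than ours), and the route path are illustrative assumptions:
```
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

const (
	currentMajor = 2 // placeholder current API version
	currentMinor = 0
)

// validateVersion parses "major.minor" and rejects clients that are newer
// than this server, so an up-level agent gets a clear error instead of
// failing in a confusing way later.
func validateVersion(raw string) error {
	parts := strings.SplitN(raw, ".", 2)
	if len(parts) != 2 {
		return fmt.Errorf("malformed version %q", raw)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return fmt.Errorf("malformed major version %q", raw)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return fmt.Errorf("malformed minor version %q", raw)
	}
	if major != currentMajor || minor > currentMinor {
		return fmt.Errorf("unsupported version %q, server supports %d.%d", raw, currentMajor, currentMinor)
	}
	return nil
}

func handler(w http.ResponseWriter, r *http.Request) {
	if err := validateVersion(r.URL.Query().Get("version")); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// ... upgrade to a websocket and serve the Agent API ...
}

func main() {
	http.HandleFunc("/api/v2/workspaceagents/me/rpc", handler) // path is illustrative
	_ = http.ListenAndServe("127.0.0.1:8080", nil)
}
```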
This one is huge, and I'm sorry.
The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.
There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
Refactors the code that handles monitoring an agent websocket with pings and updating the connection times in the DB.
Consolidates v1 and v2 agent APIs under the same code for this.
One substantive change (not _just_ a refactor) is that I've made it so that we actually disconnect if the agent fails to respond to our pings, rather than the old behavior of updating the database but not actually tearing down the websocket.
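A sketch of the disconnect-on-failed-ping behavior using `coder/websocket`'s `Ping` (simplified and with invented intervals; the real code also updates connection times in the DB):
```
package main

import (
	"context"
	"net/http"
	"time"

	"github.com/coder/websocket"
)

// monitorPings pings the peer on an interval and closes the connection the
// first time a ping is not answered in time, rather than only recording the
// failure and leaving the websocket open.
func monitorPings(ctx context.Context, conn *websocket.Conn) {
	ticker := time.NewTicker(15 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			pingCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
			err := conn.Ping(pingCtx)
			cancel()
			if err != nil {
				// Here the real code would also mark the agent disconnected
				// in the database before tearing down the websocket.
				_ = conn.Close(websocket.StatusGoingAway, "ping timeout")
				return
			}
			// On success, the real code updates last-connected-at in the DB.
		}
	}
}

func handler(w http.ResponseWriter, r *http.Request) {
	conn, err := websocket.Accept(w, r, nil)
	if err != nil {
		return
	}
	go monitorPings(r.Context(), conn)
	// ... serve the agent API over this connection ...
}

func main() {
	http.HandleFunc("/agent", handler)
	_ = http.ListenAndServe("127.0.0.1:8080", nil)
}
```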
Closes #10532
Adds v2 support to the /coordinate endpoint via a query parameter.
v1 already has test cases, and we haven't implemented v2 at the client yet, so the only new test case is an unsupported version.