Commit Graph

134 Commits

Author SHA1 Message Date
40802958e9 fix: use explicit api versions for agent and tailnet (#15508)
Bumps the Tailnet and Agent API version 2.3, and creates some extra controls and machinery around these versions.

What happened is that we accidentally shipped two new API features without bumping the version.  `ScriptCompleted` on the Agent API in Coder v2.16 and `RefreshResumeToken` on the Tailnet API in Coder v2.15.

Since we can't easily retroactively bump the versions, we'll roll these changes into API version 2.3 along with the new WorkspaceUpdates RPC, which hasn't been released yet.  That means there is some ambiguity in Coder v2.15-v2.17 about exactly what methods are supported on the Tailnet and Agent APIs.  This isn't great, but hasn't caused us major issues because 

1. RefreshResumeToken is considered optional, and clients just log and move on if the RPC isn't supported. 
2. Agents basically never get started talking to a Coderd that is older than they are, since the agent binary is normally downloaded from Coderd at workspace start.

Still it's good to get things squared away in terms of versions for SDK users and possible edge cases around client and server versions.

To mitigate against this thing happening again, this PR also:

1. adds a CODEOWNERS for the API proto packages, so I'll review changes
2. defines interface types for different API versions, and has the agent explicitly use a specific version.  That way, if you add a new method, and try to use it in the agent without thinking explicitly about versions, it won't compile.

With the protocol controllers stuff, we've sort of already abstracted the Tailnet API such that the interface type strategy won't work, but I'll work on getting the Controller to be version aware, such that it can check the API version it's getting against the controllers it has -- in a later PR.
2024-11-15 11:16:28 +04:00
b1298a3c1e feat: add WorkspaceUpdates tailnet RPC (#14847)
Closes #14716
Closes #14717

Adds a new user-scoped tailnet API endpoint (`api/v2/tailnet`) with a new RPC stream for receiving updates on workspaces owned by a specific user, as defined in #14716. 

When a stream is started, the `WorkspaceUpdatesProvider` will begin listening on the user-scoped pubsub events implemented in #14964. When a relevant event type is seen (such as a workspace state transition), the provider will query the DB for all the workspaces (and agents) owned by the user. This gets compared against the result of the previous query to produce a set of workspace updates. 

Workspace updates can be requested for any user ID, however only workspaces the authorised user is permitted to `ActionRead` will have their updates streamed.
Opening a tunnel to an agent requires that the user can perform `ActionSSH` against the workspace containing it.
2024-11-01 14:53:53 +11:00
cd890aa3a0 feat: enable key rotation (#15066)
This PR contains the remaining logic necessary to hook up key rotation
to the product.
2024-10-25 17:14:35 +01:00
343f8ec9ab chore: join owner, template, and org in new workspace view (#15116)
Joins in fields like `username`, `avatar_url`, `organization_name`,
`template_name` to `workspaces` via a **view**. 
The view must be maintained moving forward, but this prevents needing to
add RBAC permissions to fetch related workspace fields.
2024-10-22 09:20:54 -05:00
5bd19f8ba3 fix: fix flake in TestWorkspaceAgentClientCoordinate_ResumeToken (#14642)
fixes #14365

I bet what's going on is that in `connectToCoordinatorAndFetchResumeToken()` we call `Coordinate()`, send a message on the `Coordinate` client and then close it in rapid succession. We don't wait around for a response from the coordinator, so dRPC is likely aborting the call `Coordinate()` in the backend because the stream is closed before it even gets a chance.

Instead of using the Coordinator to record the peer ID assigned on the API call, we can wrap the resume token provider, since we call that API _and_ wait for a response. This also affords the opportunity to directly assert we get called with the right token.
2024-09-11 16:32:47 +04:00
e8c59a1d9d chore: avoid flake in resume token test (#14378) 2024-08-22 13:27:43 +10:00
cf8be4eac5 feat: add resume support to coordinator connections (#14234) 2024-08-20 17:16:49 +10:00
bf4b7abf14 chore(coderd): allow creating workspaces without specifying an organization (#14048) 2024-07-30 10:44:02 -06:00
ba7d1835e5 fix: fix flake in TestWorkspaceAgent_Metadata_CatchMemoryLeak (#13553)
Fixes flake seen here: https://github.com/coder/coder/actions/runs/9461246505/job/26061605278

#13486 subtly changes the test so that `post` uses the new v2 Agent API, and when canceling context, there is a race condition where the yamux session underpinning the API can get torn down before the RPC processes the canceled context, yielding a different error response than the test was previously expecting.

I've refactored the test to just stop posting when the test finishes, rather than depend on a context cancel to end the posting goroutine.
2024-06-12 18:33:22 +04:00
dd243686e4 chore!: remove deprecated agent v1 routes (#13486) 2024-06-11 12:22:59 +10:00
9d00a26a90 fix: add missing route for codersdk.PostLogSource (#13421) 2024-06-03 12:29:50 -05:00
5789ea5397 chore: move stat reporting into workspacestats package (#13386) 2024-05-29 11:49:08 -04:00
4d5a7b2d56 chore(codersdk): move all tailscale imports out of codersdk (#12735)
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
2024-03-26 12:44:31 -05:00
0723dd3abf fix: ensure agent token is from latest build in middleware (#12443) 2024-03-14 12:27:32 -04:00
74b749b890 chore(coderd): add test to assert agent token invalid when workspace deleted (#12290) 2024-02-26 13:27:00 +00:00
c0e169ebf9 feat: support custom order of agent metadata (#12066) 2024-02-08 17:29:34 +01:00
1cf4b62867 feat: change agent to use v2 API for reporting stats (#12024)
Modifies the agent to use the v2 API to report its statistics, using the `statsReporter` subcomponent.
2024-02-07 15:26:41 +04:00
1aa117b9ec chore: rename client Listen to ConnectRPC (#11916)
ConnectRPC seems more appropriate for this function
2024-02-01 14:44:11 +04:00
0fc177203e feat: use agent v2 API to update app health (#11889)
Use the Agent v2 API to update App Health
2024-01-30 11:35:12 +04:00
2599850e54 feat: use agent v2 API to post startup (#11877)
Uses the v2 Agent API to post startup information.
2024-01-30 11:23:28 +04:00
da8bb1c198 feat: use agent v2 API to fetch manifest (#11832)
Agent uses the v2 API to obtain the manifest, instead of the HTTP API.
2024-01-30 10:11:28 +04:00
d66e6e78ee fix: always attempt external auth refresh when fetching (#11762) (#11830)
* fix: always attempt external auth refresh when fetching
* refactor validate to check expiry when considering "valid"
2024-01-29 08:55:15 -06:00
79568bf628 Revert "fix: always attempt external auth refresh when fetching (#11762)"
This reverts commit 0befc0826a.
2024-01-25 14:22:47 -06:00
0befc0826a fix: always attempt external auth refresh when fetching (#11762)
* fix: always attempt external auth refresh when fetching
* refactor validate to check expiry when considering "valid"
2024-01-25 10:54:56 -06:00
f01cab9894 feat: use tailnet v2 API for coordination (#11638)
This one is huge, and I'm sorry.

The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.

There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
2024-01-22 11:07:50 +04:00
80eac73ed1 chore: remove useLocalStorage hook (#11712) 2024-01-19 16:04:19 -07:00
5eb3e1cdaa feat: expose owner_name in coder_workspace resource (#11639) 2024-01-17 13:20:45 +01:00
d583acad00 fix(coderd): workspaceapps: update last_used_at when workspace app reports stats (#11603)
- Adds a new query BatchUpdateLastUsedAt
- Adds calls to BatchUpdateLastUsedAt in app stats handler upon flush
- Passes a stats flush channel to apptest setup scaffolding and updates unit tests to assert modifications to LastUsedAt.
2024-01-16 14:06:39 +00:00
03ee63931c chore: remove duplicate validate calls on same oauth token (#11598)
* chore: remove duplicate validate calls on same oauth token
2024-01-12 14:27:22 -06:00
211e59bf65 feat: add tailnet v2 API support to coordinate endpoint (#11228)
closes #10532

Adds v2 support to the /coordinate endpoint via a query parameter.

v1 already has test cases, and we haven't implemented v2 at the client yet, so the only new test case is an unsupported version.
2023-12-15 14:10:24 +04:00
43ba3146a9 feat: add test case for BlockDirect + listening ports (#11152)
Adds a test case for #10391 with single tailnet out of experimental
2023-12-13 12:28:09 +04:00
228cbec99b fix: stop updating agent stats from deleted workspaces (#11026)
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
2023-12-07 13:55:29 -05:00
baf3bf6b9c feat: add workspace_id, owner_name to agent manifest (#10199)
Co-authored-by: Kyle Carberry <kyle@carberry.com>
Co-authored-by: Atif Ali <atif@coder.com>
2023-12-04 00:41:54 +03:00
967db2801b chore: refactor ResolveAutostart tests to use dbfake (#10603) 2023-11-30 19:33:04 -06:00
2dc565d5de chore: remove New----Builder from dbfake function names (#10882)
Drop "New" and "Builder" from the function names, in favor of the top-level resource created.  This shortens tests and gives a nice syntax.  Since everything is a builder, the prefix and suffix don't add much value and just make things harder to read.

I've also chosen to leave `Do()` as the function to insert into the database.  Even though it's a builder pattern, I fear `.Build()` might be confusing with Workspace Builds.  One other idea is `Insert()` but if we later add dbfake functions that update, this might be inconsistent.
2023-11-29 11:06:04 +04:00
f441ad66e1 fix(codersdk): keep workspace agent connection open after dial context (#10863) 2023-11-27 14:29:57 +02:00
4548ad7cef chore: remove dbfake.Workspace (#10880)
Remove dbfake.Workspace and use builder instead.
2023-11-27 14:39:16 +04:00
78283a7fb9 chore: remove dbfake.WorkspaceWithAgent (#10879)
Replace dbfake.WorkspaceWithAgent() with the builder pattern and remove this function.
2023-11-27 14:30:15 +04:00
6ecba0fda7 fix(coderd): prevent logging error for query cancellation in watchWorkspaceAgentMetadata (#10843) 2023-11-22 15:32:31 +00:00
a9c0c01629 chore: fix flake in listening ports test (#10833) 2023-11-22 09:30:51 +00:00
b25e5dc90b chore: remove dbfake.WorkspaceBuild in favor of builder pattern (#10814)
I'd like to convert dbfake into a builder pattern to prevent a proliferation of XXXWithYYY methods.  This is one step of the way by removing the Non-builder function.
2023-11-22 13:04:58 +04:00
51b58cfc98 fix: only update last_used_at when connection count > 0 (#10808) 2023-11-21 18:10:41 -06:00
198b56c137 fix(coderd): fix memory leak in watchWorkspaceAgentMetadata (#10685)
Fixes #10550
2023-11-16 17:03:53 +02:00
839a16e299 feat: add dbfake for workspace builds and resources (#10426)
* feat: add dbfakedata for workspace builds and resources

This creates `coderdtest.NewWithDatabase` and adds a series of
helper functions to `dbfake` that insert structured fake data
for resources into the database.

It allows us to remove provisionerd from a significant amount of
tests which should speed them up and reduce flakes.

* Rename dbfakedata to dbfake

* Migrate workspaceagents_test.go to use the new dbfake

* Migrate agent_test.go to use the new fakes

* Fix comments
2023-11-02 17:15:07 +00:00
a7c671ca07 feat: add workspace agent APIVersion (#10419)
Fixes #10339
2023-10-31 10:08:43 +04:00
4857d4bd55 feat(codersdk/agentsdk): use new agent metadata batch endpoint (#10224)
Part of #9782
2023-10-13 17:32:28 +03:00
8a47262faf fix: ignore logged errors in TestWorkspaceAgent/Timeout
fixes #10167

Annoyingly, there isn't a good way to stop the publish from being sent on shutdown, and subscribing to them in the test is too fragile because empty messages are sent in a bunch of places, so we can't reliably tell it's regarding timeouts.
2023-10-10 15:45:47 +04:00
f001a57614 fix: only allow promoting successful template versions (#9998) 2023-10-05 10:49:25 -06:00
2c2e98cc39 fix(coderd): fetch workspace agent scripts and log sources using system auth ctx (#10043)
* add failing unit test
* fetch log sources and agent scripts using system auth ctx
2023-10-04 15:50:51 +01:00
c194119689 chore: rename AwaitTemplateVersionJobCompleted and AwaitWorkspaceBuildJobCompleted (#10003) 2023-10-03 11:02:56 -06:00