Commit Graph

2070 Commits

Author SHA1 Message Date
d5a98cc6d7 fix: avoid race in TestPGPubsub_Metrics by using Eventually (#11973)
Annoyingly, prometheus Registry collects metrics async, which is causing our test to be racy.  They also don't export enough from the Metric interface for us to replicate a synchronous collect, so we have to use Eventually to test.
2024-02-01 12:10:19 +04:00
5a359d50dd feat: add metrics to PGPubsub (#11971)
Adds prometheus metrics to PGPubsub for monitoring its health and performance in production.

Related to #11950 --- additional diagnostics to help figure out what's happening
2024-02-01 11:25:03 +04:00
3ace7982aa fix: rewrite url to agent ip in single tailnet (#11810)
This restores previous behavior of being able to cache connections
across agents in single tailnet.
2024-02-01 00:25:52 -06:00
4ed1f5581a chore(coderd): add logging to agent rpc yamux conn (#11965) 2024-01-31 23:17:20 -06:00
b79785c86f feat: move agent v2 API connection monitoring to yamux layer (#11910)
Moves monitoring of the agent v2 API connection to the yamux layer.

Present behavior monitors this at the websocket layer, and closes the websocket on completion. This can cause yamux to hit unexpected errors since the connection is closed underneath it.

This might be the cause of yamux errors that some customers are seeing

![image.png](https://graphite-user-uploaded-assets-prod.s3.amazonaws.com/tCz4CxRU9jhAJ7zH8RTi/53b8b5ef-e9e5-44a5-b559-99c37c136071.png)

In any case, it's more graceful to close yamux first and let yamux close the underlying websocket.  That should limit yamux error logging to truly unexpected/error cases.

The only downside is that the yamux `Close()` doesn't accept a reason, so if the agent becomes outdated and we close the API connection, the agent just sees the connection close without a reason.  I'm not sure we log this at the agent anyway, but it would be nice.  I think more accurate logging on Coderd are more important.

I've also added some logging when the monitor disconnects for reasons other than the context being canceled (e.g. agent outdated, failed pings).
2024-02-01 08:18:35 +04:00
ac64155282 fix: strip timezone information from a date in dau response (#11962)
* fix: strip timezone information from a date in dau response

Timezone information is lost, so do not forward it to the client.

* fix: timezone offset should be flipped
* Make tests deterministic
2024-01-31 16:01:50 -06:00
13cbca679e feat: support template bundles as zip archives (#11839) 2024-01-31 14:49:55 +01:00
b25deaae20 fix(coderd/database): fix limit in GetUserWorkspaceBuildParameters (#11954) 2024-01-31 13:56:36 +02:00
a34cada09a feat: add logging to pgPubsub (#11953)
Should be helpful for #11950

Adds a logger to pgPubsub and logs various events, most especially connection and disconnection from postgres.
2024-01-31 15:49:16 +04:00
0c30dde9b5 feat: add customizable upgrade message on client/server version mismatch (#11587) 2024-01-30 17:11:37 -06:00
adbb025e74 feat: add user-level parameter autofill (#11731)
This PR solves #10478 by auto-filling previously used template values in create and update workspace flows.

I decided against explicit user values in settings for these reasons:

* Autofill is far easier to implement
* Users benefit from autofill _by default_ — we don't need to teach them new concepts
* If we decide that autofill creates more harm than good, we can remove it without breaking compatibility
2024-01-30 16:02:21 -06:00
2fd1a726aa fix: only delete expired agents on success (#11940) 2024-01-30 14:11:45 -06:00
27f3b7a814 fix: add timeout to listening ports request (#11935)
This can potentially hang for 15m if the agent is unreachable.
2024-01-30 13:53:52 -06:00
dcab6fa5a4 feat(site): display user avatar (#11893)
* add owner API to workspace and workspace build responses
* display user avatar in workspace top bar

Co-authored-by: Cian Johnston <cian@coder.com>
2024-01-30 17:07:06 +00:00
0fc177203e feat: use agent v2 API to update app health (#11889)
Use the Agent v2 API to update App Health
2024-01-30 11:35:12 +04:00
2599850e54 feat: use agent v2 API to post startup (#11877)
Uses the v2 Agent API to post startup information.
2024-01-30 11:23:28 +04:00
da8bb1c198 feat: use agent v2 API to fetch manifest (#11832)
Agent uses the v2 API to obtain the manifest, instead of the HTTP API.
2024-01-30 10:11:28 +04:00
0eff646c31 chore: move proto to sdk conversion to agentsdk (#11831)
`agentsdk` depends on `agent/proto` because it needs to get the version to dial.

Therefore, the conversion routines need to live in `agentsdk` so that we can convert to and from the Manifest.

I briefly considered refactoring the agent to only reference `proto.Manifest`, but decided against it because we might have multiple protocol versions in the future, its useful to have a protocol-independent data structure.
2024-01-30 09:04:56 +04:00
1e8a9c09fe chore: remove legacy wsconncache (#11816)
Fixes #8218

Removes `wsconncache` and related "is legacy?" functions and API calls that were used by it.

The only leftover is that Agents still use the legacy IP, so that back level clients or workspace proxies can dial them correctly.

We should eventually remove this: #11819
2024-01-30 07:56:36 +04:00
13e24f21e4 feat: use Agent v2 API for Service Banner (#11806)
Agent uses the v2 API for the service banner, rather than the v1 HTTP API.

One of several for #10534
2024-01-30 07:44:47 +04:00
4f5a2f0a9b feat: add backend for jfrog xray support (#11829) 2024-01-29 19:30:02 -06:00
207328ca50 feat: use appearance.Fetcher in agentapi (#11770)
This PR updates the Agent API to use the appearance.Fetcher, which is set by entitlement code in Enterprise coderd.

This brings the agentapi into compliance with the Enterprise feature.
2024-01-29 21:22:50 +04:00
b2bc3fff33 fix: wait for new template version before promoting (#11874)
Fixes a test flake due to not waiting for the correct template version prior to promoting it.
2024-01-29 19:29:56 +04:00
04a23261e6 chore: ensure github uids are unique (#11826) 2024-01-29 09:13:46 -06:00
d66e6e78ee fix: always attempt external auth refresh when fetching (#11762) (#11830)
* fix: always attempt external auth refresh when fetching
* refactor validate to check expiry when considering "valid"
2024-01-29 08:55:15 -06:00
bc4ae53261 chore: refactor Appearance to an interface callable by AGPL code (#11769)
The new Agent API needs an interface for ServiceBanners, so this PR creates it and refactors the AGPL and Enterprise code to achieve it.

Before we depended on the fact that the HTTP endpoint was missing to serve an empty ServiceBanner on AGPL deployments, but that won't work with dRPC, so we need a real interface to call.
2024-01-29 12:17:31 +04:00
aacb4a2b4c feat: use map instead of slice in metrics aggregator (#11815) 2024-01-29 09:12:41 +01:00
42e997d39e fix(coderd/rbac): do not cache context cancellation errors (#11840)
#7439 added global caching of RBAC results.
Calls are cached based on hash(subject, object, action).
We often use dbauthz.AsSystemRestricted to handle "internal" authz calls, and these are often repeated with similar arguments and are likely to get cached.
So a transient error doing an authz check on a system function will be cached for up to a minute.
I'm just starting off with excluding context.Canceled but there's likely a whole suite of different errors we want to also exclude from the global cache.
2024-01-26 16:19:55 +00:00
29707099d7 chore: add agentapi tests (#11269) 2024-01-26 07:04:19 +00:00
005c014f13 chore: instrument additional github api calls (#11824)
* chore: instrument additional githubapi calls

This only affects github as a login source, not external auth.
2024-01-25 18:34:46 -06:00
79568bf628 Revert "fix: always attempt external auth refresh when fetching (#11762)"
This reverts commit 0befc0826a.
2024-01-25 14:22:47 -06:00
0befc0826a fix: always attempt external auth refresh when fetching (#11762)
* fix: always attempt external auth refresh when fetching
* refactor validate to check expiry when considering "valid"
2024-01-25 10:54:56 -06:00
8eae4f83bf fix(coderd/provisionerdserver): fix test flake in TestHeartbeat (#11808) 2024-01-25 12:05:57 +00:00
4616ccf462 fix(coderd): alter return signature of convertWorkspace, add check for requesterID (#11796) 2024-01-24 14:13:14 +00:00
f92336c4d5 feat(coderd): allow workspace owners to mark workspaces as favorite (#11791)
- Adds column `favorite` to workspaces table
- Adds API endpoints to favorite/unfavorite workspaces
- Modifies sorting order to return owners' favorite workspaces first
2024-01-24 13:39:19 +00:00
5cbb76b47a fix: stop spamming DERP map updates for equivalent maps (#11792)
Fixes 2 related issues:

1. wsconncache had incorrect logic to test whether to send DERPMap updates, sending if the maps were equivalent, instead of if they were _not equivalent_.
2. configmaps used a bugged check to test equality between DERPMaps, since it contains a map and the map entries are serialized in random order. Instead, we avoid comparing the protobufs and instead depend on the existing function that compares `tailcfg.DERPMap`. This also has the effect of reducing the number of times we convert to and from protobuf.
2024-01-24 16:27:15 +04:00
f5dbc718a7 fix: accept agent RPC connection without version query parameter (#11790)
Fixes an issue where Coder v2.7.1 agents connect to /api/v2/workspaceagents/me/rpc without a version query parameter
2024-01-24 09:10:16 +04:00
13beb04521 fix: disable keepalives in workspaceapps transport (#11789)
Connection caching causes requests to hit the wrong workspaces. See
comment.

Fixes https://github.com/coder/coder/issues/11767
2024-01-24 14:46:59 +10:00
383eed93f8 fix: use correct logger for lifecycle_executor (#11763) 2024-01-23 14:33:55 -06:00
d6ba0dfecb feat: add "updated" search param to workspaces (#11714)
* feat: add "updated" search param to workspaces
* rego -> sql needs to specify which <table>.organization_id
2024-01-23 11:52:06 -06:00
081fbef097 fix: code-server path based forwarding, defer to code-server (#11759)
Do not attempt to construct a path based port forward url.
Always defer to code server, as it has it's own proxy method.
2024-01-23 11:36:44 -06:00
059e533544 feat: agent uses Tailnet v2 API for DERPMap updates (#11698)
Switches the Agent to use Tailnet v2 API to get DERPMap updates.

Subsequent PRs will do the same for the CLI (`codersdk`) and `wsproxy`.
2024-01-23 14:42:07 +04:00
3e0e7f8739 feat: check agent API version on connection (#11696)
fixes #10531

Adds a check for `version` on connection to the Agent API websocket endpoint.  This is primarily for future-proofing, so that up-level agents get a sensible error if they connect to a back-level Coderd.

It also refactors the location of the `CurrentVersion` variables, to be part of the `proto` packages, since the versions refer to the APIs defined therein.
2024-01-23 14:27:49 +04:00
eb12fd7d92 feat: make ServerTailnet set peers lost when it reconnects to the coordinator (#11682)
Adds support to `ServerTailnet` to set all peers lost before attempting to reconnect to the coordinator. In practice, this only really affects `wsproxy` since coderd has a local connection to the coordinator that only goes down if we're shutting down or change licenses.
2024-01-23 13:17:56 +04:00
3014777d2a feat: add endpoints to oauth2 provider applications (#11718)
These will show up when configuring the application along with the
client ID and everything else.  Should make it easier to configure the
application, otherwise you will have to go look up the URLs in the
docs (which are not yet written).

Co-authored-by: Steven Masley <stevenmasley@gmail.com>
2024-01-22 13:25:25 -09:00
8e0a153725 chore: implement device auth flow for fake idp (#11707)
* chore: implement device auth flow for fake idp
2024-01-22 20:46:05 +00:00
16c6cefde8 chore: pass lifetime directly into api key generate (#11715)
Rather than passing all the deployment values.  This is to make it
easier to generate API keys as part of the oauth flow.

I also added and fixed a test for when the lifetime is set and the
default and expiration are unset.

Co-authored-by: Steven Masley <stevenmasley@gmail.com>
2024-01-22 11:42:55 -09:00
15a90f028e chore: collect more template telemetry to gauge feature usage
We don't have visibility into some feature usage, so this adds a lot of fields missing from `database.Template` to `telemetry.Template`. Deprecation message is not collected, just whether it's set or not.
2024-01-22 18:55:27 +10:00
b7b936547d feat: add setAllPeersLost to the configMaps subcomponent (#11665)
adds setAllPeersLost to the configMaps subcomponent of tailnet.Conn --- we'll call this when we disconnect from a coordinator so we'll eventually clean up peers if they disconnect while we are retrying the coordinator connection (or we don't succeed in reconnecting to the coordinator).
2024-01-22 12:12:15 +04:00
f01cab9894 feat: use tailnet v2 API for coordination (#11638)
This one is huge, and I'm sorry.

The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.

There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
2024-01-22 11:07:50 +04:00