Annoyingly, prometheus Registry collects metrics async, which is causing our test to be racy. They also don't export enough from the Metric interface for us to replicate a synchronous collect, so we have to use Eventually to test.
Adds prometheus metrics to PGPubsub for monitoring its health and performance in production.
Related to #11950 --- additional diagnostics to help figure out what's happening
Moves monitoring of the agent v2 API connection to the yamux layer.
Present behavior monitors this at the websocket layer, and closes the websocket on completion. This can cause yamux to hit unexpected errors since the connection is closed underneath it.
This might be the cause of yamux errors that some customers are seeing

In any case, it's more graceful to close yamux first and let yamux close the underlying websocket. That should limit yamux error logging to truly unexpected/error cases.
The only downside is that the yamux `Close()` doesn't accept a reason, so if the agent becomes outdated and we close the API connection, the agent just sees the connection close without a reason. I'm not sure we log this at the agent anyway, but it would be nice. I think more accurate logging on Coderd are more important.
I've also added some logging when the monitor disconnects for reasons other than the context being canceled (e.g. agent outdated, failed pings).
* fix: strip timezone information from a date in dau response
Timezone information is lost, so do not forward it to the client.
* fix: timezone offset should be flipped
* Make tests deterministic
This PR solves #10478 by auto-filling previously used template values in create and update workspace flows.
I decided against explicit user values in settings for these reasons:
* Autofill is far easier to implement
* Users benefit from autofill _by default_ — we don't need to teach them new concepts
* If we decide that autofill creates more harm than good, we can remove it without breaking compatibility
`agentsdk` depends on `agent/proto` because it needs to get the version to dial.
Therefore, the conversion routines need to live in `agentsdk` so that we can convert to and from the Manifest.
I briefly considered refactoring the agent to only reference `proto.Manifest`, but decided against it because we might have multiple protocol versions in the future, its useful to have a protocol-independent data structure.
Fixes#8218
Removes `wsconncache` and related "is legacy?" functions and API calls that were used by it.
The only leftover is that Agents still use the legacy IP, so that back level clients or workspace proxies can dial them correctly.
We should eventually remove this: #11819
This PR updates the Agent API to use the appearance.Fetcher, which is set by entitlement code in Enterprise coderd.
This brings the agentapi into compliance with the Enterprise feature.
The new Agent API needs an interface for ServiceBanners, so this PR creates it and refactors the AGPL and Enterprise code to achieve it.
Before we depended on the fact that the HTTP endpoint was missing to serve an empty ServiceBanner on AGPL deployments, but that won't work with dRPC, so we need a real interface to call.
#7439 added global caching of RBAC results.
Calls are cached based on hash(subject, object, action).
We often use dbauthz.AsSystemRestricted to handle "internal" authz calls, and these are often repeated with similar arguments and are likely to get cached.
So a transient error doing an authz check on a system function will be cached for up to a minute.
I'm just starting off with excluding context.Canceled but there's likely a whole suite of different errors we want to also exclude from the global cache.
- Adds column `favorite` to workspaces table
- Adds API endpoints to favorite/unfavorite workspaces
- Modifies sorting order to return owners' favorite workspaces first
Fixes 2 related issues:
1. wsconncache had incorrect logic to test whether to send DERPMap updates, sending if the maps were equivalent, instead of if they were _not equivalent_.
2. configmaps used a bugged check to test equality between DERPMaps, since it contains a map and the map entries are serialized in random order. Instead, we avoid comparing the protobufs and instead depend on the existing function that compares `tailcfg.DERPMap`. This also has the effect of reducing the number of times we convert to and from protobuf.
fixes#10531
Adds a check for `version` on connection to the Agent API websocket endpoint. This is primarily for future-proofing, so that up-level agents get a sensible error if they connect to a back-level Coderd.
It also refactors the location of the `CurrentVersion` variables, to be part of the `proto` packages, since the versions refer to the APIs defined therein.
Adds support to `ServerTailnet` to set all peers lost before attempting to reconnect to the coordinator. In practice, this only really affects `wsproxy` since coderd has a local connection to the coordinator that only goes down if we're shutting down or change licenses.
These will show up when configuring the application along with the
client ID and everything else. Should make it easier to configure the
application, otherwise you will have to go look up the URLs in the
docs (which are not yet written).
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
Rather than passing all the deployment values. This is to make it
easier to generate API keys as part of the oauth flow.
I also added and fixed a test for when the lifetime is set and the
default and expiration are unset.
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
We don't have visibility into some feature usage, so this adds a lot of fields missing from `database.Template` to `telemetry.Template`. Deprecation message is not collected, just whether it's set or not.
adds setAllPeersLost to the configMaps subcomponent of tailnet.Conn --- we'll call this when we disconnect from a coordinator so we'll eventually clean up peers if they disconnect while we are retrying the coordinator connection (or we don't succeed in reconnecting to the coordinator).
This one is huge, and I'm sorry.
The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.
There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!