Commit Graph

1707 Commits

Author SHA1 Message Date
07cccf9033 feat: disable directory listings for static files (#12229)
* feat: disable directory listings for static files

Static file server handles serving static asset files (js, css, etc).
The default file server would also list all files in a directory.
This has been changed to only serve files.
2024-02-20 15:50:30 -06:00
2dac34276a fix: add postgres triggers to remove deleted users from user_links (#12117)
* chore: add database test fixture to insert non-unique linked_ids
* chore: create unit test to exercise failed email change bug
* fix: add postgres triggers to keep user_links clear of deleted users
* Add migrations to prevent deleted users with links
* Force soft delete of users, do not allow un-delete
2024-02-20 13:19:38 -06:00
b342bd7869 feat: add port sharing frontend (#12119) 2024-02-20 13:26:34 -05:00
643c3ee54b refactor(provisionerd): move provisionersdk.VersionCurrent -> provisionerdproto.VersionCurrent (#12225) 2024-02-20 12:44:19 +00:00
a2cbb0f87f fix(enterprise/coderd): check provisionerd API version on connection (#12191) 2024-02-16 18:43:07 +00:00
f17149c59d feat: set groupsync to use default org (#12146)
* fix: assign new oauth users to default org

This is not a final solution, as we eventually want to be able
to map to different orgs. This makes it so multi-org does not break oauth/oidc.
2024-02-16 11:09:19 -06:00
75870c22ab fix: assign new oauth users to default org (#12145)
* fix: assign new oauth users to default org

This is not a final solution, as we eventually want to be able
to map to different orgs. This makes it so multi-org does not break oauth/oidc.
2024-02-16 08:47:26 -06:00
2a8004b1b2 feat: use default org for PostUser (#12143)
Instead of assuming only 1 org exists, this uses the
is_default org to place a user in if not specified.
2024-02-16 08:28:36 -06:00
2bf2f88b09 feat: implement 'is_default' org field (#12142)
The first organization created is now marked as "default". This is
to allow "single org" behavior as we move to a multi org codebase.

It is intentional that the user cannot change the default org at this
stage. Only 1 default org can exist, and it is always the first org.

Closes: https://github.com/coder/coder/issues/11961
2024-02-15 11:01:16 -06:00
5aa5ff1bde chore: deprecate API workspace build resources (#12167) 2024-02-15 17:13:44 +01:00
7a453608c9 feat: support order property of coder_agent (#12121) 2024-02-15 13:33:13 +01:00
2d0b9106c0 fix: change servertailnet to register the DERP dialer before setting DERP map (#12137)
I noticed a possible race where tailnet.Conn can try to dial the embedded region before we've set our custom dialer that send the DERP in-memory.  This closes that race and adds a test case for servertailnet with no STUN and an embedded relay
2024-02-15 10:51:12 +04:00
d6b025db14 Revert "feat: add activity status and autostop reason to workspace overview (#11987)" (#12144)
Related to https://github.com/coder/coder/pull/11987

This reverts commit d37b131.
2024-02-14 17:14:49 +00:00
04991f425a fix: set node callback each time we reinit the coordinator in servertailnet (#12140)
I think this will resolve #12136 but lets get a proper test at the system level before closing.

Before this change, we only register the node callback at start of day for the server tailnet.  If the coordinator changes, like we know happens when we are licensed for the PGCoordinator, we close the connection to the old coord, and open a new one to the new coord.

The callback is designed to direct the updates to the new coordinator, but there is nothing that specifically triggers it to fire after we connect to the new coordinator.

If we have STUN, then period re-STUNs will generally get it to fire eventually, but without STUN it we could go indefinitely without a callback.

This PR changes the servertailnet to re-register the callback each time we reconnect to the coordinator.  Registering a callback (even if it's the same callback) triggers an immediate call with our node information, so the new coordinator will have it.
2024-02-14 20:45:31 +04:00
5a0d240bc3 feat: expose DERP server debug metrics (#12135)
Adds some debug endpoints for looking into the DERP server.

The `api/v2/debug/derp/traffic` endpoint requires the `ss` utility to be present in order to function.  I have *not* added the `iproute2` package to our base image as it adds 11MB, so this endpoint won't be useful by default.  However, in a debugging situation, we could exec into the container and then `apk add iproute2`, or build a special debug image.

The `api/v2/debug/expvar` handler contains DERP metrics as well as commandline and memstats.

Example:

```
{
"alert_failed": 0,
"alert_generated": 0,
"cmdline": ["/Users/spike/repos/coder/build/coder_darwin_arm64","--global-config","/Users/spike/repos/coder/.coderv2","server","--http-address","0.0.0.0:3000","--swagger-enable","--access-url","http://127.0.0.1:3000","--dangerous-allow-cors-requests=true"],
"derp": {"accepts": 1, "average_queue_duration_ms": 0, "bytes_received": 0, "bytes_sent": 0, "counter_packets_dropped_reason": {"gone_disconnected": 0, "gone_not_here": 0, "queue_head": 0, "queue_tail": 0, "unknown_dest": 0, "unknown_dest_on_fwd": 0, "write_error": 0}, "counter_packets_dropped_type": {"disco": 0, "other": 0}, "counter_packets_received_kind": {"disco": 0, "other": 0}, "counter_tcp_rtt": {}, "counter_total_dup_client_conns": 0, "gauge_clients_local": 1, "gauge_clients_remote": 0, "gauge_clients_total": 1, "gauge_current_connections": 1, "gauge_current_dup_client_conns": 0, "gauge_current_dup_client_keys": 0, "gauge_current_file_descriptors": 0, "gauge_current_home_connections": 1, "gauge_memstats_sys0": 20874504, "gauge_watchers": 0, "got_ping": 0, "home_moves_in": 0, "home_moves_out": 0, "multiforwarder_created": 0, "multiforwarder_deleted": 0, "packet_forwarder_delete_other_value": 0, "packets_dropped": 0, "packets_forwarded_in": 0, "packets_forwarded_out": 0, "packets_received": 0, "packets_sent": 0, "peer_gone_disconnected_frames": 0, "peer_gone_not_here_frames": 0, "sent_pong": 0, "unknown_frames": 0, "version": "1.47.0-dev20240214-t64db8c604"},
"memstats": {"Alloc":286506256,"TotalAlloc":297594632,"Sys":310621512,"Lookups":0,"Mallocs":304204,"Frees":171570,"HeapAlloc":286506256,"HeapSys":294060032,"HeapIdle":3694592,"HeapInuse":290365440,"HeapReleased":3620864,"HeapObjects":132634,"StackInuse":3735552,"StackSys":3735552,"MSpanInuse":347256,"MSpanSys":358512,"MCacheInuse":9600,"MCacheSys":15600,"BuckHashSys":1469877,"GCSys":9434896,"OtherSys":1547043,"NextGC":551867656,"LastGC":1707892877408883000,"PauseTotalNs":1247000,"PauseNs":[200333,229375,239875,209542,106958,203792,57125,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"PauseEnd":[1707892876217481000,1707892876219726000,1707892876222273000,1707892876226151000,1707892876234815000,1707892877398146000,1707892877408883000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"NumGC":7,"NumForcedGC":0,"GCCPUFraction":0.0022425810335762954,"EnableGC":true,"DebugGC":false,"BySize":[{"Size":0,"Mallocs":0,"Frees":0},{"Size":8,"Mallocs":14396,"Frees":9143},{"Size":16,"Mallocs":89090,"Frees":50507},{"Size":24,"Mallocs":40839,"Frees":24456},{"Size":32,"Mallocs":22404,"Frees":12379},{"Size":48,"Mallocs":51174,"Frees":23718},{"Size":64,"Mallocs":15406,"Frees":3501},{"Size":80,"Mallocs":6688,"Frees":2352},{"Size":96,"Mallocs":2567,"Frees":374},{"Size":112,"Mallocs":19371,"Frees":16883},{"Size":128,"Mallocs":2873,"Frees":1061},{"Size":144,"Mallocs":5600,"Frees":2742},{"Size":160,"Mallocs":2159,"Frees":622},{"Size":176,"Mallocs":454,"Frees":86},{"Size":192,"Mallocs":227,"Frees":128},{"Size":208,"Mallocs":1407,"Frees":732},{"Size":224,"Mallocs":1365,"Frees":1090},{"Size":240,"Mallocs":82,"Frees":48},{"Size":256,"Mallocs":310,"Frees":162},{"Size":288,"Mallocs":1945,"Frees":562},{"Size":320,"Mallocs":1200,"Frees":458},{"Size":352,"Mallocs":133,"Frees":33},{"Size":384,"Mallocs":582,"Frees":51},{"Size":416,"Mallocs":747,"Frees":200},{"Size":448,"Mallocs":113,"Frees":22},{"Size":480,"Mallocs":34,"Frees":21},{"Size":512,"Mallocs":951,"Frees":91},{"Size":576,"Mallocs":364,"Frees":122},{"Size":640,"Mallocs":532,"Frees":270},{"Size":704,"Mallocs":93,"Frees":39},{"Size":768,"Mallocs":83,"Frees":35},{"Size":896,"Mallocs":308,"Frees":175},{"Size":1024,"Mallocs":226,"Frees":122},{"Size":1152,"Mallocs":198,"Frees":100},{"Size":1280,"Mallocs":314,"Frees":171},{"Size":1408,"Mallocs":77,"Frees":47},{"Size":1536,"Mallocs":80,"Frees":54},{"Size":1792,"Mallocs":199,"Frees":107},{"Size":2048,"Mallocs":112,"Frees":48},{"Size":2304,"Mallocs":71,"Frees":32},{"Size":2688,"Mallocs":206,"Frees":81},{"Size":3072,"Mallocs":39,"Frees":15},{"Size":3200,"Mallocs":16,"Frees":7},{"Size":3456,"Mallocs":44,"Frees":29},{"Size":4096,"Mallocs":192,"Frees":83},{"Size":4864,"Mallocs":44,"Frees":25},{"Size":5376,"Mallocs":105,"Frees":43},{"Size":6144,"Mallocs":25,"Frees":5},{"Size":6528,"Mallocs":22,"Frees":7},{"Size":6784,"Mallocs":3,"Frees":0},{"Size":6912,"Mallocs":4,"Frees":2},{"Size":8192,"Mallocs":59,"Frees":10},{"Size":9472,"Mallocs":31,"Frees":12},{"Size":9728,"Mallocs":5,"Frees":2},{"Size":10240,"Mallocs":5,"Frees":0},{"Size":10880,"Mallocs":27,"Frees":11},{"Size":12288,"Mallocs":4,"Frees":1},{"Size":13568,"Mallocs":4,"Frees":2},{"Size":14336,"Mallocs":9,"Frees":2},{"Size":16384,"Mallocs":10,"Frees":2},{"Size":18432,"Mallocs":4,"Frees":2}]},
"warning_failed": 0,
"warning_generated": 0
}
```

If we find the DERP metrics useful we could consider how to include them in Prometheus scrapes based on the tailnet `varz` package.  That's for a later PR if at all.
2024-02-14 15:11:45 +04:00
5d483a7ea1 fix: do not query user_link for deleted accounts (#12112) 2024-02-13 13:02:21 -06:00
06f3ab1206 chore: add database test fixture to insert non-unique linked_ids (#12111)
* chore: add database test fixture to insert non-unique linked_ids
2024-02-13 12:06:47 -06:00
d37b131426 feat: add activity status and autostop reason to workspace overview (#11987) 2024-02-13 10:50:17 -07:00
3ab3a62bef feat: add port-sharing backend (#11939) 2024-02-13 09:31:20 -05:00
e1e352d8c1 feat: add template activity_bump property (#11734)
Allows template admins to configure the activity bump duration. Defaults to 1h.
2024-02-13 07:00:35 +00:00
fead57f304 fix: allow access to unhealthy/initializing apps (#12086) 2024-02-13 16:30:49 +10:00
3e68650791 feat: support order property of coder_app resource (#12077) 2024-02-12 15:11:31 +01:00
92b2e26a48 feat: send log limit exceeded in response, not error (#12078)
When we exceed the db-imposed limit of logs, we need to communicate that back to the agent.  In v1 we did it with a 4xx-level HTTP status, but with dRPC, the errors are delivered as strings, which feels fragile to me for something we want to gracefully handle.

So, this PR adds the log limit exceeded as a field on the response message, and fixes the API handler to set it as appropriate instead of an error.
2024-02-09 16:17:20 +04:00
1f5a6d59ba chore: consolidate websocketNetConn implementations (#12065)
Consolidates websocketNetConn from multiple packages in favor of a central one in codersdk
2024-02-09 11:39:08 +04:00
ec8e41f516 chore: add logging around agent app health reporting (#12071) 2024-02-08 23:37:44 -06:00
c0e169ebf9 feat: support custom order of agent metadata (#12066) 2024-02-08 17:29:34 +01:00
151aaadc23 fix: allow startup scripts larger than 32k (#12060)
Fixes #12057 and adds a regression test.
2024-02-07 22:26:42 +04:00
1abe0cfa1a docs: fix /audit & /insights params (#12043) 2024-02-07 08:38:54 -05:00
1cf4b62867 feat: change agent to use v2 API for reporting stats (#12024)
Modifies the agent to use the v2 API to report its statistics, using the `statsReporter` subcomponent.
2024-02-07 15:26:41 +04:00
213ae69bee fix: start timer before subscribing to avoid test race (#12031)
Fixes #12030

This is a good example of the kind of thing I'd like to address with a time-testing lib.  The problem is that there is a race between the watchdog starting it's timer and the test incrementing the time.  What would make this easier is if the time-testing library could wait for and assert the call to start the timer before incrementing the time.
2024-02-06 20:21:23 +04:00
98b86f3cd6 chore: add logs to pq notification dialer (#12020) 2024-02-06 15:21:48 +00:00
e09cd2c6bd feat: add watchdog to pubsub (#12011)
adds a watchdog to our pubsub and runs it for Coder server.

If the watchdog times out, it triggers a graceful exit in `coder server` to give any provisioner jobs a chance to shut down.

c.f. #11950
2024-02-06 16:58:45 +04:00
c7f52b73bb feat(coderd): add prometheus metrics to servertailnet (#11988) 2024-02-05 23:57:18 -06:00
c84a637116 fix: stop logging error on query canceled (#12017)
Fixes flake seen here: https://github.com/coder/coder/actions/runs/7782340530/job/21218566449
2024-02-06 08:43:34 +04:00
ad8e0db172 feat: add custom error message on signups disabled page (#11959) 2024-02-01 18:01:25 +01:00
79d5c238cc fix: always return a clean http client for promoauth (#11963)
* fix: add unit test to verify default client is not broken

* always return a clean http client
* No need to clone the tripper
2024-02-01 11:13:34 -05:00
1aa117b9ec chore: rename client Listen to ConnectRPC (#11916)
ConnectRPC seems more appropriate for this function
2024-02-01 14:44:11 +04:00
d5a98cc6d7 fix: avoid race in TestPGPubsub_Metrics by using Eventually (#11973)
Annoyingly, prometheus Registry collects metrics async, which is causing our test to be racy.  They also don't export enough from the Metric interface for us to replicate a synchronous collect, so we have to use Eventually to test.
2024-02-01 12:10:19 +04:00
5a359d50dd feat: add metrics to PGPubsub (#11971)
Adds prometheus metrics to PGPubsub for monitoring its health and performance in production.

Related to #11950 --- additional diagnostics to help figure out what's happening
2024-02-01 11:25:03 +04:00
3ace7982aa fix: rewrite url to agent ip in single tailnet (#11810)
This restores previous behavior of being able to cache connections
across agents in single tailnet.
2024-02-01 00:25:52 -06:00
4ed1f5581a chore(coderd): add logging to agent rpc yamux conn (#11965) 2024-01-31 23:17:20 -06:00
b79785c86f feat: move agent v2 API connection monitoring to yamux layer (#11910)
Moves monitoring of the agent v2 API connection to the yamux layer.

Present behavior monitors this at the websocket layer, and closes the websocket on completion. This can cause yamux to hit unexpected errors since the connection is closed underneath it.

This might be the cause of yamux errors that some customers are seeing

![image.png](https://graphite-user-uploaded-assets-prod.s3.amazonaws.com/tCz4CxRU9jhAJ7zH8RTi/53b8b5ef-e9e5-44a5-b559-99c37c136071.png)

In any case, it's more graceful to close yamux first and let yamux close the underlying websocket.  That should limit yamux error logging to truly unexpected/error cases.

The only downside is that the yamux `Close()` doesn't accept a reason, so if the agent becomes outdated and we close the API connection, the agent just sees the connection close without a reason.  I'm not sure we log this at the agent anyway, but it would be nice.  I think more accurate logging on Coderd are more important.

I've also added some logging when the monitor disconnects for reasons other than the context being canceled (e.g. agent outdated, failed pings).
2024-02-01 08:18:35 +04:00
ac64155282 fix: strip timezone information from a date in dau response (#11962)
* fix: strip timezone information from a date in dau response

Timezone information is lost, so do not forward it to the client.

* fix: timezone offset should be flipped
* Make tests deterministic
2024-01-31 16:01:50 -06:00
13cbca679e feat: support template bundles as zip archives (#11839) 2024-01-31 14:49:55 +01:00
b25deaae20 fix(coderd/database): fix limit in GetUserWorkspaceBuildParameters (#11954) 2024-01-31 13:56:36 +02:00
a34cada09a feat: add logging to pgPubsub (#11953)
Should be helpful for #11950

Adds a logger to pgPubsub and logs various events, most especially connection and disconnection from postgres.
2024-01-31 15:49:16 +04:00
0c30dde9b5 feat: add customizable upgrade message on client/server version mismatch (#11587) 2024-01-30 17:11:37 -06:00
adbb025e74 feat: add user-level parameter autofill (#11731)
This PR solves #10478 by auto-filling previously used template values in create and update workspace flows.

I decided against explicit user values in settings for these reasons:

* Autofill is far easier to implement
* Users benefit from autofill _by default_ — we don't need to teach them new concepts
* If we decide that autofill creates more harm than good, we can remove it without breaking compatibility
2024-01-30 16:02:21 -06:00
2fd1a726aa fix: only delete expired agents on success (#11940) 2024-01-30 14:11:45 -06:00
27f3b7a814 fix: add timeout to listening ports request (#11935)
This can potentially hang for 15m if the agent is unreachable.
2024-01-30 13:53:52 -06:00