Commit Graph

13 Commits

Author SHA1 Message Date
623dcd97dc fix(cli): fix flakes related to context cancellation when establishing pg connections (#18246)
Since https://github.com/coder/coder/pull/18195 was merged, we started
running CLI tests with postgres instead of just dbmem. This surfaced
errors related to context cancellation while establishing postgres
connections.

This PR should fix https://github.com/coder/internal/issues/672. Related
to https://github.com/coder/coder/issues/15109.
2025-06-05 15:54:13 +02:00
5f3a53f01b fix: fix race condition in pubsub startup (#17088)
fixes https://github.com/coder/internal/issues/525

If the context is canceled, the goroutine that is supposed to read from the `errCh` could exit prematurely, leading to a goroutine leak. Refactors this code so it cannot block.
2025-03-25 16:45:45 +04:00
005ea536a5 fix: fix Listen/Unlisten race on Pubsub (#15315)
Fixes #15312

When we need to `Unlisten()` for an event, instead of immediately removing the event from the `p.queues`, we store a channel to signal any goroutines trying to Subscribe to the same event when we are done. On `Subscribe`, if the channel is present, wait for it before calling `Listen` to ensure the ordering is correct.
2024-11-01 14:35:26 +04:00
bd9151d224 fix(coderd/database/pubsub): prevent listeners read outside mutex lock (#15303)
https://github.com/coder/coder/actions/runs/11611105362/job/32331771969#logs

```
2024-10-31T11:36:45.9225038Z WARNING: DATA RACE
2024-10-31T11:36:45.9225120Z Write at 0x00c0000d8030 by goroutine 26:
2024-10-31T11:36:45.9225200Z   runtime.mapdelete()
2024-10-31T11:36:45.9225412Z       /opt/hostedtoolcache/go/1.22.8/x64/src/runtime/map.go:696 +0x0
2024-10-31T11:36:45.9225647Z   github.com/coder/coder/v2/coderd/database/pubsub.(*PGPubsub).subscribeQueue.func2()
2024-10-31T11:36:45.9225906Z       /home/runner/work/coder/coder/coderd/database/pubsub/pubsub.go:277 +0x131
2024-10-31T11:36:45.9225993Z   runtime.deferreturn()
2024-10-31T11:36:45.9226210Z       /opt/hostedtoolcache/go/1.22.8/x64/src/runtime/panic.go:602 +0x5d
2024-10-31T11:36:45.9226283Z   testing.tRunner()
2024-10-31T11:36:45.9226519Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1689 +0x21e
2024-10-31T11:36:45.9226603Z   testing.(*T).Run.gowrap1()
2024-10-31T11:36:45.9226831Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1742 +0x44
2024-10-31T11:36:45.9226836Z 
2024-10-31T11:36:45.9226934Z Previous read at 0x00c0000d8030 by goroutine 112:
2024-10-31T11:36:45.9227159Z   github.com/coder/coder/v2/coderd/database/pubsub.(*PGPubsub).subscribeQueue.func2()
2024-10-31T11:36:45.9227462Z       /home/runner/work/coder/coder/coderd/database/pubsub/pubsub.go:284 +0x1b6
2024-10-31T11:36:45.9227661Z   github.com/coder/coder/v2/enterprise/replicasync.(*Manager).subscribe.func3()
2024-10-31T11:36:45.9227936Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync.go:228 +0x53
2024-10-31T11:36:45.9227941Z 
2024-10-31T11:36:45.9228019Z Goroutine 26 (running) created at:
2024-10-31T11:36:45.9228096Z   testing.(*T).Run()
2024-10-31T11:36:45.9228318Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1742 +0x825
2024-10-31T11:36:45.9228498Z   github.com/coder/coder/v2/enterprise/replicasync_test.TestReplica()
2024-10-31T11:36:45.9228777Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync_test.go:33 +0x4b
2024-10-31T11:36:45.9228847Z   testing.tRunner()
2024-10-31T11:36:45.9229063Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1689 +0x21e
2024-10-31T11:36:45.9229142Z   testing.(*T).Run.gowrap1()
2024-10-31T11:36:45.9229366Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1742 +0x44
2024-10-31T11:36:45.9229369Z 
2024-10-31T11:36:45.9229443Z Goroutine 112 (finished) created at:
2024-10-31T11:36:45.9229685Z   github.com/coder/coder/v2/enterprise/replicasync.(*Manager).subscribe()
2024-10-31T11:36:45.9229952Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync.go:226 +0x568
2024-10-31T11:36:45.9230092Z   github.com/coder/coder/v2/enterprise/replicasync.New()
2024-10-31T11:36:45.9230361Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync.go:101 +0x1344
2024-10-31T11:36:45.9230547Z   github.com/coder/coder/v2/enterprise/replicasync_test.TestReplica.func1()
2024-10-31T11:36:45.9230836Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync_test.go:48 +0x26a
2024-10-31T11:36:45.9230904Z   testing.tRunner()
2024-10-31T11:36:45.9231127Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1689 +0x21e
2024-10-31T11:36:45.9231207Z   testing.(*T).Run.gowrap1()
2024-10-31T11:36:45.9231431Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1742 +0x44
```
2024-11-01 11:24:29 +04:00
b8944074c4 chore: improve coder server ux (#14761) 2024-09-24 13:16:36 +10:00
ded612d3ec fix: use authenticated urls for pubsub (#14261) 2024-08-26 15:04:04 +00:00
4671ebb330 feat: measure pubsub latencies and expose metrics (#13126) 2024-05-10 12:31:49 +00:00
51707446d0 fix: stop holding Pubsub mutex while calling pq.Listener (#12518)
fixes #11950

https://github.com/coder/coder/issues/11950#issuecomment-1987756088 explains the bug

We were also calling into `Unlisten()` and `Close()` while holding the mutex.  I don't believe that `Close()` depends on the notification loop being unblocked, but it's hard to be sure, and the safest thing to do is assume it could block.

So, I added a unit test that fakes out `pq.Listener` and sends a bunch of notifies every time we call into it to hopefully prevent regression where we hold the mutex while calling into these functions.

It also removes the use of a `context.Context` to stop the PubSub -- it must be explicitly `Closed()`.  This simplifies a bunch of the logic, and is how we use the pubsub anyway.
2024-03-12 09:44:12 +04:00
98b86f3cd6 chore: add logs to pq notification dialer (#12020) 2024-02-06 15:21:48 +00:00
5a359d50dd feat: add metrics to PGPubsub (#11971)
Adds prometheus metrics to PGPubsub for monitoring its health and performance in production.

Related to #11950 --- additional diagnostics to help figure out what's happening
2024-02-01 11:25:03 +04:00
a34cada09a feat: add logging to pgPubsub (#11953)
Should be helpful for #11950

Adds a logger to pgPubsub and logs various events, most especially connection and disconnection from postgres.
2024-01-31 15:49:16 +04:00
387723a596 fix: close pg PubSub listener to avoid race (#11640)
Fixes flake as seen here: https://github.com/coder/coder/runs/20528529187
2024-01-18 09:18:59 +04:00
e4b6f5695b chore: separate pubsub into a new package (#8017)
* chore: rename store to dbmock for consistency

* chore: remove redundant dbtype package

This wasn't necessary and forked how we do DB types.

* chore: separate pubsub into a new package

This didn't need to be in database and was bloating it.
2023-06-14 15:34:54 +00:00