35 Commits

Author SHA1 Message Date
09cc906981 chore: remove unnecessary redeclarations in for loops (part 2) (#18593) 2025-06-26 12:28:00 -06:00
623dcd97dc fix(cli): fix flakes related to context cancellation when establishing pg connections (#18246)
Since https://github.com/coder/coder/pull/18195 was merged, we started
running CLI tests with postgres instead of just dbmem. This surfaced
errors related to context cancellation while establishing postgres
connections.

This PR should fix https://github.com/coder/internal/issues/672. Related
to https://github.com/coder/coder/issues/15109.
2025-06-05 15:54:13 +02:00
6c0bed0f53 chore: update to coder/quartz v0.2.0 (#18007)
Upgrade to coder/quartz v0.2.0 including fixing up a minor API breaking change.
2025-05-27 16:05:03 +04:00
f670bc31f5 chore: update testutil chan helpers (#17408) 2025-04-16 10:37:09 -06:00
5f3a53f01b fix: fix race condition in pubsub startup (#17088)
fixes https://github.com/coder/internal/issues/525

If the context is canceled, the goroutine that is supposed to read from the `errCh` could exit prematurely, leading to a goroutine leak. Refactors this code so it cannot block.
2025-03-25 16:45:45 +04:00
98dfc70f31 fix(coderd/database): remove linux build tags from db package (#16633)
Remove linux build tags from database package to make sure we can run
tests on Mac OS.
2025-02-25 11:39:37 -05:00
b77b5432c6 test(coderd/database/pubsub): ensure db closure on unhappy paths (#16327) 2025-01-29 14:47:38 +00:00
1336925c9f feat(flake.nix): switch dogfood dev image to buildNixShellImage from dockerTools (#16223)
Replace Depot build action with Nix for Nix dogfood image builds

The dogfood Nix image is now built using Nix's native container tooling instead of Depot. This change:

- Adds Nix setup steps to the GitHub Actions workflow
- Removes the Dockerfile.nix in favor of a Nix-native container build
- Updates the flake.nix to support building Docker images
- Introduces a hash file to track Nix-related changes
- Updates the vendorHash for Go dependencies

Change-Id: I4e011fe3a19d9a1375fbfd5223c910e59d66a5d9
Signed-off-by: Thomas Kosiewski <tk@coder.com>
2025-01-28 16:38:37 +01:00
5861e516b9 chore: add standard test logger ignoring db canceled (#15556)
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests.  This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.

Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.
2024-11-18 14:09:22 +04:00
1bfa7d42e8 chore: add postgres template caching for tests (#15336)
This PR is the first in a series aimed at closing
[#15109](https://github.com/coder/coder/issues/15109).

### Changes

- **Template Database Creation:**  
`dbtestutil.Open` now has the ability to create a template database if
none is provided via `DB_FROM`. The template database’s name is derived
from a hash of the migration files, ensuring that it can be reused
across tests and is automatically updated whenever migrations change.

- **Optimized Database Handling:**  
Previously, `dbtestutil.Open` would spin up a new container for each
test when `DB_FROM` was unset. Now, it first checks for an active
PostgreSQL instance on `localhost:5432`. If none is found, it creates a
single container that remains available for subsequent tests,
eliminating repeated container startups.

These changes address the long individual test times (10+ seconds)
reported by some users, likely due to the time Docker took to start and
complete migrations.
2024-11-04 17:23:31 +01:00
005ea536a5 fix: fix Listen/Unlisten race on Pubsub (#15315)
Fixes #15312

When we need to `Unlisten()` for an event, instead of immediately removing the event from the `p.queues`, we store a channel to signal any goroutines trying to Subscribe to the same event when we are done. On `Subscribe`, if the channel is present, wait for it before calling `Listen` to ensure the ordering is correct.
2024-11-01 14:35:26 +04:00
bd9151d224 fix(coderd/database/pubsub): prevent listeners read outside mutex lock (#15303)
https://github.com/coder/coder/actions/runs/11611105362/job/32331771969#logs

```
2024-10-31T11:36:45.9225038Z WARNING: DATA RACE
2024-10-31T11:36:45.9225120Z Write at 0x00c0000d8030 by goroutine 26:
2024-10-31T11:36:45.9225200Z   runtime.mapdelete()
2024-10-31T11:36:45.9225412Z       /opt/hostedtoolcache/go/1.22.8/x64/src/runtime/map.go:696 +0x0
2024-10-31T11:36:45.9225647Z   github.com/coder/coder/v2/coderd/database/pubsub.(*PGPubsub).subscribeQueue.func2()
2024-10-31T11:36:45.9225906Z       /home/runner/work/coder/coder/coderd/database/pubsub/pubsub.go:277 +0x131
2024-10-31T11:36:45.9225993Z   runtime.deferreturn()
2024-10-31T11:36:45.9226210Z       /opt/hostedtoolcache/go/1.22.8/x64/src/runtime/panic.go:602 +0x5d
2024-10-31T11:36:45.9226283Z   testing.tRunner()
2024-10-31T11:36:45.9226519Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1689 +0x21e
2024-10-31T11:36:45.9226603Z   testing.(*T).Run.gowrap1()
2024-10-31T11:36:45.9226831Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1742 +0x44
2024-10-31T11:36:45.9226836Z 
2024-10-31T11:36:45.9226934Z Previous read at 0x00c0000d8030 by goroutine 112:
2024-10-31T11:36:45.9227159Z   github.com/coder/coder/v2/coderd/database/pubsub.(*PGPubsub).subscribeQueue.func2()
2024-10-31T11:36:45.9227462Z       /home/runner/work/coder/coder/coderd/database/pubsub/pubsub.go:284 +0x1b6
2024-10-31T11:36:45.9227661Z   github.com/coder/coder/v2/enterprise/replicasync.(*Manager).subscribe.func3()
2024-10-31T11:36:45.9227936Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync.go:228 +0x53
2024-10-31T11:36:45.9227941Z 
2024-10-31T11:36:45.9228019Z Goroutine 26 (running) created at:
2024-10-31T11:36:45.9228096Z   testing.(*T).Run()
2024-10-31T11:36:45.9228318Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1742 +0x825
2024-10-31T11:36:45.9228498Z   github.com/coder/coder/v2/enterprise/replicasync_test.TestReplica()
2024-10-31T11:36:45.9228777Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync_test.go:33 +0x4b
2024-10-31T11:36:45.9228847Z   testing.tRunner()
2024-10-31T11:36:45.9229063Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1689 +0x21e
2024-10-31T11:36:45.9229142Z   testing.(*T).Run.gowrap1()
2024-10-31T11:36:45.9229366Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1742 +0x44
2024-10-31T11:36:45.9229369Z 
2024-10-31T11:36:45.9229443Z Goroutine 112 (finished) created at:
2024-10-31T11:36:45.9229685Z   github.com/coder/coder/v2/enterprise/replicasync.(*Manager).subscribe()
2024-10-31T11:36:45.9229952Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync.go:226 +0x568
2024-10-31T11:36:45.9230092Z   github.com/coder/coder/v2/enterprise/replicasync.New()
2024-10-31T11:36:45.9230361Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync.go:101 +0x1344
2024-10-31T11:36:45.9230547Z   github.com/coder/coder/v2/enterprise/replicasync_test.TestReplica.func1()
2024-10-31T11:36:45.9230836Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync_test.go:48 +0x26a
2024-10-31T11:36:45.9230904Z   testing.tRunner()
2024-10-31T11:36:45.9231127Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1689 +0x21e
2024-10-31T11:36:45.9231207Z   testing.(*T).Run.gowrap1()
2024-10-31T11:36:45.9231431Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1742 +0x44
```
2024-11-01 11:24:29 +04:00
14565615be test(coderd/database/pubsub): fix data race in err assignment (#15306) 2024-10-31 22:37:19 +02:00
b8944074c4 chore: improve coder server ux (#14761) 2024-09-24 13:16:36 +10:00
ded612d3ec fix: use authenticated urls for pubsub (#14261) 2024-08-26 15:04:04 +00:00
e5268e4551 chore: spin clock library out to coder/quartz repo (#13777)
Code that was in `/clock` has been moved to github.com/coder/quartz.  This PR refactors our use of the clock library to point to the external Quartz repo.
2024-07-03 15:02:54 +04:00
8326a3a675 chore: change mock clock to allow Advance() within timer/tick functions (#13500) 2024-06-10 15:27:24 +04:00
fade8ba759 fix: fix MeasureLatencyRecvTimeout to accept send=0 (#13477)
Fixes the flake seen here: https://github.com/coder/coder/runs/25832852690

Linux is not a real time operating system, and so there is no guarantee that subsequent `time.Now()` `time.Since()` calls will return a non-zero time.  This assert is mainly there to ensure we don't return `-1`.
2024-06-05 18:27:56 +04:00
9c3fd5dd26 chore: add explicit Wait() to clock.Advance() (#13464) 2024-06-05 15:37:16 +04:00
42324b386a chore: add clock pkg for testing time (#13461)
Adds a package for testing time/timer/ticker functions.  Implementation is limited to `NewTimer` and `NewContextTicker`, but will eventually be expanded to all `time` functions from the standard library as well as `context.WithTimeout()`, `context.WithDeadline()`.

Replaces `benbjohnson/clock` for the pubsub watchdog, as a proof of concept.

Eventually, as we expand functionality, we will replace most time-related functions with this library for testing.
2024-06-05 13:55:45 +04:00
85de0e966d chore: fix TestMeasureLatency/MeasureLatencyRecvTimeout flake (#13301) 2024-05-16 13:42:42 -05:00
4671ebb330 feat: measure pubsub latencies and expose metrics (#13126) 2024-05-10 12:31:49 +00:00
5454f4997b chore(ci): clean up databases after test finishes in CI (#12702) 2024-03-21 14:53:16 +00:00
51707446d0 fix: stop holding Pubsub mutex while calling pq.Listener (#12518)
fixes #11950

https://github.com/coder/coder/issues/11950#issuecomment-1987756088 explains the bug

We were also calling into `Unlisten()` and `Close()` while holding the mutex.  I don't believe that `Close()` depends on the notification loop being unblocked, but it's hard to be sure, and the safest thing to do is assume it could block.

So, I added a unit test that fakes out `pq.Listener` and sends a bunch of notifies every time we call into it to hopefully prevent regression where we hold the mutex while calling into these functions.

It also removes the use of a `context.Context` to stop the PubSub -- it must be explicitly `Closed()`.  This simplifies a bunch of the logic, and is how we use the pubsub anyway.
2024-03-12 09:44:12 +04:00
213ae69bee fix: start timer before subscribing to avoid test race (#12031)
Fixes #12030

This is a good example of the kind of thing I'd like to address with a time-testing lib.  The problem is that there is a race between the watchdog starting it's timer and the test incrementing the time.  What would make this easier is if the time-testing library could wait for and assert the call to start the timer before incrementing the time.
2024-02-06 20:21:23 +04:00
98b86f3cd6 chore: add logs to pq notification dialer (#12020) 2024-02-06 15:21:48 +00:00
e09cd2c6bd feat: add watchdog to pubsub (#12011)
adds a watchdog to our pubsub and runs it for Coder server.

If the watchdog times out, it triggers a graceful exit in `coder server` to give any provisioner jobs a chance to shut down.

c.f. #11950
2024-02-06 16:58:45 +04:00
c7f52b73bb feat(coderd): add prometheus metrics to servertailnet (#11988) 2024-02-05 23:57:18 -06:00
d5a98cc6d7 fix: avoid race in TestPGPubsub_Metrics by using Eventually (#11973)
Annoyingly, prometheus Registry collects metrics async, which is causing our test to be racy.  They also don't export enough from the Metric interface for us to replicate a synchronous collect, so we have to use Eventually to test.
2024-02-01 12:10:19 +04:00
5a359d50dd feat: add metrics to PGPubsub (#11971)
Adds prometheus metrics to PGPubsub for monitoring its health and performance in production.

Related to #11950 --- additional diagnostics to help figure out what's happening
2024-02-01 11:25:03 +04:00
a34cada09a feat: add logging to pgPubsub (#11953)
Should be helpful for #11950

Adds a logger to pgPubsub and logs various events, most especially connection and disconnection from postgres.
2024-01-31 15:49:16 +04:00
387723a596 fix: close pg PubSub listener to avoid race (#11640)
Fixes flake as seen here: https://github.com/coder/coder/runs/20528529187
2024-01-18 09:18:59 +04:00
22e781eced chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```

Migrate generated files to import /v2

* Fix gen
2023-08-18 18:55:43 +00:00
b4057bd74a feat: make pgCoordinator generally available (#8419)
* pgCoord to GA, fix tests

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix generation and coordinator delete RBAC

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix fakeQuerier -> FakeQuerier

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-07-12 13:35:29 +04:00
e4b6f5695b chore: separate pubsub into a new package (#8017)
* chore: rename store to dbmock for consistency

* chore: remove redundant dbtype package

This wasn't necessary and forked how we do DB types.

* chore: separate pubsub into a new package

This didn't need to be in database and was bloating it.
2023-06-14 15:34:54 +00:00