* fix: Potential deadlock in peer.Channel dc.OnOpen
* fix: Potential send on closed channel
* fix: Improve robustness of waitOpened during close
* chore: Simplify statements
* fix: Improve teardown and timeout of peer tests
* fix: Improve robustness of TestConn/Buffering test
* Update peer/channel.go
Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
* feat: Add app support
This adds apps as a property to a workspace agent.
The resource is added to the Terraform provider here:
https://github.com/coder/terraform-provider-coder/pull/17
Apps will be opened in the dashboard or via the CLI
with `coder open <name>`. If `command` is specified, a
terminal will appear locally and in the web. If `target`
is specified, the browser will open to an exposed instance
of that target.
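As a rough sketch, the new property might look something like the following on the agent; the type and field names here are illustrative, not the actual API surface:
```go
package agent

// WorkspaceApp is an illustrative sketch of the app property described
// above; the type and field names are hypothetical.
type WorkspaceApp struct {
	Name    string // opened with `coder open <name>`
	Command string // if set, a terminal appears locally and in the web
	Target  string // if set, the browser opens an exposed instance of the target
}
```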
* Compare fields in apps test
* Update Terraform provider to use relative path
* Add some basic structure for routing
* chore: Remove interface from coderd and lift API surface
Abstracting coderd into an interface added needless indirection, because
the interface was never intended to be implemented outside of a single
implementation.
This lifts the abstraction, and attaches all handlers to a root struct
named `*coderd.API`.
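A minimal sketch of the resulting shape (the handler name is illustrative):
```go
package coderd

import "net/http"

// API is the root struct all HTTP handlers attach to. Dependencies
// (database, pubsub, etc.) hang off it as plain fields instead of
// hiding behind an interface.
type API struct{}

func (api *API) workspaces(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
}
```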
* Add basic proxy logic
* Add proxying based on path
* Add app proxying for wildcards
* Add wsconncache
* fix: Race when writing to a closed pipe
This race is so intermittent that it's difficult to track down,
but regardless, this is an improvement to the code.
* Add workspace route proxying endpoint
- Makes the workspace conn cache concurrency-safe (sketched below)
- Reduces unnecessary open checks in `peer.Channel`
- Fixes the use of a temporary context when dialing a workspace agent
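A minimal sketch of the concurrency-safe cache, assuming a hypothetical `Conn` type and method names:
```go
package wsconncache

import "sync"

// Conn stands in for a workspace agent connection.
type Conn struct{}

// Cache guards the map with a mutex so concurrent acquires are safe.
type Cache struct {
	mu    sync.Mutex
	conns map[string]*Conn
}

func New() *Cache {
	return &Cache{conns: make(map[string]*Conn)}
}

func (c *Cache) Acquire(agentID string, dial func() *Conn) *Conn {
	c.mu.Lock()
	defer c.mu.Unlock()
	if conn, ok := c.conns[agentID]; ok {
		return conn
	}
	conn := dial()
	c.conns[agentID] = conn
	return conn
}
```
Holding the lock across `dial` keeps the sketch simple; a production cache would likely use per-key locking or singleflight so one slow dial doesn't block every agent.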
* Add embed errors
* chore: Refactor site to improve testing
It was difficult to develop this package because the embed build tag
was mandatory on the tests, even though the logic under test doesn't
require any embedded files.
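One way to structure that, sketched here with a hypothetical `Handler` signature: serve from any `fs.FS`, so tests can pass an in-memory filesystem while the embed-tagged build supplies the real one.
```go
package site

import (
	"io/fs"
	"net/http"
)

// Handler serves static files from any fs.FS, so tests don't need the
// embed build tag. This is a sketch of the pattern, not the exact API.
func Handler(files fs.FS) http.Handler {
	return http.FileServer(http.FS(files))
}
```
A test can then pass `fstest.MapFS{"index.html": &fstest.MapFile{Data: []byte("<html/>")}}` without any embedded assets.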
* Add test for error handler
* Remove unused access url
* Add RBAC tests
* Fix dial agent syntax
* Fix linting errors
* Fix gen
* Fix icon required
* Adjust migration number
* Fix proxy error status code
* Fix empty db lookup
* chore: Improve CI builds by caching Go modules
* Skip running with `race` on non-Linux systems
* Fix darwin file descriptor error
* Fix log after close
* Improve PostgreSQL test speeds
* Fix parallel connections with PostgreSQL tests
* Fix CI flake
* Separate test/go into PostgreSQL
* fix: Leaking yamux session after HTTP handler is closed
Closes #317. The httptest server cancels the context after the connection
is closed, but if a connection takes a long time to close, the request
would never end. This applies a context to the entire listener that cancels
on test cleanup.
After discussion with @bryphe-coder, reducing the parallel limit on
Windows is likely to reduce failures as well.
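A sketch of the listener-wide cancellation, using `http.Server.BaseContext` (the helper name is hypothetical):
```go
package coderdtest

import (
	"context"
	"net"
	"net/http"
	"net/http/httptest"
	"testing"
)

// newTestServer gives every request a context that is canceled on test
// cleanup, so a slow-closing connection cannot keep a request (and its
// yamux session) alive past the test.
func newTestServer(t *testing.T, handler http.Handler) *httptest.Server {
	ctx, cancel := context.WithCancel(context.Background())
	srv := httptest.NewUnstartedServer(handler)
	srv.Config.BaseContext = func(net.Listener) context.Context { return ctx }
	srv.Start()
	t.Cleanup(srv.Close)
	t.Cleanup(cancel) // cleanups run in reverse: cancel first, unblocking Close
	return srv
}
```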
* Switch to windows-2022 to improve decompression
* Invalidate cache on matrix OS
* feat: Add history middleware parameters
These will be used for streaming logs, checking status,
and other operations related to workspace and project
history.
* refactor: Move all HTTP routes to top-level struct
Nesting handlers behind their respective structs is leaky,
and promotes naming conflicts between handlers.
Our HTTP routes cannot have conflicts, so neither should
our function names.
* Add provisioner daemon routes
* Add periodic updates
* Skip pubsub if short
* Return jobs with WorkspaceHistory
* Add endpoints for extracting singular history
* The full end-to-end operation works
* fix: Disable compression for websocket dRPC transport (#145)
There is a race condition in the interop between the websocket and `dRPC`: https://github.com/coder/coder/runs/5038545709?check_suite_focus=true#step:7:117 - both the websocket and `dRPC` act as if they own the `[]byte` being sent between them, which can lead to data races where both are writing.
This is just tracking some experimentation to fix that race condition.
## Run results: ##
- Run 1: peer test failure
- Run 2: peer test failure
- Run 3: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040858460?check_suite_focus=true#step:8:45
```
status code 412: The provided project history is running. Wait for it to complete importing!
```
- Run 4: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040957999?check_suite_focus=true#step:7:176
```
workspacehistory_test.go:122:
    Error Trace:    workspacehistory_test.go:122
    Error:          Condition never satisfied
    Test:           TestWorkspaceHistory/CreateHistory
```
- Run 5: peer failure
- Run 6: Pass ✅
- Run 7: Peer failure
## Open Questions: ##
### Is `dRPC` or `websocket` at fault for the data race?
It looks like this condition is specifically happening when `dRPC` decides to `SendError`. This constructs a new byte payload via [`MarshalError`](f6e369438f/drpcwire/error.go (L15)) - so `dRPC` has created this buffer and owns it.
From `dRPC`'s perspective, the callstack looks like this:
- [`sendPacket`](f6e369438f/drpcstream/stream.go (L253))
- [`writeFrame`](f6e369438f/drpcwire/writer.go (L65))
- [`AppendFrame`](f6e369438f/drpcwire/packet.go (L128))
- and finally the data race happens here:
```go
// AppendFrame appends a marshaled form of the frame to the provided buffer.
func AppendFrame(buf []byte, fr Frame) []byte {
	...
	out := buf
	out = append(out, control) // <---------
```
This should be fine, since `dRPC` created this buffer, and it takes the byte buffer constructed from `MarshalError` and tacks a bunch of headers onto it to create a proper frame.
Once `dRPC` is done writing, it _hangs onto the buffer and resets it here_: f6e369438f/drpcwire/writer.go (L73)
However... once the websocket implementation gets the buffer, it runs `statelessDeflate` [here](8dee580a7f/write.go (L180)), which compresses the buffer on the fly. This functionality actually [mutates the buffer in place](a1a9cfc821/flate/stateless.go (L94)), which is where we get our race.
In the case where the `[]byte` isn't being manipulated anywhere else, this compress-in-place operation would be safe, and that's probably the case for most over-the-wire usages. Here, though, where we're plumbing `dRPC` -> websocket, both are manipulating it (`dRPC` is reusing the buffer for the next `write`, and `websocket` is compressing on the fly).
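The hazard is easy to reproduce in isolation. This contrived sketch (not coder code) mimics the two owners: one goroutine "compresses" the buffer in place while the other reuses it for the next frame; `go run -race` flags it immediately:
```go
package main

func main() {
	buf := []byte("frame-1 payload")
	done := make(chan struct{})
	go func() {
		// stand-in for statelessDeflate mutating the buffer in place
		for i := range buf {
			buf[i] ^= 0xff
		}
		close(done)
	}()
	// stand-in for dRPC resetting and reusing the buffer for the next write
	copy(buf, "frame-2 payload")
	<-done
}
```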
### Why does cloning on `Read` fail?
We get a bunch of errors like:
```
2022/02/02 19:26:10 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
```
# UPDATE:
We decided we could disable websocket compression, which would avoid the race because the in-place `deflate` operation would no longer be run. Trying that out now:
- Run 1: ✅
- Run 2: https://github.com/coder/coder/runs/5042645522?check_suite_focus=true#step:8:338
- Run 3: ✅
- Run 4: https://github.com/coder/coder/runs/5042988758?check_suite_focus=true#step:7:168
- Run 5: ✅
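For reference, assuming the websocket side is `nhooyr.io/websocket`, disabling compression is a one-field change on the accept options; a sketch:
```go
package coderd

import (
	"net/http"

	"nhooyr.io/websocket"
)

// Disabling compression means statelessDeflate never runs, so the
// websocket never mutates dRPC's buffer in place.
func handleWebsocket(w http.ResponseWriter, r *http.Request) {
	conn, err := websocket.Accept(w, r, &websocket.AcceptOptions{
		CompressionMode: websocket.CompressionDisabled,
	})
	if err != nil {
		return
	}
	defer conn.Close(websocket.StatusNormalClosure, "")
	// ... hand conn off to the dRPC transport
}
```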
* fix: Remove race condition with acquiredJobDone channel (#148)
Found another data race while running the tests: https://github.com/coder/coder/runs/5044320845?check_suite_focus=true#step:7:83
__Issue:__ There is a race on the `p.acquiredJobDone` chan - in particular, there can be a case where we're waiting on the channel to finish (in close) with `<-p.acquiredJobDone`, but in parallel, an `acquireJob` could've been started, which would create a new channel for `p.acquiredJobDone`. There is a similar race in `close(..)`ing the channel, which also came up in test runs.
__Fix:__ Instead of recreating the channel every time, we can use a `sync.WaitGroup` to accomplish the same functionality - a semaphore that makes close wait for the current job to wrap up.
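A minimal sketch of the `sync.WaitGroup` approach (field and method names are illustrative):
```go
package provisionerd

import "sync"

type provisioner struct {
	activeJob sync.WaitGroup
}

func (p *provisioner) acquireJob(run func()) {
	p.activeJob.Add(1)
	go func() {
		defer p.activeJob.Done()
		run()
	}()
}

// Close blocks until the in-flight job wraps up; nothing is recreated,
// so there is no channel to race on.
func (p *provisioner) Close() {
	p.activeJob.Wait()
}
```
(The real code also has to stop new jobs from being acquired once close begins; the WaitGroup only replaces the racy channel recreation.)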
* fix: Bump up workspace history timeout (#149)
This is an attempted fix for failures like: https://github.com/coder/coder/runs/5043435263?check_suite_focus=true#step:7:32
Looking at the timing of the test:
```
t.go:56: 2022-02-02 21:33:21.964 [DEBUG] (terraform-provisioner) <provision.go:139> ran apply
t.go:56: 2022-02-02 21:33:21.991 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.050 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.090 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.140 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.195 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.240 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
workspacehistory_test.go:122:
    Error Trace:    workspacehistory_test.go:122
    Error:          Condition never satisfied
    Test:           TestWorkspaceHistory/CreateHistory
```
It appears that the `terraform apply` job had just finished - with less than a second to spare before our `require.Eventually` gave up - but there was still work to be done (i.e., collecting the state files). So my suspicion is that terraform might, in some cases, exceed our 5s timeout.
Note that the setup for this test has a similar project history wait of 15s, so I borrowed that here.
In the future, we can look at using a simple echo provider to exercise this in the unit test in a way that is more reliable in terms of timing. I'll log an issue to track that.
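The change itself is just the wait bound on the assertion; a sketch, with the condition function stubbed out:
```go
package coderd_test

import (
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

func TestWorkspaceHistoryWait(t *testing.T) {
	historyCompleted := func() bool {
		// ... poll the API for the workspace history status
		return true
	}
	// Was 5s; bumped to the 15s already used by the project history
	// wait in setup, giving terraform time to collect state files.
	require.Eventually(t, historyCompleted, 15*time.Second, 25*time.Millisecond)
}
```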
Co-authored-by: Bryan <bryan@coder.com>
* fix: Synchronize peer logging with a channel
We were depending on the close mutex to properly
report connection state. This ensures the RTC
connection is properly closed before returning.
* Disable pion logging
* Remove buffer
* Try ICE servers
* Remove flushed
* Add diagram explaining handshake
* Fix candidate accept ordering
* Add debug logging to peerbroker
* Fix send ordering
* Lock adding ICE candidate
* Add test for negotiating out of order
* Reduce connection to a single negotiation channel
* Improve test times by pre-installing Terraform
* Lock remote session description being applied
* Organize conn
* Revert to multi-channel setup
* Properly close ICE gatherer
* Improve comments
* Try removing buffered candidates
* Buffer local and remote messages
* Log dTLS transport state
* Add pion logging
Having a mixture of abbreviations in the codebase reduces
clarity. Although `opts` is common for options, I'd rather
set a precedent of favoring verbose, clear names.
* chore: Update pion/ice fork to resolve goroutine leak
* Flush remote too
* Add logs for setting the description
* Try locking only on remote
* Remove local bufferring in favor of remote
* Remove unused flush func
* Set candidates flushed to true
* Defer flush until the end of negotiation
* Buffer ICE candidates
* Add comment clarifying channel buffer
* Flush after handshake
* Move away from fork
* Ignore pion/ice leaks
* chore: Buffer remote candidates like local
This buffering was added for local candidates, and is required for
remote candidates too, to prevent a race where they are added before
a negotiation is complete.
I removed the mutex earlier, because it would cause a different race.
I didn't realize the remote candidates wouldn't be buffered,
but with this change they are!
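A sketch of the buffering, assuming pion/webrtc v3 and hypothetical names: remote candidates that arrive early are held, then flushed once the remote description is set.
```go
package peer

import (
	"sync"

	"github.com/pion/webrtc/v3"
)

type Conn struct {
	mu      sync.Mutex
	flushed bool
	pending []webrtc.ICECandidateInit
	rtc     *webrtc.PeerConnection
}

// AddRemoteCandidate buffers candidates until negotiation completes.
func (c *Conn) AddRemoteCandidate(cand webrtc.ICECandidateInit) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.flushed {
		c.pending = append(c.pending, cand)
		return nil
	}
	return c.rtc.AddICECandidate(cand)
}

// setRemoteDescription applies the description, then flushes the buffer.
func (c *Conn) setRemoteDescription(desc webrtc.SessionDescription) error {
	if err := c.rtc.SetRemoteDescription(desc); err != nil {
		return err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, cand := range c.pending {
		if err := c.rtc.AddICECandidate(cand); err != nil {
			return err
		}
	}
	c.pending = nil
	c.flushed = true
	return nil
}
```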
* Use local description instead
* Add logging for candidate flush
* Fix race with atomic bool
* Simplify locks
* Add mutex to flush
* Reset buffer
* Remove leak dependency to limit confusion
* Fix ordering
* Revert channel close
* Flush candidates after remote session description is set
* Bump up count to ensure race is fixed
* Use custom ICE dependency
* Fix data race
* Lower timeout to make for fast CI
* Add back mutex to prevent race
* Improve debug logging
* Lock on local description
* Flush local candidates uniquely
* Fix race
* Move mutex to prevent candidate send race
* Move lock to handshake so no race can occur
* Reduce timeout to improve test times
* Move unlock to defer
* Use flushed bool instead of checking remote
* chore: Fix race in collecting ICE Candidates
This logic was flawed previously. ICE Candidates could collect
before a negotiation was triggered, which led to a race where
candidates would be lost. Candidates can no longer be lost,
and we removed some code 😎.
* Add comment describing fix
* Use upstream dependency to fix goroutine leak
* Use upstream dependency to fix goroutine leak
* ci: Run tests using PostgreSQL database and mock
This allows us to use the mock database for quick iterative testing,
and have confidence from CI using a real PostgreSQL database.
PostgreSQL tests are only run on Linux. They are *really* slow on macOS
and Windows runners, and don't provide much additional confidence.
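A sketch of the gate (the environment variable name is hypothetical):
```go
package databasetest

import (
	"os"
	"runtime"
	"testing"
)

// requirePostgres skips PostgreSQL-backed tests everywhere except Linux
// CI; everywhere else, the mock database keeps iteration fast.
func requirePostgres(t *testing.T) {
	if runtime.GOOS != "linux" || os.Getenv("DB") == "" {
		t.Skip("PostgreSQL tests only run on Linux CI")
	}
}
```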
* Only run PostgreSQL tests once for speed
* Fix race condition of log after close
Not all resources were cleaned up immediately after a peer connection was
closed; prior to this change, DataChannels could have a goroutine exit after Close().
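A sketch of the guarantee (names illustrative): Close waits for every goroutine the channel spawned, so none can log afterward.
```go
package peer

import "sync"

type dataChannel struct {
	wg sync.WaitGroup
}

func (d *dataChannel) goReadLoop(read func() error) {
	d.wg.Add(1)
	go func() {
		defer d.wg.Done()
		for read() == nil {
			// keep reading until the underlying channel closes
		}
	}()
}

// Close tears down the underlying channel (elided here), then waits for
// the read loop to exit before returning.
func (d *dataChannel) Close() error {
	d.wg.Wait()
	return nil
}
```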
* Fix comment
* chore: Fix golangci-lint configuration and patch errors
Due to misconfiguration of a linting rules directory, our linter has not been
working properly. This change fixes the configuration issue, and all remaining
linting errors.
* Fix race in peer logging
* Fix race and return
* Lock on bufferred amount low
* Fix mutex lock
* fix: Lock when obtaining a peer connection answer<->offer
This fixes a race in the peerbroker package where ICE candidates could be added before the connection was negotiated. This would result in the connection failing.
* Remove unnecessary log
* feat: Create broker for negotiating connections
WebRTC requires an exchange of encryption keys and network hops to connect. This package pipes the exchange over gRPC. This will be used in all connecting clients and agents.
* Regenerate protobuf definition
* Cache Go build and test
* Fix gRPC language with dRPC
Co-authored-by: Bryan <bryan@coder.com>
This package was pulled straight from github.com/coder/m. Nothing has been changed.
It will be used for networking clients<->workspaces, and coderd<->provisionerd.