Addresses #14734.
This PR wires up `tunnel.go` to a `tailnet.Conn` via the new `/tailnet` endpoint, with all the necessary controllers such that a VPN connection can be started, stopped and inspected via the CoderVPN protocol.
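Roughly, the surface this exposes looks like the following sketch (names here are hypothetical; the real controller API lives in `tunnel.go` and the tailnet package):

```go
package example

import "context"

// vpnTunnel is a hypothetical sketch of what tunnel.go exposes once wired
// to a tailnet.Conn; the real controller types differ.
type vpnTunnel interface {
	// Start dials the deployment's /tailnet endpoint and brings up the
	// underlying tailnet.Conn.
	Start(ctx context.Context, accessURL, token string) error
	// Stop tears the VPN connection down.
	Stop(ctx context.Context) error
	// Status reports the current tunnel state for inspection over the
	// CoderVPN protocol.
	Status(ctx context.Context) (string, error)
}
```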
re: #14730
Adds support for the workspace updates protocol controller to also program DNS names for each agent.
Right now, we only program names like `myagent.myworkspace.me.coder.` and `myworkspace.coder.` (the latter only if there is exactly one agent in the workspace). We also want to support `myagent.myworkspace.username.coder.`, but for that we need to update the WorkspaceUpdates RPC to also send the workspace owner's username; that will come in a separate PR.
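For illustration, the scheme amounts to something like this sketch (`dnsNamesForWorkspace` is a hypothetical helper, not the controller's actual code):

```go
package main

import "fmt"

// dnsNamesForWorkspace sketches the naming scheme described above: one name
// per agent, plus a workspace-level shorthand when the workspace has exactly
// one agent. "me" is the owner-relative label; owner-username names need the
// WorkspaceUpdates RPC change first.
func dnsNamesForWorkspace(workspace string, agents []string) []string {
	names := make([]string, 0, len(agents)+1)
	for _, agent := range agents {
		names = append(names, fmt.Sprintf("%s.%s.me.coder.", agent, workspace))
	}
	if len(agents) == 1 {
		names = append(names, fmt.Sprintf("%s.coder.", workspace))
	}
	return names
}

func main() {
	fmt.Println(dnsNamesForWorkspace("myworkspace", []string{"myagent"}))
	// [myagent.myworkspace.me.coder. myworkspace.coder.]
}
```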
re: #14715
This PR introduces the Coder service prefix, `fd60:627a:a42b::/48`, and refactors our existing code to reference the Tailscale service prefix explicitly (rather than implicitly).
Removes the unused `Addresses` agent option. All clients today assume they can compute an agent's IP address from its UUID, so an agent started with a custom address would break things.
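Since a /48 prefix leaves 80 host bits, a stable per-agent address can be derived from the UUID. A minimal sketch, assuming the last 10 bytes of the UUID fill the host bits; the codebase's exact derivation may keep different bytes:

```go
package main

import (
	"fmt"
	"net/netip"

	"github.com/google/uuid"
)

// coderServicePrefix is the Coder service prefix introduced above.
var coderServicePrefix = netip.MustParsePrefix("fd60:627a:a42b::/48")

// addrFromUUID fills the 80 host bits of the /48 prefix with the last 10
// bytes of the agent UUID, yielding a stable per-agent address.
func addrFromUUID(id uuid.UUID) netip.Addr {
	var out [16]byte
	prefix := coderServicePrefix.Addr().As16()
	copy(out[:6], prefix[:6]) // 48-bit prefix
	copy(out[6:], id[6:])     // last 10 of the UUID's 16 bytes
	return netip.AddrFrom16(out)
}

func main() {
	id := uuid.MustParse("aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")
	fmt.Println(addrFromUUID(id)) // an address inside fd60:627a:a42b::/48
}
```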
First PR to address #14244.
Adds to `coder ping` common reasons why a direct connection to the workspace agent couldn't be established:
- If the Coder deployment administrator has blocked direct connections (`CODER_BLOCK_DIRECT`).
- If the client has no STUN servers within its DERP map.
- If the client or agent appears to be behind a hard NAT, as reported by Tailscale's `netInfo.MappingVariesByDestIP`.
Also adds a warning if the client or agent has a network interface below the 'safe' MTU for tailnet. This warning is always displayed at the end of a `coder ping`.
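A sketch of the checks behind these diagnostics; the `tailcfg` fields are real, while the surrounding plumbing is hypothetical:

```go
package main

import (
	"fmt"

	"tailscale.com/tailcfg"
)

// directConnectionWarnings mirrors the diagnostics above: deployment-level
// blocking, a STUN-less DERP map, and hard-NAT detection. blockDirect would
// come from deployment config (CODER_BLOCK_DIRECT); ni from the node's
// last-reported network info.
func directConnectionWarnings(blockDirect bool, dm *tailcfg.DERPMap, ni *tailcfg.NetInfo) []string {
	var warns []string
	if blockDirect {
		warns = append(warns, "direct connections are blocked by the deployment")
	}
	if !derpMapHasSTUN(dm) {
		warns = append(warns, "no STUN servers in the DERP map; endpoint discovery is impossible")
	}
	if ni != nil && ni.MappingVariesByDestIP.EqualBool(true) {
		warns = append(warns, "appears to be behind a hard NAT")
	}
	return warns
}

func derpMapHasSTUN(dm *tailcfg.DERPMap) bool {
	for _, region := range dm.Regions {
		for _, node := range region.Nodes {
			if node.STUNPort != -1 { // -1 disables STUN on the node
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(directConnectionWarnings(true, &tailcfg.DERPMap{}, nil))
}
```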
I initially made this change while hacking on wgengine to also capture wireguard packets going into the magicsock, so that we could capture the initial wireguard handshake.
I don't think we should ship that additional capture logic, but... it seems generally useful to capture packets from the get-go on speedtest, so that you can see disco and pings before the TCP speedtest session starts.
When an agent receives a node, it responds with an ACK, which is relayed to the client. After the client receives the ACK, it's allowed to begin pinging.
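On the client side, that gating is essentially the following (the channel plumbing here is hypothetical):

```go
package example

import "context"

// waitForAgentACK blocks until the coordinator relays the agent's ACK for
// our node, or the context expires. Pinging before the ACK could race the
// agent programming our node into its wireguard config.
func waitForAgentACK(ctx context.Context, acks <-chan struct{}) error {
	select {
	case <-acks:
		return nil // the agent has processed our node; safe to ping
	case <-ctx.Done():
		return ctx.Err()
	}
}
```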
Beginnings of a solution to #12297
Doesn't cover disco or definitively display whether we successfully connected to DERP, but shows some checklist diagnostics for connecting to an agent.
For this first PR, I just added it to `coder ping` to see how we like it, but could be incorporated into `coder ssh` _et al._ after a timeout.
```
$ coder ping dogfood2
p2p connection established in 147ms
pong from dogfood2 p2p via 95.217.xxx.yyy:42631 in 147ms
pong from dogfood2 p2p via 95.217.xxx.yyy:42631 in 140ms
pong from dogfood2 p2p via 95.217.xxx.yyy:42631 in 140ms
✔ preferred DERP region 999 (Council Bluffs, Iowa)
✔ sent local data to Coder networking coordinator
✔ received remote agent data from Coder networking coordinator
preferred DERP 10013 (Europe Fly.io (Paris))
endpoints: 95.217.xxx.yyy:42631, 95.217.xxx.yyy:37576, 172.17.0.1:37576, 172.20.0.10:37576
✔ Wireguard handshake 11s ago
```
I noticed a possible race where tailnet.Conn can try to dial the embedded region before we've set our custom dialer that sends DERP traffic in-memory. This closes that race and adds a test case for servertailnet with no STUN and an embedded relay.
Fixes 2 related issues:
1. wsconncache had incorrect logic to test whether to send DERPMap updates, sending if the maps were equivalent, instead of if they were _not equivalent_.
2. configmaps used a buggy check to test equality between DERP maps: the protobuf contains a map field, and map entries are serialized in nondeterministic order, so equivalent maps can compare as different. Instead, we avoid comparing the protobufs and depend on the existing function that compares `tailcfg.DERPMap`. This also has the effect of reducing the number of times we convert to and from protobuf.
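To illustrate both bugs in miniature (`derpMapsEqual` stands in for the existing `tailcfg.DERPMap` comparison function):

```go
package main

import (
	"fmt"
	"reflect"

	"tailscale.com/tailcfg"
)

// Comparing marshaled protobufs is unreliable here: the DERP map proto has a
// map field for regions, and protobuf map entries serialize in
// nondeterministic order, so equivalent maps can yield different bytes.
// Comparing at the tailcfg layer sidesteps serialization entirely.
func derpMapsEqual(a, b *tailcfg.DERPMap) bool {
	// Stand-in for the existing comparison function in the codebase.
	return reflect.DeepEqual(a, b)
}

func shouldSendUpdate(current, next *tailcfg.DERPMap) bool {
	// The wsconncache bug was the inverse of this condition: it sent an
	// update when the maps WERE equivalent instead of when they were not.
	return !derpMapsEqual(current, next)
}

func main() {
	a := &tailcfg.DERPMap{Regions: map[int]*tailcfg.DERPRegion{1: {RegionID: 1}}}
	b := &tailcfg.DERPMap{Regions: map[int]*tailcfg.DERPRegion{1: {RegionID: 1}}}
	fmt.Println(shouldSendUpdate(a, b)) // false: equivalent, so no update
}
```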
Use TSMP ping for reachability, but leave Disco ping for when we call Ping() since we often use that to determine whether we have a direct connection.
Also adds unit tests to make sure Ping() returns direct connection vs DERP correctly.
Adds support to Coordination to call SetAllPeersLost() when it is closed. This ensures that when we disconnect from a Coordinator, we set all peers lost.
This covers CoderSDK (CLI client) and Agent. Next PR will cover MultiAgent (notably, `wsproxy`).
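In outline, the close path looks like this (the real wiring lives in the tailnet package; `peerSetter` here is a stand-in):

```go
package example

import "sync"

// peerSetter is a stand-in for the part of tailnet.Conn that tracks peers.
type peerSetter interface {
	SetAllPeersLost()
}

// coordination pairs a coordinator connection with the local peer state.
type coordination struct {
	closeOnce sync.Once
	peers     peerSetter
	closeConn func() error
}

// Close tears down the coordinator connection and marks every peer LOST, so
// stale peers are eventually cleaned up rather than lingering forever.
func (c *coordination) Close() (err error) {
	c.closeOnce.Do(func() {
		err = c.closeConn()
		c.peers.SetAllPeersLost()
	})
	return err
}
```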
This one is huge, and I'm sorry.
The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.
There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
We're seeing some flaky tests related to agent connectivity - https://github.com/coder/coder/actions/runs/7286675441/job/19856270998
I'm pretty sure what happened in this one is that the client opened a connection while the wgengine was in the process of reconfiguring the wireguard device, so the fact that the peer became "active" as a result of traffic being sent was not noticed.
The test calls `AwaitReachable()` but this only tests the disco layer, so it doesn't wait for wireguard to come up.
I think we should be using TSMP for pinging and reachability, since this operates at the IP layer, and therefore requires that wireguard comes up before being successful.
This should also help with the problems we have seen where a TCP connection starts before wireguard is up and the initial round trip has to wait for the 5-second wireguard handshake retry.
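The shape of a TSMP-based wait, where `ping` stands in for whatever issues a single TSMP ping via wgengine (its real signature differs):

```go
package example

import (
	"context"
	"time"
)

// awaitReachableTSMP polls with TSMP pings until the peer answers. Because
// TSMP rides inside the wireguard tunnel (IP layer), a pong implies the
// handshake completed, unlike disco pings, which only prove the path outside
// the tunnel.
func awaitReachableTSMP(ctx context.Context, ping func(context.Context) error) error {
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()
	for {
		if err := ping(ctx); err == nil {
			return nil // tunnel is up end to end
		}
		select {
		case <-ticker.C:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```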
fixes: #11294
Fixes an issue where remote forwards are not correctly torn down when using OpenSSH with `coder ssh --stdio`. OpenSSH sends a disconnect signal, but then also sends SIGHUP to `coder`. Previously, we just exited when we got SIGHUP, and this raced against properly disconnecting.
Fixes https://github.com/coder/customers/issues/327
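The fix amounts to treating SIGHUP as the start of a graceful shutdown and waiting for the session teardown to finish, rather than exiting immediately. A simplified sketch, with `runStdioSession` as a hypothetical stand-in for the actual session loop:

```go
package main

import (
	"context"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Previously: exit as soon as SIGHUP arrives, racing the disconnect
	// that OpenSSH also initiates over stdio.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGHUP, os.Interrupt)
	defer stop()

	done := make(chan struct{})
	go func() {
		defer close(done)
		runStdioSession(ctx) // hypothetical: serves the --stdio session
	}()

	<-ctx.Done() // SIGHUP (or interrupt) received: begin shutdown
	<-done       // wait for remote forwards etc. to be torn down cleanly
}

func runStdioSession(ctx context.Context) {}
```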
I've said it before, I'll say it again: you can't create a timed context before calling `t.Parallel()` and then use it after.
Fixes flakes like https://github.com/coder/coder/actions/runs/6716682414/job/18253279157
I've chosen to drop `t.Parallel()` entirely rather than create a second context after the parallel call, since the vast majority of the test time happens before the point where the parallel call was: all the tailnet setup happens before `t.Parallel()`.
Leaving a call to `t.Parallel()` in place is a bug risk: a future maintainer could accidentally use the wrong context in the latter part of the test.
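For reference, the failure mode in miniature: `t.Parallel()` pauses the test until the package's serial phase completes, while a timer created beforehand keeps draining during the pause.

```go
package example

import (
	"context"
	"testing"
	"time"
)

func TestBuggy(t *testing.T) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// BUG: pauses here until the serial phase of the package ends; the 10s
	// budget keeps draining while we wait, so ctx may already be expired
	// (or nearly so) when the test resumes.
	t.Parallel()

	select {
	case <-ctx.Done():
		t.Fatal("context expired before the test body even started")
	default:
	}
	_ = ctx // ... the rest of the test uses a possibly-dead context
}
```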
* chore: add /v2 to import module path
Go modules require semantic import versioning: a module path at major version greater than 1 must carry a matching `/vN` suffix.
This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```
Migrate generated files to import /v2
* Fix gen
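The mechanical change itself is just the module path gaining a major-version suffix, with every internal import following; illustratively (`coderd` chosen as an arbitrary package):

```
-module github.com/coder/coder
+module github.com/coder/coder/v2

-import "github.com/coder/coder/coderd"
+import "github.com/coder/coder/v2/coderd"
```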