Commit Graph

21 Commits

Author SHA1 Message Date
7cd6e9cdd6 fix: return provisioners in desc order and add limit to cli (#16720) 2025-02-26 21:06:51 +02:00
7f061b9faf fix(coderd): add stricter authorization for provisioners endpoint (#16587)
References #16558
2025-02-17 14:34:47 +02:00
77306f3de1 feat(coderd): add filters and fix template for provisioner daemons (#16558)
This change adds provisioner daemon ID filter to the provisioner daemons
endpoint, and also implements the limiting to 50 results.

Test coverage is greatly improved and template information for jobs
associated to the daemon was also fixed.

Updates #15084
Updates #15192
Related #16532
2025-02-14 17:26:46 +02:00
071bb26018 feat(coderd): add endpoint to list provisioner daemons (#16028)
Updates #15190
Updates #15084
Supersedes #15940
2025-01-14 16:40:26 +00:00
b6703b11c6 feat: Add external provisioner daemons (#4935)
* Start to port over provisioner daemons PR

* Move to Enterprise

* Begin adding tests for external registration

* Move provisioner daemons query to enterprise

* Move around provisioner daemons schema

* Add tags to provisioner daemons

* make gen

* Add user local provisioner daemons

* Add provisioner daemons

* Add feature for external daemons

* Add command to start a provisioner daemon

* Add provisioner tags to template push and create

* Rename migration files

* Fix tests

* Fix entitlements test

* PR comments

* Update migration

* Fix FE types
2022-11-16 16:34:06 -06:00
4e57b9fbdc fix: allow regular users to push files (#4500)
- As part of merging support for Template RBAC
  and user groups a permission check on reading files
  was relaxed.

  With the addition of admin roles on individual templates, regular
  users are now able to push template versions if they have
  inherited the 'admin' role for a template. In order to do so
  they need to be able to create and read their own files. Since
  collisions on hash in the past were ignored, this means that a regular user
  who pushes a template version with a file hash that collides with
  an existing hash will not be able to read the file (since it belongs to
  another user).

  This commit fixes the underlying problem which was that
  the files table had a primary key on the 'hash' column.
  This was not a problem at the time because only template
  admins and other users with similar elevated roles were
  able to read all files regardless of ownership. To fix this
  a new column and primary key 'id' has been introduced to the files
  table. The unique constraint has been updated to be hash+created_by.
  Tables (provisioner_jobs) that referenced files.hash have been updated
  to reference files.id. Relevant API endpoints have also been updated.
2022-10-13 18:02:52 -05:00
67c4605370 chore: Reduce test times (#3856)
* chore: Reduce test times

* Rename IncludeProvisionerD to IncludeProvisionerDaemon

* Make  TestTemplateDAUs use Tailnet
2022-09-04 11:28:09 -05:00
ccf6f4e7ed chore: Use contexts with timeout in coderd tests (#3381) 2022-08-09 20:17:00 +03:00
4730c589fe chore: Use standardized test timeouts and delays (#3291) 2022-08-01 15:45:05 +03:00
51dd1fde3b fix: Remove use of require in require.Eventually in tests (#3110)
* fix: Remove use of `require` in `require.Eventually` in tests

Because require uses `t.FailNow()` and `require.Eventually` runs the
function in a goroutine, which is not allowed.

* feat: Add ruleguard for require.Eventually

Co-authored-by: Cian Johnston <cian@coder.com>
2022-07-22 20:02:49 +03:00
c04d045279 feat: RBAC provisionerdaemons and parameters (#1755)
* chore: Remove org_id from provisionerdaemons
2022-05-26 11:20:54 -05:00
1871b09697 feat: in-process provisionerd connection (#1568)
* in-process provisionerd connection

Signed-off-by: Spike Curtis <spike@coder.com>

* disable lint for server.go/newProvisionerDaemon

Signed-off-by: Spike Curtis <spike@coder.com>
2022-05-19 17:47:45 -05:00
6c1117094d chore: Force codersdk to not import anything from database (#1576)
* chore: Force codersdk to not import anything from database (linter rule)
* chore: Move all database types in codersdk out
2022-05-19 13:04:44 -05:00
680de709a5 chore: organize http handlers (#1486)
They're currently randomly in a bunch of different files. This cleans up
the handler functions to be in the file of the type they return.
2022-05-16 14:36:27 -05:00
db7ed4d019 fix: Add resiliency to daemon connections (#1116)
Connections could fail when massive payloads were transmitted.
This fixes an upstream bug in dRPC where the connection would
end with a context canceled if a message was too large.

This adds retransmission of completion and failures too. If
Coder somehow loses connection with a provisioner daemon,
upon the next connection the state will be properly reported.
2022-04-24 20:33:19 -05:00
bf0ae8f573 feat: Refactor API routes to use UUIDs instead of friendly names (#401)
* Add client for agent

* Cleanup code

* Fix linting error

* Rename routes to be simpler

* Rename workspace history to workspace build

* Refactor HTTP middlewares to use UUIDs

* Cleanup routes

* Compiles!

* Fix files and organizations

* Fix querying

* Fix agent lock

* Cleanup database abstraction

* Add parameters

* Fix linting errors

* Fix log race

* Lock on close wait

* Fix log cleanup

* Fix e2e tests

* Fix upstream version of opencensus-go

* Update coderdtest.go

* Fix coverpkg

* Fix codecov ignore
2022-03-07 11:40:54 -06:00
8958b641e9 feat: Add agent authentication based on instance ID (#336)
* feat: Add agent authentication based on instance ID

Each cloud has it's own unique instance identity signatures, which
can be used for zero-token authentication. This change adds support
for tracking by "instance_id", and automatically authenticating
with Google Cloud.

* Add test for CLI

* Fix workspace agent request name

* Fix race with adding to wait group

* Fix name of instance identity token
2022-02-21 20:36:29 +00:00
154b9bce57 feat: Add "coder projects create" command (#246)
* Refactor parameter parsing to return nil values if none computed

* Refactor parameter to allow for hiding redisplay

* Refactor parameters to enable schema matching

* Refactor provisionerd to dynamically update parameter schemas

* Refactor job update for provisionerd

* Handle multiple states correctly when provisioning a project

* Add project import job resource table

* Basic creation flow works!

* Create project fully works!!!

* Only show job status if completed

* Add create workspace support

* Replace Netflix/go-expect with ActiveState

* Fix linting errors

* Use forked chzyer/readline

* Add create workspace CLI

* Add CLI test

* Move jobs to their own APIs

* Remove go-expect

* Fix requested changes

* Skip workspacecreate test on windows
2022-02-12 13:34:04 -06:00
d55231cc0f refactor: Rename ProjectParameter to ProjectVersionParameter (#170)
This was confusing with ParameterValue before. It still is a bit,
but this should help distinguish scope.
2022-02-07 21:40:08 +00:00
1796dc6c2f chore: Add test helpers to improve coverage (#166)
* chore: Rename ProjectHistory to ProjectVersion

Version more accurately represents version storage. This
forks from the WorkspaceHistory name, but I think it's
easier to understand Workspace history.

* Rename files

* Standardize tests a bit more

* Remove Server struct from coderdtest

* Improve test coverage for workspace history

* Fix linting errors

* Fix coderd test leak

* Fix coderd test leak

* Improve workspace history logs

* Standardize test structure for codersdk

* Fix linting errors

* Fix WebSocket compression

* Update coderd/workspaces.go

Co-authored-by: Bryan <bryan@coder.com>

* Add test for listing project parameters

* Cache npm dependencies with setup node

* Remove windows npm cache key

Co-authored-by: Bryan <bryan@coder.com>
2022-02-05 18:24:51 -06:00
e75bde4e31 feat: Add provisionerdaemon to coderd (#141)
* feat: Add history middleware parameters

These will be used for streaming logs, checking status,
and other operations related to workspace and project
history.

* refactor: Move all HTTP routes to top-level struct

Nesting all structs behind their respective structures
is leaky, and promotes naming conflicts between handlers.

Our HTTP routes cannot have conflicts, so neither should
function naming.

* Add provisioner daemon routes

* Add periodic updates

* Skip pubsub if short

* Return jobs with WorkspaceHistory

* Add endpoints for extracting singular history

* The full end-to-end operation works

* fix: Disable compression for websocket dRPC transport (#145)

There is a race condition in the interop between the websocket and `dRPC`: https://github.com/coder/coder/runs/5038545709?check_suite_focus=true#step:7:117 - it seems both the websocket and dRPC feel like they own the `byte[]` being sent between them. This can lead to data races, in which both `dRPC` and the websocket are writing.

This is just tracking some experimentation to fix that race condition

## Run results: ##
- Run 1: peer test failure
- Run 2: peer test failure
- Run 3: `TestWorkspaceHistory/CreateHistory`  - https://github.com/coder/coder/runs/5040858460?check_suite_focus=true#step:8:45
```
status code 412: The provided project history is running. Wait for it to complete importing!`
```
- Run 4: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040957999?check_suite_focus=true#step:7:176
```
    workspacehistory_test.go:122: 
        	Error Trace:	workspacehistory_test.go:122
        	Error:      	Condition never satisfied
        	Test:       	TestWorkspaceHistory/CreateHistory
```
- Run 5: peer failure
- Run 6: Pass  
- Run 7: Peer failure

## Open Questions: ##

### Is `dRPC` or `websocket` at fault for the data race?

It looks like this condition is specifically happening when `dRPC` decides to [`SendError`]). This constructs a new byte payload from [`MarshalError`](f6e369438f/drpcwire/error.go (L15)) - so `dRPC` has created this buffer and owns it.

From `dRPC`'s perspective, the callstack looks like this:
- [`sendPacket`](f6e369438f/drpcstream/stream.go (L253))
  - [`writeFrame`](f6e369438f/drpcwire/writer.go (L65))
    - [`AppendFrame`](f6e369438f/drpcwire/packet.go (L128))
      - with finally the data race happening here:
```go
// AppendFrame appends a marshaled form of the frame to the provided buffer.
func AppendFrame(buf []byte, fr Frame) []byte {
...
	out := buf
	out = append(out, control).   // <---------
```

This should be fine, since `dPRC` create this buffer, and is taking the byte buffer constructed from `MarshalError` and tacking a bunch of headers on it to create a proper frame.

Once `dRPC` is done writing, it _hangs onto the buffer and resets it here__: f6e369438f/drpcwire/writer.go (L73)

However... the websocket implementation, once it gets the buffer, it runs a `statelessDeflate` [here](8dee580a7f/write.go (L180)), which compresses the buffer on the fly. This functionality actually [mutates the buffer in place](a1a9cfc821/flate/stateless.go (L94)), which is where get our race.

In the case where the `byte[]` aren't being manipulated anywhere else, this compress-in-place operation would be safe, and that's probably the case for most over-the-wire usages. In this case, though, where we're plumbing `dRPC` -> websocket, they both are manipulating it (`dRPC` is reusing the buffer for the next `write`, and `websocket` is compressing on the fly).

### Why does cloning on `Read` fail?

Get a bunch of errors like:
```
2022/02/02 19:26:10 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
```

# UPDATE:

We decided we could disable websocket compression, which would avoid the race because the in-place `deflate` operaton would no longer be run. Trying that out now:

- Run 1:  
- Run 2: https://github.com/coder/coder/runs/5042645522?check_suite_focus=true#step:8:338
- Run 3:  
- Run 4: https://github.com/coder/coder/runs/5042988758?check_suite_focus=true#step:7:168
- Run 5: 

* fix: Remove race condition with acquiredJobDone channel (#148)

Found another data race while running the tests: https://github.com/coder/coder/runs/5044320845?check_suite_focus=true#step:7:83

__Issue:__ There is a race in the p.acquiredJobDone chan - in particular, there can be a case where we're waiting on the channel to finish (in close) with <-p.acquiredJobDone, but in parallel, an acquireJob could've been started, which would create a new channel for p.acquiredJobDone. There is a similar race in `close(..)`ing the channel, which also came up in test runs.

__Fix:__ Instead of recreating the channel everytime, we can use `sync.WaitGroup` to accomplish the same functionality - a semaphore to make close wait for the current job to wrap up.

* fix: Bump up workspace history timeout (#149)

This is an attempted fix for failures like: https://github.com/coder/coder/runs/5043435263?check_suite_focus=true#step:7:32

Looking at the timing of the test:
```
    t.go:56: 2022-02-02 21:33:21.964 [DEBUG]	(terraform-provisioner)	<provision.go:139>	ran apply
    t.go:56: 2022-02-02 21:33:21.991 [DEBUG]	(provisionerd)	<provisionerd.go:162>	skipping acquire; job is already running
    t.go:56: 2022-02-02 21:33:22.050 [DEBUG]	(provisionerd)	<provisionerd.go:162>	skipping acquire; job is already running
    t.go:56: 2022-02-02 21:33:22.090 [DEBUG]	(provisionerd)	<provisionerd.go:162>	skipping acquire; job is already running
    t.go:56: 2022-02-02 21:33:22.140 [DEBUG]	(provisionerd)	<provisionerd.go:162>	skipping acquire; job is already running
    t.go:56: 2022-02-02 21:33:22.195 [DEBUG]	(provisionerd)	<provisionerd.go:162>	skipping acquire; job is already running
    t.go:56: 2022-02-02 21:33:22.240 [DEBUG]	(provisionerd)	<provisionerd.go:162>	skipping acquire; job is already running
    workspacehistory_test.go:122: 
        	Error Trace:	workspacehistory_test.go:122
        	Error:      	Condition never satisfied
        	Test:       	TestWorkspaceHistory/CreateHistory
```

It  appears that the `terraform apply` job had just finished - with less than a second to spare until our `require.Eventually` completes - but there's still work to be done (ie, collecting the state files). So my suspicion is that terraform might, in some cases, exceed our 5s timeout.

Note that in the setup for this test - there is a similar project history wait that waits for 15s, so I borrowed that here.

In the future - we can look at potentially using a simple echo provider to exercise this in the unit test, in a way that is more reliable in terms of timing. I'll log an issue to track that.

Co-authored-by: Bryan <bryan@coder.com>
2022-02-03 20:34:50 +00:00