This commit reverts some of the changes in #8029 and implements an
alternative method of keeping track of when the startup script has ended
and there will be no more logs.
This is achieved by adding new agent fields for tracking when the agent
enters the "starting" and "ready"/"start_error" lifecycle states. The
timestamps simplify logic since we don't need understand if the current
state is before or after the state we're interested in. They can also be
used to show data like how long the startup script took to execute. This
also allowed us to remove the EOF field from the logs as the
implementation was problematic when we returned the EOF log entry in the
response since requesting _after_ that ID would give no logs and the API
would thus lose track of EOF.
* feat(coderd,agent): send startup log eof at the end
* fix(coderd): fix edge case in startup log pubsub
* fix(coderd): ensure startup logs are closed on lifecycle state change (fallback)
* fix(codersdk): fix startup log channel shared memory bug
* fix(site): remove the EOF log line
* fix(codersdk): wait for subscription in WatchWorkspaceAgentMetadata
* fix(coderd): subscribe before sending initial metadata event
* test(coderd): add retries to TestWorkspaceAgent_Metadata to avoid flake
Previously, the `coder git ssh` command would hang on the API, which was endlessly polling the database for oauth tokens that expire in the future.
Some oAuth implementations (including GitHub by default) will not give back a token expiry date, and the absence of such a date was represented as a zero data in the database as opposed to a null value.
Follow-up calls to `git clone` would succeed because this hot path doesn't check expiry, perhaps originally by mistake.
In addition to fixing the zero date issue, this PR removes all gitauth PubSub, which added too much complexity when the polling interval is 1 second.
* chore: Rbac errors should be returned, and not hidden behind 404
SqlErrNoRows was hiding actual errors
* Replace sql.ErrNoRow checks
* Remove sql err no rows check from dbauthz test
* Fix to use dbauthz system user
* chore: move app proxying code to workspaceapps pkg
Moves path-app, subdomain-app and reconnecting PTY proxying to the new
workspaceapps.WorkspaceAppServer struct. This is in preparation for
external workspace proxies.
Updates app logout flow to avoid redirecting to coder-logout.${app_host}
on logout. Instead, all subdomain app tokens owned by the logging-out
user will be deleted every time you logout for simplicity sake.
Tests will remain in their original package, pending being moved to an
apptest package (or similar).
Co-authored-by: Steven Masley <stevenmasley@coder.com>
* feat: dbauthz always on, out of experimental
* Add ability to do rbac checks in unit tests
* Remove AuthorizeAllEndpoints
* Remove duplicate rbac checks
Prior to this change, DERP traffic would route from `coderd` to the
`CODER_ACCESS_URL` to reach the internal DERP server, which may have
resulted in slower connections due to proxying, or the failure of
web traffic entirely.
If your Coder deployment has a proxy in front of it, your traffic through
web terminals, apps, and port-forwarding is about to get a lot faster!
* fix: don't make session counts cumulative
This made for some weird tracking... we want the point-in-time
number of counts!
* Add databasefake query for getting agent stats
* Add deployment stats endpoint
* The query... works?!?
* Fix aggregation query
* Select from multiple tables instead
* Fix continuous stats
* Increase period of stat refreshes
* Add workspace counts to deployment stats
* fmt
* Add a slight bit of responsiveness
* Fix template version editor overflow
* Add refresh button
* Fix font family on button
* Fix latest stat being reported
* Revert agent conn stats
* Fix linting error
* Fix tests
* Fix gen
* Fix migrations
* Block on sending stat updates
* Add test fixtures
* Fix response structure
* make gen
* feat(api): Add agent shutdown lifecycle states
* feat(agent): Add shutdown_script support
* feat(agent): Add shutdown_script timeout
* feat(site): Support new agent lifecycle states
---
Co-authored-by: Marcin Tojek <marcin@coder.com>
* chore: convert workspace agent stats from json to table
* chore: convert agent stats to use a table
Backwards compatibility becomes hard when all agent stats are in a JSON blob.
We also want to query this table for new agents that are failing health checks
so we can display it in the UI.
* Fix migration using default values
* Add git auth providers schema
* Pipe git auth providers to the schema
* Add git auth providers to the API
* Add gitauth endpoint to query authenticated state
* Add endpoint to query git state
* Use BroadcastChannel to automatically authenticate with Git
* Add error validation for submitting the create workspace form
* Fix panic on template dry-run
* Add tests for the template version Git auth endpoint
* Show error if no gitauth is configured
* Add gitauth to cliui
* Fix unused method receiver
* Fix linting errors
* Fix dbauthz querier test
* Fix make gen
* Add JavaScript test for git auth
* Fix bad error message
* Fix provisionerd test race
See https://github.com/coder/coder/actions/runs/4277960646/jobs/7447232814
* Fix requested changes
* Add comment to CreateWorkspacePageView
- rbac: export rbac.Permissions
- dbauthz: move GetDeploymentDAUs, GetTemplateDAUs,
GetTemplateAverageBuildTime from querier.go to system.go
and removes auth checks
- dbauthz: remove AsSystem(), add individual roles for
autostart, provisionerd, add restricted system role for
everything else
feat: Add initial AuthzQuerier implementation
- Adds package database/dbauthz that adds a database.Store implementation where each method goes through AuthZ checks
- Implements all database.Store methods on AuthzQuerier
- Updates and fixes unit tests where required
- Updates coderd initialization to use AuthzQuerier if codersdk.ExperimentAuthzQuerier is enabled
* chore: rename `AgentConn` to `WorkspaceAgentConn`
The codersdk was becoming bloated with consts for the workspace
agent that made no sense to a reader. `Tailnet*` is an example
of these consts.
* chore: remove `Get` prefix from *Client functions
* chore: remove `BypassRatelimits` option in `codersdk.Client`
It feels wrong to have this as a direct option because it's so infrequently
needed by API callers. It's better to directly modify headers in the two
places that we actually use it.
* Merge `appearance.go` and `buildinfo.go` into `deployment.go`
* Merge `experiments.go` and `features.go` into `deployment.go`
* Fix `make gen` referencing old type names
* Merge `error.go` into `client.go`
`codersdk.Response` lived in `error.go`, which is wrong.
* chore: refactor workspace agent functions into agentsdk
It was odd conflating the codersdk that clients should use
with functions that only the agent should use. This separates
them into two SDKs that are closely coupled, but separate.
* Merge `insights.go` into `deployment.go`
* Merge `organizationmember.go` into `organizations.go`
* Merge `quota.go` into `workspaces.go`
* Rename `sse.go` to `serversentevents.go`
* Rename `codersdk.WorkspaceAppHostResponse` to `codersdk.AppHostResponse`
* Format `.vscode/settings.json`
* Fix outdated naming in `api.ts`
* Fix app host response
* Fix unsupported type
* Fix imported type
If an agent went away and reconnected, the wsconncache connection would
be polluted for about 10m because there would be two peers with the
same IP. The old peer always had priority, which caused the dashboard to
try and always dial the old peer until it was removed.
Fixes: https://github.com/coder/coder/issues/5292