24 Commits

Author SHA1 Message Date
7dffd0eee3 Allow compaction disable per tenant (#3965)
* Allow compaction disable per tenant

* Update mock

* Rename legacy yaml key

* Rename methods and fields for clarity about disablement

* Update changelog
2024-08-15 17:05:28 +00:00
e33d407c5d Native histograms processor replacement enum (#3938)
* Add enum to registry for allowed histogram modes

* Update overrides and generator for new enum values

* Update servicegraph tests for signature

* Update generator tests for interface change

* Update generator storage tests for interface change

* Drop signature update

* Update validation for enum lookup

* Update test for signature change

* Fix servicegraphs signature

* Fix default validation when empty

* Remove unused function

* Set default value for generator config to avoid empty checking enum
2024-08-14 19:13:03 +00:00
7615301906 Begin implementation for native histograms (#3789)
* Initial setup native histograms

* Bump prometheus dependencies, map exemplars over

* Add GenerateNativeHistograms from legacy

* Add test coverage for legacy overrides

* Plumb overrides into remote write config generation and test

* Lint for unused vars and duplicate imports

* Map 'classic' histograms out of prom.Histogram

* More tweaking to get classic histograms working, not there yet though :(

* Lint increment

* Refactor native histogram tests

* Track and reset the buckets for which exemplars have been recorded

* Split multiplier test into integer and floating point

* Fix expectedSeriesCount and exemplars in test

* Drop expectedSeriesCount and just len of expectedSamples instead

* Reduce log spam

* Apply existing interface constraint to Gauge and Counter

* Update test instances for metric interface

* Use int64 in place of atomic since always under lock

* Lint for interface updates

* Drop series[0] check

* Convert override from bool to string and test values

* Push mode func into native histograms implementation for update handling

* Drop unused variable

* Set --enable-feature=native-histograms on all prometheus docker-compose setups

* Update generate_native_histograms setting

---------

Co-authored-by: Koenraad Verheyden <koenraad.verheyden@grafana.com>
Co-authored-by: yuna <yuna.verheyden@posteo.net>
2024-07-11 16:03:26 +00:00
35aa72e692 Add new histogram to generator - messaging system latency (#3453)
* introduce new service-graph metric for messaging-system latency

* added tests for new histogram values

* fix linting

* make new metric optional via config

* fix typo

* fix failing tests

* add feature to changelog

* negative times diff consistency - return 0 instead of negative

* update docs

* Update docs/sources/tempo/metrics-generator/service_graphs/estimate-cardinality.md

use present when possible

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

* change 1e9 to time const

* added a reference to the "wait" config of the processor

* fixed indentations and formatting stuff from rebasing

* removed mistaken println found by linter

---------

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>
2024-05-03 12:49:23 -04:00
251bf5a9a7 Surface new labels for uninstrumented services and systems (#3543)
* Surface new labels for uninstrumented services and systems

* Update CHANGELOG.md

* remove unnecessary Println in test

* Reuse dimensions and prefixes for this use-case

* Add docs

* keep only virtual_node behind the new feature

* add overrides

* Update docs/sources/tempo/metrics-generator/service_graphs/_index.md

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

* forgot this test (fixed)

* add benchmarks

* add edge pooling/reuse

* update flag

* move label out of dimensions to avoid prefix logic

* lint

* Update modules/generator/processor/servicegraphs/config.go

Co-authored-by: Mario <mariorvinas@gmail.com>

* minor amends to names and docs

* leave the new virtual_node label as an extra dimension

* keep edge sync.Pool ops inside store

* Update modules/generator/processor/servicegraphs/store/store.go

The edge is not expired here, so it shouldn't be returned to the pool.

Co-authored-by: Mario <mariorvinas@gmail.com>

* leave the new label un-prefixed

---------

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>
Co-authored-by: Mario <mariorvinas@gmail.com>
2024-04-30 16:53:05 -04:00
93d88ff09d Return rate limit error according to ingestion rate strategy (#3485)
* print limit according to ingestion rate strategy

* clean

* overrides always return default overrides for ingestion rate strategy

* remove test logs

* babo

* add both local and global limits
2024-04-09 13:53:01 -05:00
89d7c9e587 Add ingester ring shuffle sharding (#3506)
* Add ingester ring shuffle sharding

* Add new override to legacy config

* Add validation, shuffle_sharding_lookback_period -> shuffle_sharding_ingesters_lookback_period

* Actually implement the test 🙃

* Drop validator field

* Fix compilation issues
2024-03-29 17:36:56 +00:00
f34a137131 [TraceQL Metrics] Use new per-tenant max_metrics_duration and fix duration check (#3484)
* Use new per-tenant max_metrics_duration, and fix duration timestamp handling

* Update docs and defaults
2024-03-14 08:55:19 -04:00
8079874157 [Traceql metrics] New (unsafe) query hints (#3396)
* Update traceql metrics to use the trace-level timestamp columns conditionally

* comments

* Update benchmark, comment

* lint

* Change overlap cutoff to 20%

* add more instrumentation and a little cleanup

* Add many new hints, with the unsafe hints enabled by per-tenant flag

* Collect all hints in one place, decomplicate code

* Make func signature follow Go convention

* make querier block_concurrency configurable

* changelog

* Make time_overlap_cutoff configurable. Rename block_concurrency to concurrent_blocks to match naming convention of other concurrent_* fields.

* fix test
2024-03-05 07:26:58 -05:00
e9afdbb198 Add HTML pages to view tenants with overrides (#3332)
* Add HTML pages to view tenants with overrides

* Update CHANGELOG.md

* Update docs

* Linting and fmt

* Clean

* Simplify (?) HTTP handlers, remove handlers from Interface

* Address review comments

* Specify source of runtime overrides
2024-02-01 16:53:20 +01:00
7a65cc0b13 Add configurable remote_write headers to metrics-generator (#3175)
* Add configurable remote_write headers to metrics-generator

* Add test

* chlog

* Fix test

* Fix registerer panic

* Add metric

* Support variable expansion in overrides config

* Another chlog entry

* Minor changes

* Reset Prometheus registry in tests
2024-01-30 13:40:35 +00:00
552934fb71 Add /status/overrides/{tenant} endpoint (#3244)
* Add /status/overrides/{tenant} endpoint

* Add /status/overrides as alias of /status/runtime_config

* Add tests for getRuntimeOverridesFor

* make fmt
2024-01-19 16:09:25 +00:00
85c021b0d3 make the traceID label name configurable (#3074)
* make the traceID label name configurable, because otel specifies trace_id - see https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/#exemplars
2023-11-17 18:04:30 +01:00
5e7023521e Add per-tenant compaction window (#3129)
* Add support for per-tenant compaction window

* Update legacy round trip

* Metric the override config value

* Drop comment

* Use default config window in case no override for tenant

* Add docs for compaction_window

* Add test for round-trip

* Drop metric for default overrides compaction window

* Update changelog
2023-11-14 13:43:06 +00:00
b3a45e28df Update dskit and grpc (#3096)
* Update grafana/dskit and google/grpc

* Updates for dskit usage

* Update serverless go.mod

* Avoid localhost name lookup in memberlist test

* Update serverless go.mod for grpc
2023-10-31 20:58:19 +00:00
1e004aca08 Use the standard errors package to create, wrap, and handle errors (#2955)
* Standardise error formatting using standard library errors package

* Standardise and improve non-idiomatic error messages

* Make github.com/pkg/errors an indirect dependency

Co-authored-by: A. Stoewer <adrian@stoewer.me>
2023-09-26 17:39:47 +10:00
a5744e2907 Fix metrics generator ingestion slack naming to match new overrides configurations (#2844)
* testing

* undo logs

* update labeling for metrics ingestion slack with new overrides

* remove test log
2023-08-24 19:13:15 -05:00
bfcd75c255 Make metrics-generator ingestion slack per tenant (#2589)
* testing

* undo logs

* make mg ingestion slack per tenant

* changelog

* fix label and updated docs

* move mg slack to update processor

* store duration correctly

* oops

* rebase

* make it spicy level: atomic

* use uber/atomic

* up up and away

* undo limits.go
2023-08-24 17:20:10 -05:00
3d8803576c Overrides module refactor (#2688)
* Extract configuration out of overrides.Limits into a dedicated Config struct

* Move default limits under default_limits key

* Fix compile error

* Fix test

* Respect default settings when unmarshalling config

* Change overrides to ident format

* Add tempo-cli command to migrate configs

* Some more tweaks

* Fixes

* Rename limit to override

* Add missing test

* Address review comments

* Add changelog entry

* Update docs and example configs

* typo

---------

Co-authored-by: Koenraad Verheyden <koenraad.verheyden@grafana.com>
2023-08-24 10:01:44 +00:00
f34ee8686a add config to drop certain labels from target info (#2510)
* testing

* undo logs

* add config to drop certain labels from target info

* comment and refactor

* documentation

* hopefully improve perf

* rebase

* lint

* oops

* lint
2023-07-28 15:03:40 -05:00
d13fccf3d7 Add several metrics-generator fields to user-configurable overrides (#2711)
* Add metrics_generator fields to user-configurable overrides

* Update CHANGELOG.md

* Add metrics_generator.processor.service_graph.enable_client_server_prefix

* Add interface checks for explicitness
2023-07-28 17:34:32 +02:00
bcc79249f6 [vParquet3] new block encoding with support for dedicated columns (#2649)
* [vParquet3] create new block encoding by copying vParquet2

* vParquet3: add dedicated columns to parquet schema and block meta (#2517)

* Re-order schema to keep columns affected by column index changes low

* Add spare columns for dedicated attributes to schema struct

* Add dedicated column config to block meta

* Read and write attributes in dedicated columns

* Make order of dedicated attributes predictable when reading

* Fix existing tests and benchmark

* Run existing benchmarks and tests with dedicated columns

Co-authored-by: Mario <mariorvinas@gmail.com>

* Add dedicated columns to overrides module (#2551)

* [vParquet3] Write path (#2555)

* Add dedicated columns to overrides and blocks

* Improvements

* Change test

* Fix tests

* Extend ingester_test

* Add dedicated columns config to storage block

* Review comments

* Add comment

* [vParquet3] dedicated columns read path (#2592)

* Refactor and rename function blockMetaToDedicatedColumnMapping

* Query dedicated attribute columns with TraceQL

* Search tag values in dedicated attribute columns

* Search tags in dedicated attribute columns

* Search for values in dedicated attribute columns in tests

* More consistent naming

* Update block and meta.json in vparquet2/test-data

* Test dedicated column in traceToParquet test

* Format Go code

* Introduce types for dedicated column type and scope

Replace StaticTypeFromString() with DedicatedColumnType.ToStaticType()

* The function dedicatedColumnsToColumnMapping() can receive multiple scopes

* [vParquet3] Add support for dedicated columns in compactor (#2561)

* Re-order schema to keep columns affected by column index changes low

* Add spare columns for dedicated attributes to schema struct

* Add dedicated column config to block meta

* Read and write attributes in dedicated columns

* Make order of dedicated attributes predictable when reading

* Fix existing tests and benchmark

* Run existing benchmarks and tests with dedicated columns

* Add dedicated columns to overrides and blocks

* Support dedicated columns in compactor block selection

* Changes to hash

* More tests

---------

Co-authored-by: A. Stoewer <adrian@stoewer.me>

* [vParquet3] pass dedicated columns to querier (#2603)

* Add dedicated columns to SearchBlockRequest message

* Assign SearchBlockRequest dedicated cols from BlockMeta and vice versa

* Encode SearchBlockRequest to http request and vice versa

* Don't add empty dedicated columns when building a search request

* Unit tests with dedicated columns

* Implement dedicated column scope and type as protobuf enums

* [vParquet3] validate dedicated columns configuration (#2616)

* Add validate function

* Refactor: use DedicatedColumns type instead of []DedicatedColumn

* Initialize logger before verifying the config

This fixes the config verification output

* Check for invalid dedicated columns with '-config.verify true'

* Use ToTempopb() to validate dedicated column scope and type

* [vParquet3] mention feature in CHANGELOG.md

* [vParquet3] Address review comments

* Remove TODO comment about caching the dedicated column hash

* Shorten url param for dedicated columns to 'dc'

* Add function to get latest encoding and use it in tests

* Fix name DedicateColumnsFromTempopb

* [vParquet3] Address more review comments

* Remove 'Test' columns from vParquet3 schema

* Rename async iterator environment variable

* Do not export methods of dedicatedColumnMapping

* Skip dedicated attribute lookup depending on scope in searchTagValues

* Validate maximum number of configured dedicated columns

* Test data for vparquet3 uses dedicated columns

* Reduce size of block meta JSON

* Use 'parquet_' prefix for dedicated column configuration

* [vParquet3] Integration tests with dedicated attribute columns

* Add e2e tests for encodings and dedicated attribute columns

* Use dedicated attribute columns in TestSearchCompleteBlock

* Add support for v2 in encodings test

---------

Co-authored-by: Mario <mariorvinas@gmail.com>
2023-07-26 17:30:25 +00:00
a5a7adb5cd Run gofumpt -w on all go files and integrate with CI (#2584)
* Include gofumpt and goimports in tools

* Replace gofmt with gofumpt in Makefile

* Run `make fmt`

* Adhere to goconst lint rule

* Include gofumpt note in the CONTRIBUTING.md

* Update CHANGELOG
2023-07-20 16:02:05 +00:00
3b618cc343 Add user-configurable overrides module (#2543)
* Add user-configurable overrides module

* Add /api/overrides and fix crash on boot

* Add overridesHandler and WriteStatusRuntimeConfig

* return json and only return overrides for the tenant

* Implement delete

* Fix test I think?

* Fix tests

* clean up handler and TODOs

* Add tests for overridesHandler

* Add e2e test

* Refactor:
- clean up integration tests for overrides
- rename ReloadInterval -> PollInterval
- linting

* Linting

* address more Linting

* fix lint and add test for PATCH

* fix lint error unparam

* remove todo

* use tenantLimits as return type

* Add prometheus.Collector to overrides.Interface

* Test tempo_overrides_user_configurable_overrides_fetch_total metric in e2e tests

* Sprinkle in some tracing

* Update CHANGELOG.md

* Rename loop to running for consistency

* Have mux handle method routing; split up GET, POST and DELETE handlers

* Use built in contains

* Split up user-configurable overrides manager, api and backend client

* Move overrides API to httpclient

* Clean up, linting, fmt

* Remove version field from json

* Address review comments

* If overrides.json does not exist, properly delete it from cache

* Add config warning for conflicting storage

* Check in my tests as well

* Simplify API handler, return 404 on overrides not found

* Typo, linting, fix test

* Use backend constants

---------

Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com>
2023-07-18 12:52:01 +02:00