267 Commits

Author SHA1 Message Date
259b765971 [rhythm] Add partition lag alerts for Kafka consumers (#4830)
* Add partition lag alerts for Kafka consumers

* Don't filter by group

* fmt and compile

* Add configurable group filter for partition lag alerts
2025-03-11 12:38:08 +01:00
291eb0a783 Remove TempoRequestLatency alert and associated runbook section (#4768) 2025-02-28 08:17:06 -05:00
52a0e16966 Issue 3880: fix mixin issues (#4757)
* Issue 3880: TempoBlockListRisingQuickly fix

* Issue 3880: TempoBlockListRisingQuickly fix: generate

* Issue 3880: New alerts

For unhealthy Ingester and metrics-generators

* Issue 3880: New alerts: generate

* Issue 3880: Runbook for new alerts

* Issue 3880: Better descriptions for runbook

Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com>

* Issue 3880: Increase severity

for unhealthy metrics-generator and ingester

* Issue 3880: Increase severity: generate

---------

Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com>
2025-02-26 13:19:28 +01:00
fc828331c8 Remove outdated info from the runbook (#4733) 2025-02-21 10:42:23 -05:00
842da4a2ed [rhythm] Add rollout-operator and allow simultaneous update of block-builder pods (#4660)
* Add rollout-operator

* Merge multi-zone operator and add more functionality

* Bump version

* Automatically configure enable_groups

* fmt
2025-02-10 13:13:54 +01:00
d7b16550b4 rhythm: add block builder to resources dashboard (#4669)
* rhythm: add block builder to resources dashboard

* changelog

* fmt jsonnet
2025-02-07 15:03:12 +01:00
51aca06f9f Remove tempo serverless (#4599)
* remove tempo-serverless from cmd

* cleanup Makefile and .gitignore

* remove serverless code from pkg/api/

* remove serverless code from the querier

* clean up serverless from tempo-mixin and playbook

* Remove serverless from tempo docs

* Update changelog

* go mod vendor

* Remove tempo_feature_enabled metric

* inline internalSearchBlock

* docs alias
2025-01-24 05:02:29 +00:00
55c71fa90f Ops: Fix envoy writes dash (#4604)
* Fix envoy writes dash

Signed-off-by: Joe Elliott <number101010@gmail.com>

* fix generated dash

Signed-off-by: Joe Elliott <number101010@gmail.com>

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>
2025-01-23 20:59:12 +00:00
094a9fddb8 Update tempo operational dashboard for block builder and v2 traces api (#4559)
* Update tempo operational dashboard for block builder and v2 traces api

* changelog
2025-01-15 17:49:05 +00:00
e709f8ac70 [rhythm] Introduce block-builder and kafka ingest path (#4533)
* Block-builder PoC

* Add unit test for block-builder (#4289)

* Add unit test for block-builder

* fmt

* Update tests

* cmon

* Deterministically build blocks for partition sections (#4327)

* Pull main (#4342)

* chore: remove gofakeit dependency (#4274)

* Further reduce Labes() calls in the metrics registry (#4283)

* Respect passed headers in read path requests (#4287)

* Ingester: Validate completed blocks (#4256)

* Add validate method to block

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Add Validate usage in the ingester

Signed-off-by: Joe Elliott <number101010@gmail.com>

* changelog

Signed-off-by: Joe Elliott <number101010@gmail.com>

* add test and fix replay

Signed-off-by: Joe Elliott <number101010@gmail.com>

* increment metric

Signed-off-by: Joe Elliott <number101010@gmail.com>

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Add `invalid_utf8` to reasons spans could be rejected (#4293)

* Add `invalid_utf8` to reasons spans could be rejected

* Update changelog

* Update docs

* Ensure test covers invalid UTF-8 and not slack time

* add signals for duplicate rf1 data (#4296)

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Bump anchore/sbom-action from 0.17.5 to 0.17.7 (#4307)

Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.17.5 to 0.17.7.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](https://github.com/anchore/sbom-action/compare/v0.17.5...v0.17.7)

---
updated-dependencies:
- dependency-name: anchore/sbom-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs: Update readme with explore traces info (#4263)

* docs: Update readme with explore traces info

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

* chore: remove spanlogger (#4312)

* chore: remove spanlogger

* Query-Frontend: Add middleware to drop headers (#4298)

* header strip ware

Signed-off-by: Joe Elliott <number101010@gmail.com>

* comment

Signed-off-by: Joe Elliott <number101010@gmail.com>

* changelog

Signed-off-by: Joe Elliott <number101010@gmail.com>

* remove header strip ware from metrics summary

Signed-off-by: Joe Elliott <number101010@gmail.com>

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Increase length of time compactions have to fail (#4315)

* increase length of time compactions have to fail

Signed-off-by: Joe Elliott <number101010@gmail.com>

* gen

Signed-off-by: Joe Elliott <number101010@gmail.com>

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>

* docs: mark serverless as deprecated (#4017)

* docs: mark serverless as deprecated

* Changelog + readme

* docs: Remove duplicated examples (#4295)

This removes duplicate examples from the Configure TraceQL
metrics page.

Signed-off-by: Alex Bikfalvi <alex.bikfalvi@grafana.com>

* tempo-cli: support dropping multiple traces in a single operation (#4266)

* tempo-cli: support dropping multiple traces in a single operation

* update final log message

---------

Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com>

* [DOC] Add clarification for metrics summary and traceQL metrics (#4316)

* Add clarification for metrics summary and traceQL metrics

* Apply suggestions from code review

Co-authored-by: Jennifer Villa <jvilla2013@gmail.com>

* Update docs/sources/tempo/api_docs/metrics-summary.md

---------

Co-authored-by: Jennifer Villa <jvilla2013@gmail.com>

* TraceQL metrics time range fixes (#4325)

* Disconnect job time range filtering from step, so that results in split backend/recent range are accurate

* changelog

* Fix to assert metrics query range before alignment because alignment may increase it, which is not the responsibility of the caller to account for (#4331)

* Add doc about configuring TLS with Helm (#4328)

* Add doc about configuring TLS with Helm

* Add memberlist and readinessProbe to example

* Include server config for listening on TLS

* Add note about scraping

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Markus Toivonen <markus.toivonen@hoxhunt.com>

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

* Add memcached config for TLS

---------

Co-authored-by: Markus Toivonen <markus.toivonen@hoxhunt.com>
Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

* [DOC] Add TLS info to Helm chart doc (#4334)

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Alex Bikfalvi <alex.bikfalvi@grafana.com>
Co-authored-by: Javier Molina Reyes <javiermolinar@live.com>
Co-authored-by: Zach Leslie <zach.leslie@grafana.com>
Co-authored-by: Joe Elliott <number101010@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ryan Perry <Rperry2174@gmail.com>
Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>
Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com>
Co-authored-by: Alex Bikfalvi <alex@bikfalvi.com>
Co-authored-by: Andrey Karpov <ndk@users.noreply.github.com>
Co-authored-by: Jennifer Villa <jvilla2013@gmail.com>
Co-authored-by: Martin Disibio <martin.disibio@grafana.com>
Co-authored-by: Markus Toivonen <markus.toivonen@hoxhunt.com>

* WIP: Rhythm ingest path (#4314)

* Validate distributor config. Finish encoder/decoder tests

* Repair tests

* Make SingleBinary work out of the box by defaulting to partition 0

* Fix first time startup where blockbuilder fails before ingester can create topic

* Fix initial startup cycle time and delay

* Add more failure modes to the block-builder (#4345)

* Add more tests to the block-builder

* stuff

* Add comments

* [Rhythm] Metrics generator read from kafka first pass (#4359)

* Metrics generator read from kafka first pass

* review feedback

* Multiple fixes in block-builder (#4364)

* [rhythm] git merge origin/main (#4376)

* chore: remove gofakeit dependency (#4274)

* Further reduce Labes() calls in the metrics registry (#4283)

* Respect passed headers in read path requests (#4287)

* Ingester: Validate completed blocks (#4256)

* Add validate method to block

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Add Validate usage in the ingester

Signed-off-by: Joe Elliott <number101010@gmail.com>

* changelog

Signed-off-by: Joe Elliott <number101010@gmail.com>

* add test and fix replay

Signed-off-by: Joe Elliott <number101010@gmail.com>

* increment metric

Signed-off-by: Joe Elliott <number101010@gmail.com>

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Add `invalid_utf8` to reasons spans could be rejected (#4293)

* Add `invalid_utf8` to reasons spans could be rejected

* Update changelog

* Update docs

* Ensure test covers invalid UTF-8 and not slack time

* add signals for duplicate rf1 data (#4296)

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Bump anchore/sbom-action from 0.17.5 to 0.17.7 (#4307)

Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.17.5 to 0.17.7.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](https://github.com/anchore/sbom-action/compare/v0.17.5...v0.17.7)

---
updated-dependencies:
- dependency-name: anchore/sbom-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs: Update readme with explore traces info (#4263)

* docs: Update readme with explore traces info

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

* chore: remove spanlogger (#4312)

* chore: remove spanlogger

* Query-Frontend: Add middleware to drop headers (#4298)

* header strip ware

Signed-off-by: Joe Elliott <number101010@gmail.com>

* comment

Signed-off-by: Joe Elliott <number101010@gmail.com>

* changelog

Signed-off-by: Joe Elliott <number101010@gmail.com>

* remove header strip ware from metrics summary

Signed-off-by: Joe Elliott <number101010@gmail.com>

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Increase length of time compactions have to fail (#4315)

* increase length of time compactions have to fail

Signed-off-by: Joe Elliott <number101010@gmail.com>

* gen

Signed-off-by: Joe Elliott <number101010@gmail.com>

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>

* docs: mark serverless as deprecated (#4017)

* docs: mark serverless as deprecated

* Changelog + readme

* docs: Remove duplicated examples (#4295)

This removes duplicate examples from the Configure TraceQL
metrics page.

Signed-off-by: Alex Bikfalvi <alex.bikfalvi@grafana.com>

* tempo-cli: support dropping multiple traces in a single operation (#4266)

* tempo-cli: support dropping multiple traces in a single operation

* update final log message

---------

Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com>

* [DOC] Add clarification for metrics summary and traceQL metrics (#4316)

* Add clarification for metrics summary and traceQL metrics

* Apply suggestions from code review

Co-authored-by: Jennifer Villa <jvilla2013@gmail.com>

* Update docs/sources/tempo/api_docs/metrics-summary.md

---------

Co-authored-by: Jennifer Villa <jvilla2013@gmail.com>

* TraceQL metrics time range fixes (#4325)

* Disconnect job time range filtering from step, so that results in split backend/recent range are accurate

* changelog

* Fix to assert metrics query range before alignment because alignment may increase it, which is not the responsibility of the caller to account for (#4331)

* Add doc about configuring TLS with Helm (#4328)

* Add doc about configuring TLS with Helm

* Add memberlist and readinessProbe to example

* Include server config for listening on TLS

* Add note about scraping

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Markus Toivonen <markus.toivonen@hoxhunt.com>

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

* Add memcached config for TLS

---------

Co-authored-by: Markus Toivonen <markus.toivonen@hoxhunt.com>
Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

* [DOC] Add TLS info to Helm chart doc (#4334)

* fix deprecation warning by switching to DoBatchWithOptions (#4343)

Signed-off-by: Daniel Strobusch <1847260+dastrobu@users.noreply.github.com>

* bump dskit to v0.0.0-20241115082728-f2a7eb3aa0e9 to leverage benefits for context causes for DoBatch calls. (#4341)

See https://github.com/grafana/dskit/issues/576

Signed-off-by: Daniel Strobusch <1847260+dastrobu@users.noreply.github.com>

* Bump github.com/minio/minio-go/v7 from 7.0.70 to 7.0.80 (#4282)

* Bump github.com/minio/minio-go/v7 from 7.0.70 to 7.0.80

Bumps [github.com/minio/minio-go/v7](https://github.com/minio/minio-go) from 7.0.70 to 7.0.80.
- [Release notes](https://github.com/minio/minio-go/releases)
- [Commits](https://github.com/minio/minio-go/compare/v7.0.70...v7.0.80)

---
updated-dependencies:
- dependency-name: github.com/minio/minio-go/v7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update serverless vendor

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Zach Leslie <zach.leslie@grafana.com>

* update default config values to better align with production workloads (#4340)

* update default config values to better align with production workloads

* Update CHANGELOG.md and config docs

* Ingester memory improvements by adjusting prealloc (#4344)

* remove trace ids

Signed-off-by: Joe Elliott <number101010@gmail.com>

* linear buckets

Signed-off-by: Joe Elliott <number101010@gmail.com>

* changelog

Signed-off-by: Joe Elliott <number101010@gmail.com>

* tuney tune

Signed-off-by: Joe Elliott <number101010@gmail.com>

* metric misses and increase pool size

Signed-off-by: Joe Elliott <number101010@gmail.com>

* lint

Signed-off-by: Joe Elliott <number101010@gmail.com>

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Bump github.com/Azure/azure-sdk-for-go/sdk/azcore from 1.13.0 to 1.16.0 (#4302)

* Bump github.com/Azure/azure-sdk-for-go/sdk/azcore from 1.13.0 to 1.16.0

Bumps [github.com/Azure/azure-sdk-for-go/sdk/azcore](https://github.com/Azure/azure-sdk-for-go) from 1.13.0 to 1.16.0.
- [Release notes](https://github.com/Azure/azure-sdk-for-go/releases)
- [Changelog](https://github.com/Azure/azure-sdk-for-go/blob/main/documentation/release.md)
- [Commits](https://github.com/Azure/azure-sdk-for-go/compare/sdk/azcore/v1.13.0...sdk/azcore/v1.16.0)

---
updated-dependencies:
- dependency-name: github.com/Azure/azure-sdk-for-go/sdk/azcore
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update serverless vendor

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Zach Leslie <zach.leslie@grafana.com>

* Use Prometheus fast regexp (#4329)

* basic integration

Signed-off-by: Joe Elliott <number101010@gmail.com>

* patch tests for new meaning

Signed-off-by: Joe Elliott <number101010@gmail.com>

* patch up more tests

Signed-off-by: Joe Elliott <number101010@gmail.com>

* add basic tests

Signed-off-by: Joe Elliott <number101010@gmail.com>

* changelog + docs

Signed-off-by: Joe Elliott <number101010@gmail.com>

* remove benches

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Cleaned up + tests

Signed-off-by: Joe Elliott <number101010@gmail.com>

* comment

Signed-off-by: Joe Elliott <number101010@gmail.com>

* lint

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Update docs/sources/tempo/traceql/_index.md

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

* comment

Signed-off-by: Joe Elliott <number101010@gmail.com>

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>
Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

* Fix broken link in service-graphs docs (#4351)

* Fix minor typo in TraceQL docs (#4356)

* Bump default memcached version (#4363)

* Exemplar fixes (#4366)

* Fix exemplars based on duration to convert to seconds, fix various other issues

* changelog

* fix: initialize histogram buckets to 0 to avoid them being downsampled (#4368)

* initialized histogram buckets to 0 to avoid them being downsampled

* Ingester/Generator Live trace cleanup (#4365)

* moved trace sizes somewhere shareable

Signed-off-by: Joe Elliott <number101010@gmail.com>

* use tracesizes in ingester

Signed-off-by: Joe Elliott <number101010@gmail.com>

* make tests work

Signed-off-by: Joe Elliott <number101010@gmail.com>

* trace bytes in generator

Signed-off-by: Joe Elliott <number101010@gmail.com>

* remove traceCount

Signed-off-by: Joe Elliott <number101010@gmail.com>

* live trace shenanigans

Signed-off-by: Joe Elliott <number101010@gmail.com>

* changelog

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Update modules/generator/processor/localblocks/livetraces.go

Co-authored-by: Mario <mariorvinas@gmail.com>

* Update modules/ingester/instance.go

Co-authored-by: Mario <mariorvinas@gmail.com>

* Test cleanup. Add sz test, restore commented out and fix e2e

Signed-off-by: Joe Elliott <number101010@gmail.com>

* remove todo comment

Signed-off-by: Joe Elliott <number101010@gmail.com>

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>
Co-authored-by: Mario <mariorvinas@gmail.com>

* Bump anchore/sbom-action from 0.17.7 to 0.17.8 (#4371)

Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.17.7 to 0.17.8.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](https://github.com/anchore/sbom-action/compare/v0.17.7...v0.17.8)

---
updated-dependencies:
- dependency-name: anchore/sbom-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update for IDs change

* Only run blockbuilder if ingest enabled

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Alex Bikfalvi <alex.bikfalvi@grafana.com>
Signed-off-by: Daniel Strobusch <1847260+dastrobu@users.noreply.github.com>
Co-authored-by: Javier Molina Reyes <javiermolinar@live.com>
Co-authored-by: Zach Leslie <zach.leslie@grafana.com>
Co-authored-by: Joe Elliott <number101010@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ryan Perry <Rperry2174@gmail.com>
Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>
Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com>
Co-authored-by: Alex Bikfalvi <alex@bikfalvi.com>
Co-authored-by: Andrey Karpov <ndk@users.noreply.github.com>
Co-authored-by: Jennifer Villa <jvilla2013@gmail.com>
Co-authored-by: Martin Disibio <martin.disibio@grafana.com>
Co-authored-by: Markus Toivonen <markus.toivonen@hoxhunt.com>
Co-authored-by: Daniel Strobusch <1847260+dastrobu@users.noreply.github.com>
Co-authored-by: Carles Garcia <carles.garciacabot@grafana.com>

* [rhythm] Changes to simplify operations (#4389)

* Use mapping for assigning partitions

* Use mapping for assigning partitions in the generator too

* Add support for SASL auth to kafka clients

* Add metrics to ingest (#4395)

* [rhythm] Extract block-builder into its own module (#4396)

* Extract block-builder into its own module

* Update /operations and examples

* No ephemeral storage

* No rolling strategy either

* fmt and compile

* Address review comment

* [rhythm] Correctly pass start/end time when appending a trace (#4410)

* Correctly pass start/end times

* Different code, same result

* [rhythm] Multiple fixes to block-builder consumption (#4413)

* Multiple fixes to cycle consumption

* fmt

* happy now?

* ups

* Rhythm: Separate non-flushing local blocks processor to store new queue data for reads (#4411)

* wip: separate non-flushing local blocks processor to store new queue data for reads

* Make real config for non-flushing local blocks processor, optional, validate wal config and use defaults if needed

* Fix defaulting of second WAL config

* [rhythm] Make ID generator more robust (#4416)

* Make ID generator more robust

* Simplify

* Update to e50f5d96b

* Fix registering of kafka read client metrics (#4502)

* [rhythm] Make ID generator more robust (#4416) (#4507)

* Make ID generator more robust

* Simplify

* Removed references to Loki and Mimir (#4509)

Signed-off-by: Joe Elliott <number101010@gmail.com>

* [Rhythm] Block builder test updates (#4510)

* Make blockbuilder tests closer to real kafka and less implementation specific by always enabling support for consumer groups, call commit control func in order

* Verify last committed offset in each test

* hide test function

* lint

* lint

* [Rhythm] Block-builder consumption loop (#4480)

* Alternate block-builder consume

* Set timeout on PollFetches, reduce initial poll delay, update 1 test to work using real consumergroup functionality

* restore metrics

* Re-add original partition lag metric, polled in separate goroutine. Fix consume loop to only consume full-duration cycles for more determinism

* merge conflict

* Review feedback

* Review feedback

* Comment

* code cleanup, lint

* logs

* code cleanup

* lint

* Review feedback

* Remove missed lookback_on_no_commit config in e2e tests and regen manifest

* Review feedback

* Fix rewind to latest commit to init correctly, it didn't work in some clusters (#4532)

* [rhythm] merge main at 71e8531 (#4531)

* Fixes

* More fixes

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Alex Bikfalvi <alex.bikfalvi@grafana.com>
Signed-off-by: Daniel Strobusch <1847260+dastrobu@users.noreply.github.com>
Co-authored-by: Javier Molina Reyes <javiermolinar@live.com>
Co-authored-by: Zach Leslie <zach.leslie@grafana.com>
Co-authored-by: Joe Elliott <number101010@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ryan Perry <Rperry2174@gmail.com>
Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>
Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com>
Co-authored-by: Alex Bikfalvi <alex@bikfalvi.com>
Co-authored-by: Andrey Karpov <ndk@users.noreply.github.com>
Co-authored-by: Jennifer Villa <jvilla2013@gmail.com>
Co-authored-by: Martin Disibio <martin.disibio@grafana.com>
Co-authored-by: Markus Toivonen <markus.toivonen@hoxhunt.com>
Co-authored-by: Daniel Strobusch <1847260+dastrobu@users.noreply.github.com>
Co-authored-by: Carles Garcia <carles.garciacabot@grafana.com>
2025-01-10 16:05:42 +01:00
1a21818c15 Bump default memcached version (#4363) 2024-11-21 19:28:00 +00:00
29dbbd28f3 Increase length of time compactions have to fail (#4315)
* increase length of time compactions have to fail

Signed-off-by: Joe Elliott <number101010@gmail.com>

* gen

Signed-off-by: Joe Elliott <number101010@gmail.com>

---------

Signed-off-by: Joe Elliott <number101010@gmail.com>
2024-11-12 13:25:41 -05:00
dc8393794a Begin adding native histograms to operational dashboard (#4038) 2024-08-30 20:20:06 +00:00
656a584941 Include securityContext.fsGroup on jsonnet statefulset (#3866)
New PVCs may get exposed to the container with root permissions despite
the UID of the container. Here we set the `securityContext.fsGroup` of
the statefulset that runs the tempo container.
2024-07-16 14:13:48 +00:00
4655d25f06 chore: update mixin deps and add recording_rules_range_interval (#3851)
* chore: update mixin deps and add recording_rules_range_interval

* chore: bump range interval to account for scrape_interval of 1m

* chore: ran make all

* chore: added changelog entry
2024-07-11 15:52:21 +02:00
f0554c6107 Update jsonnet memcached image version (#3730)
* Update jsonnet memcached image version

* Compile jsonnet and fix image tag in make target
2024-05-31 15:31:51 +00:00
27f78c124d Update examples and docs for UID path ownership change (#3596) 2024-04-19 17:15:41 +00:00
1debcaeb60 Move jsonnetfmt, jsonnet, goimports and others install into tools/ (#2869)
* Move `jsonnetfmt` and `goimports` install into tools/

This will ensure that the version we are using to compile locally is the
same version that we are using in CI.  The `goimports` was already
included in the tools, and so the CI was redundant.  Also note that this
moves us from the C jsonnet to the Go jsonnet, which looks to introduce
slight formatting changes.

* Compile jsonnet

* Move tk, jb, and jsonnet also into tools/

* Include tools dependency on additional make targets

* Include tools Dockerfile

* Include tools image build target

* Update tools/go.sum

* Tidy up image build

* Prepare multi-arch docker image with install script

The script is used because not all imports in the tools.go are modules
which can be imported.  The version of the import is determined by
matching the nearest module path and installing the import at the
version specified in the go.mod.  This will allow updates to the go.mod
to direct the versions of the tools installed in the docker image.

* Stop ignoring the tools to allow docker build

* Include entrypoint

* Use TOOLS_CMD for docker execution on a few key make targets

* Use my image

* Fix variable

* Mark workdir as git safe

* Update drone to publish a tempo-ci-tools image

* Fix --platform for image source and use golang:alpine
2024-04-05 16:57:25 +00:00
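Note on 1debcaeb60 (#2869): the install script described in that commit picks a tool's version by matching the nearest module path in go.mod. The script itself is not shown in this log, so the following is only a minimal Python sketch of that matching idea, not the actual implementation. The function names, the `REQUIREMENTS` table, and every version string are hypothetical and chosen for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical go.mod requirements (module path -> version).
# Versions are illustrative only and are not taken from the repository.
REQUIREMENTS: Dict[str, str] = {
    "golang.org/x/tools": "v0.1.0",
    "github.com/example/toolkit": "v1.2.3",
    "github.com/example/toolkit/v2": "v2.0.1",
}


def nearest_module(
    import_path: str, requirements: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Return (module, version) for the longest module path that is a
    path-wise prefix of import_path, or None if nothing matches."""
    best: Optional[Tuple[str, str]] = None
    for module, version in requirements.items():
        # Match whole path segments only, so "a/b" does not match "a/bc".
        is_prefix = import_path == module or import_path.startswith(module + "/")
        if is_prefix and (best is None or len(module) > len(best[0])):
            best = (module, version)
    return best


def install_target(import_path: str, requirements: Dict[str, str]) -> str:
    """Build the `package@version` argument that `go install` accepts."""
    match = nearest_module(import_path, requirements)
    if match is None:
        raise LookupError(f"no module in go.mod provides {import_path}")
    return f"{import_path}@{match[1]}"


if __name__ == "__main__":
    for pkg in (
        "golang.org/x/tools/cmd/goimports",
        "github.com/example/toolkit/v2/cmd/tool",
    ):
        print("go install", install_target(pkg, REQUIREMENTS))
```

Choosing the longest matching prefix matters when one required module path is nested under another, as with the hypothetical `toolkit` and `toolkit/v2` entries above.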
0d3bde58b2 Update docker image to run as non-root (#2265)
* Update Tempo image to run as non-root

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

* Change ownership of /var/tempo

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

* Breadcrumbs for ingester filesystem permissions

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

* Create the directory we attempt to chown

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

* Include initcontainer and adjust statefulset security for new UID

* Compile jsonnet

* Drop securityContext since the chown handles the permissions

* Adjust test path for ownership

* Update changelog to note breaking change

* Drop extra object and include additional hardcode

* Improve language for CHANGELOG

---------

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>
2024-04-04 16:43:53 +00:00
f38464313b operations: change query-frontend maxSurge to 0 for better rollouts (#3507) 2024-03-28 20:30:10 +05:30
7540d573c8 1.21 -> 1.29 (#3470)
Signed-off-by: Joe Elliott <number101010@gmail.com>
2024-03-06 16:34:56 -05:00
1883960613 [DOC] Update README.md (#3349)
* Update README.md

Add instructions on how to use the mixins.

* Update operations/tempo-mixin/README.md

* Update README.md

* Update operations/tempo-mixin/README.md

Co-authored-by: Heds Simons <hedss@users.noreply.github.com>

---------

Co-authored-by: Heds Simons <hedss@users.noreply.github.com>
2024-02-01 09:37:17 -05:00
6826589f85 Update jsonnet memcached image version for multiple CVE (#3310) 2024-01-18 17:52:34 +00:00
5343e10345 Update rate queries to use (#3160) 2023-11-21 15:19:34 +01:00
7dac4c40d2 Update IPv6 jsonnet and docs after dskit update (#3136)
* Update jsonnet inet6 listen address after dskit update

* Update docs inet6 listen address after dskit update
2023-11-13 16:41:17 +00:00
83d3835e5f Include distributor queue length in tempo-mixin (#2623)
* Include distributor queue length in tempo-mixin

* Rebuild

* Compile jsonnet

* Compile jsonnet

* Use C jsonnet to format mixin
2023-10-16 15:07:30 +00:00
12bdeff1d7 Add TempoUserConfigurableOverridesReloadFailing alert (#2784)
* Add TempoUserConfigurableOverridesReloadFailing alert

* Update CHANGELOG.md

* Fail startup on user-configurable overrides error
2023-08-15 20:57:05 +02:00
2ac7be28de microservices: use DNS lookup for memberlist.join_members by default (#2700)
* microservices: use DNS lookup for memberlist.join_members by default

Related: https://github.com/grafana/loki/pull/9723

In environments with many instances, the current default configuration
connects to every member returned by a DNS entry, which creates more
connections than necessary.

Here we include the fix that Loki uses: the `dns+` lookup feature of
memberlist ensures that only a few members of the DNS entry are added.

* Compile jsonnet
2023-07-26 20:16:28 +00:00
1419ce0327 [Jsonnet] Make metrics-generator deployment volume mounts customizable (#2647)
* Remove forced volume mounts for the generator deployment, which prevent downstream jsonnet from changing them via $.tempo_metrics_generator_container; make volume mounts identical between deployment and statefulset

* Sort gitignore and add more paths

* Fix command and regen output

* changelog
2023-07-12 17:22:20 -04:00
130de91303 Add statefulset for metrics generator (#2533)
* Add statefulset for metrics generator

* Use proper volume method

* Keep the data emptyDir on the generator deployment

* Fix config references

* Fix config default error

* Move readme file

* Zero replicas for the deployment

* Update changelog

* Update config example

* Compile jsonnet

* Update test config for microservices

* Include metrics-generator statefulset compiled manifest

* Drop termination grace period to rely on defaults

* Compile jsonnet
2023-07-06 17:30:59 +00:00
1f822dcc56 Update multi-zone.libsonnet (#2608) 2023-07-05 08:53:36 -04:00
4d0d7ed9a1 Include metrics-generator in Operational dashboard (#2588)
* Include metrics-generator in Operational dashboard

* Compile mixin
2023-06-26 18:25:52 +00:00
ab590108de Begin unit testing jsonnet microservices (#2229)
* Attempt unit testing jsonnet microservices

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

* Config update

* Update structure for microservices test

* Wire up CI to run the jsonnet tests

* Separate compile step, use output in first test

---------

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>
2023-06-07 16:28:49 +00:00
f372d54bfe [tempo-mixin] add new alert to identify block list growth (#2542)
* [tempo-mixin] add new alert to identify block list growth

* Compile jsonnet
2023-06-06 17:20:42 +00:00
6865712c6e Sort label_values in Tempo Operational dashboard (#2541)
* Sort label_values in Tempo Operational dashboard

* Compile jsonnet

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

---------

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>
2023-06-06 17:20:20 +00:00
67a9e14607 Disable tempo-query by default in Jsonnet libs (#2462)
tempo-query was required for Grafana versions <7.5 for compatibility with jaeger-ui. Grafana versions <7.5 didn't have a Tempo datasource, so we used the Jaeger datasource to query Tempo via tempo-query.

Grafana 7.5 was released on Mar 25, 2021, which was 2+ years ago.

It is already disabled by default in helm-chart: grafana/helm-charts#2387
2023-05-30 15:37:43 +05:30
282775b007 [jsonnet-microservices] Update memcached image (#2466)
* [jsonnet-microservices] Update memcached image

* Update changelog

* Update changelog
2023-05-16 15:13:54 +00:00
f69474f242 Add support for IPv6 (#1555)
* Add support for IPv6 ring address detection

* Update dskit dependency after inet6 merge

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

* Add withInet6 jsonnet method to microservices

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

* Update serverless vendor

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

* Update jsonnet overrides for ipv6 testing

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

* Drop unnecessary address spec

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

* Include configuration docs

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

* Update changelog

---------

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>
2023-05-02 17:42:41 +00:00
488f5778ea Prefix metric namespaces with tempo instead of cortex (#2204)
* Prefix metric namespaces with tempo instead of cortex

* Remove cortex_ prefixed metrics

* Mark breaking change
2023-04-11 10:12:55 +02:00
0c9a822cbe tempo-mixin: disable auto refresh every 10 seconds (#2290)
* tempo-mixin: disable auto refresh every 10 seconds

* Update CHANGELOG.md
2023-03-31 18:51:00 +00:00
eb4fa109b2 Update tempo-mixin to show request in Resources dashboard (#2281)
* Add tag and link to Tenants dashboard

* Add CPU and Memory request in Resources Dashboard

* compile tempo-mixin

* Update CHANGELOG.md

* make jsonnet happy in CI
2023-03-31 12:32:53 +02:00
dbcb2a8198 [jsonnet-microservices] support k8s v1.25 (#2230)
* Reduce use of kausal to allow for api updates

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

* [jsonnet-microservices] Update PodDisruptionBudget to policy/v1

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

* Update minimum k8s version for jsonnet installation to 1.21

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>

---------

Signed-off-by: Zach Leslie <zach.leslie@grafana.com>
2023-03-27 20:27:18 +00:00
f9000562d5 Apply rate() to bytes/s panel in tenant's dashboard (#2081) 2023-02-07 14:28:12 +00:00
623e7f20c3 Remove feature flags and default to on (#2004)
* remove search_enabled

Signed-off-by: Joe Elliott <number101010@gmail.com>

* remove metrics_enabled

Signed-off-by: Joe Elliott <number101010@gmail.com>

* example cleanup

Signed-off-by: Joe Elliott <number101010@gmail.com>

* fix handler setup

Signed-off-by: Joe Elliott <number101010@gmail.com>

* cleanup

Signed-off-by: Joe Elliott <number101010@gmail.com>

* single binary jsonnet cleanup

Signed-off-by: Joe Elliott <number101010@gmail.com>

* additional cleanup

Signed-off-by: Joe Elliott <number101010@gmail.com>

* make jsonnet

Signed-off-by: Joe Elliott <number101010@gmail.com>

* changelog

Signed-off-by: Joe Elliott <number101010@gmail.com>

* additional cleanup

Signed-off-by: Joe Elliott <number101010@gmail.com>

* removed wait for no longer existing features

Signed-off-by: Joe Elliott <number101010@gmail.com>

* return err unconfigured and let modules.go decide what to do

Signed-off-by: Joe Elliott <number101010@gmail.com>

Signed-off-by: Joe Elliott <number101010@gmail.com>
2023-01-19 14:04:14 -05:00
1ff92649a0 Fix last few alerts with per_cluster_label (#2000)
* Fix last few alerts with per_cluster_label

Signed-off-by: Whyeasy <Whyeasy@users.noreply.github.com>

* Add entry to CHANGELOG.md

Signed-off-by: Whyeasy <Whyeasy@users.noreply.github.com>

Signed-off-by: Whyeasy <Whyeasy@users.noreply.github.com>
2023-01-17 08:59:09 +01:00
1c2e522de9 Tempo 2.0: Config Cleanup (#1978)
* frontend

Signed-off-by: Joe Elliott <number101010@gmail.com>

* querier

Signed-off-by: Joe Elliott <number101010@gmail.com>

* compactor

Signed-off-by: Joe Elliott <number101010@gmail.com>

* ingester

Signed-off-by: Joe Elliott <number101010@gmail.com>

* storage v1

Signed-off-by: Joe Elliott <number101010@gmail.com>

* storage v2

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Always use getOrCreateInstance

Signed-off-by: Joe Elliott <number101010@gmail.com>

* manifest

Signed-off-by: Joe Elliott <number101010@gmail.com>

* changelog

Signed-off-by: Joe Elliott <number101010@gmail.com>

* docs

Signed-off-by: Joe Elliott <number101010@gmail.com>

* lint

Signed-off-by: Joe Elliott <number101010@gmail.com>

* lengthen fake polling cycle to prevent test failures

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Restore wal version and default to block version

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Warn when v2 settings drift and v2 is not set

Signed-off-by: Joe Elliott <number101010@gmail.com>

* update examples

Signed-off-by: Joe Elliott <number101010@gmail.com>

* remove todo

Signed-off-by: Joe Elliott <number101010@gmail.com>

* recompiled jsonnet

Signed-off-by: Joe Elliott <number101010@gmail.com>

Signed-off-by: Joe Elliott <number101010@gmail.com>
2023-01-11 14:08:34 -05:00
e96d482430 Zone aware ingesters (#1936)
* add zone aware replication for ingesters

* jsonnet fmt

* add check for multizone_zone_ingester_enabled in example

* add readme file for jsonnet
2022-12-15 15:03:40 -05:00
9fd78c381f tempo-mixin: tweak dashboards to support metrics without cluster label present (#1913)
* tempo-mixin: tweak dashboards to support metrics without cluster label present

* Update CHANGELOG.md
2022-11-28 14:32:26 +01:00
b35a0a6d3d tempo-mixin: don't pull in entire jsonnet-libs repository as dependency (#1909) 2022-11-23 14:30:41 +00:00
73fd4991b9 Add tenant's dashboard (#1901)
* Add tenant's dashboard

* Regenerate dashboard

* Fixes

* Add changelog entry
2022-11-22 15:09:36 +01:00