Sparse High Resolution Histograms
I co-authored sparse high-resolution histograms in Prometheus with Björn Rabenstein. This work touches almost all of the Prometheus codebase: scraping, TSDB, PromQL, recording and alerting rules, and more.
I have given three conference talks on this so far:
- Sparse High-resolution Histograms in the Prometheus TSDB (when we started building it in TSDB)
- Prometheus Sparse High-Resolution Histograms in Action (when we had a working prototype)
- Native Histograms in Prometheus (when it was ready and released live in this talk)
Out-of-order Ingestion Support
A sample (timestamp, value) is considered out-of-order if its timestamp is older than the latest timestamp received for that particular time series. Traditionally, the Prometheus TSDB only accepted in-order samples that were less than one hour old, discarding everything else.
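The definition above can be sketched in a few lines of Go. This is only an illustration: the `sample` type and the `isOutOfOrder` helper are hypothetical names, not taken from the Prometheus codebase, and timestamps are assumed to be in milliseconds.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair.
// The timestamp is assumed to be in milliseconds.
type sample struct {
	t int64
	v float64
}

// isOutOfOrder reports whether s is older than the latest timestamp
// already received for the same time series.
func isOutOfOrder(latestT int64, s sample) bool {
	return s.t < latestT
}

func main() {
	latest := int64(1000) // latest timestamp seen for this series

	fmt.Println(isOutOfOrder(latest, sample{t: 900, v: 1}))  // older than latest
	fmt.Println(isOutOfOrder(latest, sample{t: 1100, v: 2})) // newer than latest
}
```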
Prometheus Alert-Generator Compliance Specification and Test Suite
Alert-generator is the component responsible for generating reliable and consistent alerts. I authored the 1.0 specification for the Prometheus Alert-Generator compliance that you can find here. I also authored the test suite to automatically test this specification for any software. You can find it here.
Snapshot of In-Memory Chunks on Shutdown for Faster Restarts
This work takes a snapshot of the in-memory data from the TSDB while shutting down, which speeds up the restart by skipping the WAL replay. This brought down the restart time of Prometheus by up to 80%!
Memory-Mapping of Head Chunks from Disk
Instead of storing all the compressed samples (called chunks) in memory, this work flushes them to disk and memory-maps them from disk. This brought down the memory usage of Prometheus by up to 50%. This was achieved with the combination of PR#6830 and PR#6679.
@ Modifier in PromQL
Learn more in this blog post.
Subquery Support in PromQL
With PR#4831, I introduced subqueries of the form
<instant_query> '[' <range> ':' [ <resolution> ] ']' [ offset <duration> ]
in Prometheus. It lets you run a PromQL query inside a PromQL query. You can read more about it in this blog post.
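To give a rough idea of what "a query inside a query" means, the sketch below computes the timestamps at which the inner instant query would be evaluated for a given range and resolution. The function name `subqueryTimestamps` is hypothetical, and both the alignment of steps to multiples of the resolution and the inclusive range boundaries are assumptions made for this illustration rather than a statement of the engine's exact behavior.

```go
package main

import "fmt"

// subqueryTimestamps returns the timestamps (in milliseconds) at which the
// inner instant query is evaluated for a subquery that ends at evalT, looks
// back rangeMs, and uses a resolution of stepMs.
//
// Assumptions for this sketch: steps are aligned to multiples of stepMs, and
// both ends of the range are treated as inclusive.
func subqueryTimestamps(evalT, rangeMs, stepMs int64) []int64 {
	start := evalT - rangeMs

	// Find the first multiple of stepMs that is not before start.
	first := stepMs * (start / stepMs)
	if first < start {
		first += stepMs
	}

	var ts []int64
	for t := first; t <= evalT; t += stepMs {
		ts = append(ts, t)
	}
	return ts
}

func main() {
	// A 30s range at 10s resolution, evaluated at t=105s.
	fmt.Println(subqueryTimestamps(105000, 30000, 10000))
}
```

Each returned timestamp corresponds to one evaluation of the inner query; the results together form the range that the outer query operates on.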
Vertical Compaction and Queries in TSDB
With PR#370, I added support in Prometheus to handle time-overlapping blocks of data. This enabled backfilling of old data into Prometheus.
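As a small illustration of what "time-overlapping" means here, the following sketch checks whether two blocks cover intersecting time ranges. The `block` type and `overlaps` function are hypothetical, and treating a block's range as half-open, `[minT, maxT)`, is an assumption of this sketch.

```go
package main

import "fmt"

// block is a hypothetical stand-in for a TSDB block's time range.
// The range is assumed to be half-open: [minT, maxT).
type block struct {
	minT, maxT int64
}

// overlaps reports whether the time ranges of a and b intersect.
func overlaps(a, b block) bool {
	return a.minT < b.maxT && b.minT < a.maxT
}

func main() {
	existing := block{minT: 0, maxT: 10}
	backfilled := block{minT: 5, maxT: 15}
	adjacent := block{minT: 10, maxT: 20}

	fmt.Println(overlaps(existing, backfilled)) // ranges intersect on [5, 10)
	fmt.Println(overlaps(existing, adjacent))   // ranges only touch at 10
}
```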
for State of Alerts Across Restarts
Unit Testing of Rules in promtool
Performance and Memory Optimizations
It's more about the investigation than the final fix.
Noticeable reduction in memory allocations by reusing the chunk iterators. https://github.com/prometheus-junkyard/tsdb/pull/642
Efficient iteration during hashing, a Go-specific optimization. https://github.com/prometheus/prometheus/pull/5707