Prometheus
Project website: http://prometheus.io/ GitHub: https://github.com/prometheus
Sparse High Resolution Histograms
I co-authored sparse high-resolution histograms in Prometheus with Björn Rabenstein. This work touches almost all of the Prometheus codebase: scraping, TSDB, PromQL, recording and alerting rules, and more.
An entire year's work was opened as a single GitHub PR here. You can find the detailed design doc by Björn here, and his other document on PromQL extensions here.
I have given three conference talks on this so far:
- Sparse High-resolution Histograms in the Prometheus TSDB (when we started building it in TSDB)
- Prometheus Sparse High-Resolution Histograms in Action (when we had a working prototype)
- Native Histograms in Prometheus (when it was ready and released live in this talk)
Out-of-order Ingestion Support
A sample (timestamp, value) is considered out-of-order if its timestamp is older than the latest timestamp received for that particular time series. Traditionally, the Prometheus TSDB only accepted in-order samples that were less than one hour old, discarding everything else.
I, along with Jesús Vázquez and Dieter Plaetinck, added support to the Prometheus TSDB for accepting out-of-order samples without any limits.
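As a rough illustration of the definition above, the following Python sketch classifies incoming samples per series. It is not the TSDB's Go implementation; the `LatestTimestamps` class and its `classify` method are hypothetical names invented for this example, and handling of duplicate timestamps is left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LatestTimestamps:
    """Tracks the newest timestamp seen per series (hypothetical helper)."""

    latest: Dict[str, int] = field(default_factory=dict)

    def classify(self, series: str, timestamp_ms: int) -> str:
        """Return "in-order" or "out-of-order" for an incoming sample."""
        newest = self.latest.get(series)
        if newest is not None and timestamp_ms < newest:
            # Older than the newest sample already received for this series.
            return "out-of-order"
        # First sample for the series, or not older than anything seen so far.
        # Equal timestamps are treated as in-order in this sketch.
        self.latest[series] = timestamp_ms
        return "in-order"


if __name__ == "__main__":
    tracker = LatestTimestamps()
    for ts in (1000, 2000, 1500):
        print(ts, tracker.classify("up", ts))
```

Before this work, samples in the "out-of-order" branch would have been discarded; with it, they can be ingested.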
Design Doc, Code, Blog Post, Conference talk
Prometheus Alert-Generator Compliance Specification and Test Suite
Alert-generator is the component responsible for generating reliable and consistent alerts. I authored the 1.0 specification for the Prometheus Alert-Generator compliance that you can find here. I also authored the test suite to automatically test this specification for any software. You can find it here.
Snapshot of In-Memory Chunks on Shutdown for Faster Restarts
This work takes a snapshot of the in-memory data in the TSDB while shutting down, which speeds up the restart by skipping the WAL replay. This brought down the restart time of Prometheus by up to 80%!
This was added in PR#7229. You can find a detailed explanation in this blog post. I also talked about it in this conference talk.
Memory-Mapping of Head Chunks from Disk
Instead of storing all the compressed samples (called chunks) in memory, this work flushes them to disk and memory-maps them from there. This brought down the memory usage of Prometheus by up to 50%. This was achieved with the combination of PR#6830 and PR#6679.
You can read more about it in this blog post, and find a detailed explanation in this blog post. I also talked about it in this conference talk.
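The general idea of flushing bytes to disk and reading them back through a memory map can be sketched with Python's standard `mmap` module. This is only an illustration of the technique, not the TSDB's on-disk format; `flush_chunk`, `read_mapped`, and the file name are hypothetical.

```python
import mmap
import os
import tempfile


def flush_chunk(path: str, chunk: bytes) -> None:
    """Write an (already compressed) chunk to disk and sync it."""
    with open(path, "wb") as f:
        f.write(chunk)
        f.flush()
        os.fsync(f.fileno())


def read_mapped(path: str, offset: int, length: int) -> bytes:
    """Memory-map the file read-only and copy out one slice of it."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded by the OS on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "000001")  # hypothetical chunk file name
        flush_chunk(path, b"chunk-bytes-go-here")
        print(read_mapped(path, 6, 5))
```

With a memory map, the operating system decides which pages stay resident, so data that is rarely read does not have to occupy the process heap.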
@ Modifier in PromQL
Based on this design doc, I added the @ modifier for PromQL in PR#8121. It lets you pin any part of a PromQL query to any timestamp that you want, irrespective of the query time.
Learn more in this blog post.
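In PromQL, a query such as http_requests_total @ 1609746000 evaluates the selector at that Unix timestamp no matter when the query runs. The toy Python model below shows only that pinning behaviour. It is a sketch under simplifying assumptions (no lookback window or staleness handling), and `value_at` and `eval_selector` are hypothetical names, not Prometheus functions.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)


def value_at(samples: List[Sample], t: int) -> Optional[float]:
    """Return the value of the newest sample with timestamp <= t, if any."""
    timestamps = [ts for ts, _ in samples]
    i = bisect_right(timestamps, t)
    return samples[i - 1][1] if i > 0 else None


def eval_selector(samples: List[Sample], query_time: int,
                  at: Optional[int] = None) -> Optional[float]:
    """Evaluate a selector at query_time, or at the pinned time if given."""
    # With `at` set, the query time no longer influences the result.
    return value_at(samples, query_time if at is None else at)


if __name__ == "__main__":
    series = [(100, 1.0), (200, 2.0), (300, 3.0)]
    print(eval_selector(series, query_time=300))          # follows query time
    print(eval_selector(series, query_time=300, at=150))  # pinned to t=150
```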
Subquery Support in PromQL
With PR#4831, I introduced subqueries in Prometheus. They take the form
<instant_query> '[' <range> ':' [ <resolution> ] ']' [ offset <duration> ]
and let you run a PromQL query inside a PromQL query. You can read more about it in this blog post.
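For example, rate(http_requests_total[5m])[30m:1m] evaluates the inner rate(...) once per minute over the last 30 minutes and hands the results to the outer expression as a range. The Python sketch below models that stepping. The `subquery` function is hypothetical, and aligning steps to multiples of the resolution with both range ends included is an assumption of this sketch rather than a statement about the engine's exact boundary rules.

```python
from typing import Callable, List, Tuple


def subquery(inner: Callable[[int], float], eval_time: int,
             range_s: int, step_s: int) -> List[Tuple[int, float]]:
    """Evaluate `inner` at every step inside [eval_time - range_s, eval_time]."""
    start = eval_time - range_s
    # First multiple of step_s that is not before the start of the range.
    first = -(-start // step_s) * step_s
    return [(t, inner(t)) for t in range(first, eval_time + 1, step_s)]


if __name__ == "__main__":
    # Stand-in for an instant query: any function of the evaluation time.
    for t, v in subquery(lambda t: t / 10, eval_time=1000, range_s=300, step_s=60):
        print(t, v)
```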
Vertical Compaction and Queries in TSDB
With PR#370, I added support in Prometheus to handle time-overlapping blocks of data. This enabled backfilling of old data into Prometheus.
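To give a feel for what handling time-overlapping blocks involves, here is a small Python sketch that detects overlapping time ranges and merges one series' samples from several blocks. It is not the TSDB code: `blocks_overlap` and `merge_series` are hypothetical, the half-open [min_t, max_t) range is an assumption, and keeping the first sample on a duplicate timestamp is a policy chosen only for this example.

```python
import heapq
from typing import Iterable, List, Tuple

Sample = Tuple[int, float]  # (timestamp, value)


def blocks_overlap(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two [min_t, max_t) block time ranges share any instant."""
    return a[0] < b[1] and b[0] < a[1]


def merge_series(*blocks: Iterable[Sample]) -> List[Sample]:
    """Merge one series' samples from several blocks, sorted by timestamp.

    Each input must already be sorted by timestamp.
    """
    merged: List[Sample] = []
    for ts, value in heapq.merge(*blocks, key=lambda s: s[0]):
        # On identical timestamps keep the first sample seen (sketch policy).
        if not merged or merged[-1][0] != ts:
            merged.append((ts, value))
    return merged


if __name__ == "__main__":
    old = [(10, 1.0), (30, 3.0)]
    backfilled = [(20, 2.0), (30, 9.0), (40, 4.0)]
    print(blocks_overlap((10, 31), (20, 41)))
    print(merge_series(old, backfilled))
```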
Persist "for" State of Alerts Across Restarts
This was part of my GSoC 2018 work. PR#4061 added support to carry forward the state of alerts across restarts. Read more about it in my blog post.
Unit Testing of Rules in promtool
This was part of my GSoC 2018 work. PR#4350 added unit testing of rules to promtool. Read more about it in my blog post.
Performance and Memory Optimizations
These were more about the investigation than the final fix:
- Noticeable reduction in memory allocations by reusing the chunk iterators: https://github.com/prometheus-junkyard/tsdb/pull/642
- More efficient iteration during hashing, a Go-specific optimization: https://github.com/prometheus/prometheus/pull/5707
- A series of memory allocation optimizations for compaction. Original PR: https://github.com/prometheus-junkyard/tsdb/pull/627, which was broken down into PR#643, PR#644, PR#645, PR#653, and PR#654.