
Prometheus


Project website: http://prometheus.io/
GitHub: https://github.com/prometheus


Sparse High Resolution Histograms

Co-authored the sparse high-resolution histograms in Prometheus with Björn Rabenstein. This work touches almost all of the Prometheus codebase: scraping, TSDB, PromQL, recording and alerting rules, and more.

An entire year's work was opened as a single GitHub PR here. Find the detailed design doc by Björn here, and another document of his on PromQL extensions here.

I have given three conference talks on this so far:


Out-of-order Ingestion Support

A sample (timestamp, value) is considered out-of-order if its timestamp is older than the latest timestamp already received for that particular time series. Traditionally, the Prometheus TSDB accepted only in-order samples that were less than one hour old and discarded everything else.
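To make the definition concrete, here is a minimal Go sketch of the two checks described above. The names `seriesState`, `isOutOfOrder` and `acceptedTraditionally` are hypothetical and not Prometheus TSDB APIs. The sketch also assumes that "one hour old" is measured against the newest timestamp in the TSDB head (`headMaxTs`), which simplifies how the real appender decides.

```go
package main

import "fmt"

// oneHourMs is one hour in milliseconds, the unit Prometheus uses for timestamps.
const oneHourMs int64 = 60 * 60 * 1000

// seriesState is a hypothetical per-series record of the newest timestamp seen so far.
type seriesState struct {
	latestTs int64
}

// isOutOfOrder reports whether ts is older than the latest timestamp
// already received for this series.
func isOutOfOrder(s seriesState, ts int64) bool {
	return ts < s.latestTs
}

// acceptedTraditionally models the old rule: only in-order samples that are
// less than one hour older than headMaxTs (assumed here to be the newest
// timestamp in the TSDB head) are accepted; everything else is discarded.
func acceptedTraditionally(s seriesState, ts, headMaxTs int64) bool {
	return !isOutOfOrder(s, ts) && headMaxTs-ts < oneHourMs
}

func main() {
	head := 10 * oneHourMs
	s := seriesState{latestTs: head}
	fresh := seriesState{latestTs: 0} // hypothetical series with no recent samples

	fmt.Println(isOutOfOrder(s, head-1000))                      // older than latest: out-of-order
	fmt.Println(acceptedTraditionally(s, head-1000, head))       // rejected: out-of-order
	fmt.Println(acceptedTraditionally(s, head+1000, head))       // accepted: in-order and recent
	fmt.Println(acceptedTraditionally(fresh, head-2*oneHourMs, head)) // rejected: in-order but too old
}
```

Out-of-order ingestion relaxes exactly the first condition in `acceptedTraditionally`.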

Together with Jesús Vázquez and Dieter Plaetinck, I added support to the Prometheus TSDB for accepting out-of-order samples without any limits.

Design Doc, Code, Blog Post, Conference talk


Prometheus Alert-Generator Compliance Specification and Test Suite

Alert-generator is the component responsible for generating reliable and consistent alerts. I authored the 1.0 specification for Prometheus Alert-Generator compliance, which you can find here. I also authored the test suite that automatically tests any software against this specification. You can find it here.


Snapshot of In-Memory Chunks on Shutdown for Faster Restarts

This work takes a snapshot of the in-memory data in the TSDB during shutdown, so that the next start can skip the WAL replay. This brought down the restart time of Prometheus by up to 80%!

This was added in PR#7229. You can find a detailed explanation in this blog post. I also talked about it in this conference talk.


Memory-Mapping of Head Chunks from Disk

Instead of keeping all the compressed samples (called chunks) in memory, this work flushes them to disk and memory-maps them from there. This brought down the memory usage of Prometheus by up to 50%. This was achieved with the combination of PR#6830 and PR#6679.

You can read more about it in this blog post, and find a detailed explanation in this other blog post. I also talked about it in this conference talk.


@ Modifier in PromQL

Based on this design doc, I added the @ modifier to PromQL in PR#8121. It lets you pin any part of a PromQL query to any timestamp you want, irrespective of the query time.

Learn more in this blog post.


Subquery Support in PromQL

With PR#4831, I introduced subqueries of the form

<instant_query> '[' <range> ':' [ <resolution> ] ']' [ offset <duration> ]

in Prometheus. It lets you run a PromQL query inside a PromQL query. You can read more about it in this blog post.


Vertical Compaction and Queries in TSDB

With PR#370, I added support in Prometheus to handle time-overlapping blocks of data. This enabled backfilling of old data into Prometheus.
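Handling time-overlapping blocks means a series can have samples in more than one block for the same time range, and queries and compaction must combine them into one time-ordered stream. The core of that is a two-way merge of time-sorted samples, sketched minimally in Go below. The `sample` type and `mergeOverlapping` are hypothetical, and keeping the first input's sample when two samples share a timestamp is an assumption of this sketch, not necessarily how the TSDB resolves such conflicts.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp in ms, value) pair.
type sample struct {
	t int64
	v float64
}

// mergeOverlapping merges two time-sorted sample slices from overlapping
// blocks into one sorted slice. On identical timestamps it keeps the sample
// from a and drops the one from b (an arbitrary choice for this sketch).
func mergeOverlapping(a, b []sample) []sample {
	out := make([]sample, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].t < b[j].t:
			out = append(out, a[i])
			i++
		case a[i].t > b[j].t:
			out = append(out, b[j])
			j++
		default: // same timestamp in both blocks: deduplicate
			out = append(out, a[i])
			i++
			j++
		}
	}
	// At most one of these still has samples left; both are already sorted.
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

func main() {
	older := []sample{{1, 1}, {3, 3}, {5, 5}}
	backfilled := []sample{{2, 20}, {3, 30}, {6, 60}}
	fmt.Println(mergeOverlapping(older, backfilled))
}
```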


Persisting the State of Alerts Across Restarts

This was part of my GSoC 2018 work. PR#4061 added support to carry forward the state of alerts across restarts. Read more about it in my blog post.


Unit Testing of Rules in promtool

This was part of my GSoC 2018 work. PR#4350 added unit testing of rules to promtool. Read more about it in my blog post.


Performance and Memory Optimizations

It's more about the investigation than the final fix.