We're happy to share another open-source project we've developed as part of day-to-day work with the Elastic Stack.

Elastic Query Tracker is a lightweight proxy server for Elasticsearch that provides insight into which searches are executed, how long they run, where they come from, and more.

We developed this proxy for a customer project with several different query sources: some searches were executed by microservices and others by Kibana, which was heavily used by staff members. The Elasticsearch cluster was running fine overall, but performance issues recurred in regular patterns because of the high search workload on the cluster. We had already scaled and tuned the cluster and were looking for ways to increase performance without increasing hardware costs.

Our approach was to measure which searches had the heaviest impact on the cluster. Focusing on those searches and optimizing them could give us the performance increase we were after.

For this purpose, we wrote a simple proxy in Node.js that intercepts all requests, isolates the search requests, records the request and the result statistics, and then returns the response to the caller. If a request is not related to search, we simply pass it through to Elasticsearch and return the response. We also built an accompanying dashboard that highlights the searches' impact on the cluster.
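
To make the interception step more concrete, here is a minimal sketch of how a proxy might decide whether a request is a search and what it might record about it. This is not the project's actual code: the function names and record fields are hypothetical, and it assumes that "search requests" means calls to the `_search` and `_msearch` endpoints. The `took`, `timed_out`, and `hits.total` fields are part of the standard Elasticsearch search response.

```javascript
// Hypothetical helpers; names and record fields are illustrative only.

// Assumption: a request counts as a search when the last path segment
// is _search or _msearch and the method is GET or POST.
function isSearchRequest(method, url) {
  const path = url.split('?')[0];
  const lastSegment = path.split('/').filter(Boolean).pop();
  const isSearchEndpoint = lastSegment === '_search' || lastSegment === '_msearch';
  return isSearchEndpoint && (method === 'GET' || method === 'POST');
}

// Build a statistics record from the request and the parsed Elasticsearch response.
function buildSearchRecord(url, requestBody, responseBody, elapsedMs) {
  const total = responseBody.hits ? responseBody.hits.total : undefined;
  return {
    path: url.split('?')[0],
    query: requestBody,
    tookMs: responseBody.took,          // time Elasticsearch reports for the search
    roundTripMs: elapsedMs,             // time measured by the proxy itself
    timedOut: responseBody.timed_out,
    // hits.total is a number in older versions and { value, relation } in newer ones
    totalHits: typeof total === 'object' && total !== null ? total.value : total,
  };
}
```

Anything for which `isSearchRequest` returns `false` would simply be forwarded to Elasticsearch untouched.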

[img]

This gave us a high-level overview of which searches were running in our cluster and what potential impact they might have on it. But it wasn't detailed enough: we had several different microservices running queries and wanted to isolate the specific source of each query. For this, we added support for custom headers that the caller can provide. When they are specified, the proxy parses the headers and tags the recorded information with them. For example, the authentication microservice used the header:

x-query-tracker-tags: auth,login

From this we knew that the source was the auth container, and the `login` tag gave us a further aggregation option. We also added the `x-query-tracker-id` header, which lets the caller specify a query id. This allows aggregations based on the specific query sent, regardless of its origin.
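
As a rough illustration of the header handling described above, the sketch below turns the two headers into a tag list and a query id. The function name `parseTrackerHeaders` and the shape of the returned object are assumptions made for this example, not the project's actual implementation. It relies on the fact that Node.js exposes incoming header names in lower case.

```javascript
// Hypothetical helper; the name and return shape are illustrative only.
// `headers` is expected to look like req.headers from Node's http module,
// where header names are already lower-cased.
function parseTrackerHeaders(headers) {
  const rawTags = headers['x-query-tracker-tags'] || '';
  const tags = String(rawTags)
    .split(',')
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);   // drop empty entries such as "auth,,login"
  const queryId = headers['x-query-tracker-id'] || null;
  return { tags, queryId };
}

// Example: the auth service tagging a login query.
const parsed = parseTrackerHeaders({
  'x-query-tracker-tags': 'auth,login',
  'x-query-tracker-id': 'user-lookup',  // made-up id for illustration
});
console.log(parsed.tags, parsed.queryId);
```

Requests that send neither header still get recorded, just with an empty tag list and no query id.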

Adding this data to the dashboard lets us review search impact by requesting service and by the specific query id being used.

[img]

Elastic Query Tracker uses Elasticsearch itself for storing the results, but it can also ship the data to Kafka or Redis when a queue is needed to balance the workload on the cluster or to output the data to another destination.