Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost-effective: it does not index the contents of the logs, only a set of labels for each log stream. In short, it is plain text logging with labels.
Almost all existing log aggregation solutions involve using full-text search systems to index logs (usually structured logs); at first glance this has some obvious advantages, with a rich and powerful feature set allowing for complex queries.
However, these existing solutions also have some disadvantages:
- They are complicated to scale, resource-intensive, and difficult to operate.
- The challenges and operational overheads of maintaining these systems have led to a migration to SaaS log aggregation. These SaaS systems have turned out to be excessively expensive. This leads to engineers logging less to cut costs and being forced to leave some of their systems with no log coverage.
- An increasingly common pattern is the use of time series monitoring in combination with log aggregation. During incident response, the initial alerting and querying is done with time series metrics, and the majority of log queries just focus on a time range and some simple parameters (host, service, etc.). Therefore most of the advanced search capabilities that cost so much are rarely used.
Loki makes a different design tradeoff: instead of indexing the log data in log streams, Loki indexes only the metadata for each log stream (server name, service name). This metadata is formatted as Prometheus-style multi-dimensional labels.
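To make the tradeoff concrete, here is a minimal Python sketch of a store that indexes stream labels and keeps log lines as unindexed plain text. It is an illustration only, not Loki's implementation: the class `LabelIndexedStore`, its methods, and the sample hosts and log lines are all hypothetical.

```python
from collections import defaultdict


class LabelIndexedStore:
    """Toy store: indexes stream labels only, never the log text."""

    def __init__(self):
        self.streams = {}              # stream id -> list of (timestamp, line)
        self.index = defaultdict(set)  # (label name, value) -> stream ids

    def push(self, labels, timestamp, line):
        # A stream is identified by its full, sorted label set.
        stream_id = tuple(sorted(labels.items()))
        if stream_id not in self.streams:
            self.streams[stream_id] = []
            for pair in stream_id:
                self.index[pair].add(stream_id)
        self.streams[stream_id].append((timestamp, line))

    def query(self, matchers, start, end, contains=""):
        # Step 1: resolve candidate streams through the label index.
        candidates = [self.index.get(pair, set()) for pair in matchers.items()]
        stream_ids = set.intersection(*candidates) if candidates else set(self.streams)
        # Step 2: scan only those streams, filtering by time range and substring.
        results = []
        for stream_id in stream_ids:
            for ts, line in self.streams[stream_id]:
                if start <= ts < end and contains in line:
                    results.append((ts, line))
        return sorted(results)


store = LabelIndexedStore()
store.push({"host": "web-1", "service": "api"}, 10, "GET /health 200")
store.push({"host": "web-1", "service": "api"}, 20, "GET /orders 500 timeout")
store.push({"host": "db-1", "service": "postgres"}, 15, "checkpoint complete")

errors = store.query({"service": "api"}, start=0, end=30, contains="500")
```

The index here holds four label pairs regardless of how many lines are pushed. The text filter is a plain scan over the streams that the labels selected, which matches the typical query shape described above: a time range plus a few simple parameters.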
This has some major advantages:
- Loki ingestion uses the same service discovery and label relabelling libraries as Prometheus, meaning time series/metrics in Prometheus and log streams in Loki can have the same labels. This minimises the cost of context switching between logs and metrics, which helps reduce incident response times and improves the user experience.
- By storing plain text logs and only indexing the metadata, storage is hugely simplified. This makes Loki simpler to operate and provides some significant cost savings. This means engineers can store all of their logs.
- Prometheus labelling goes hand-in-hand with Kubernetes label selection. This makes Loki a particularly good fit for storing Kubernetes logs.
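As a rough illustration of how relabelling can turn discovered Kubernetes metadata into the labels shared by metrics and logs, the sketch below mimics a single "replace" relabel step in Python. It is a simplification, not the Prometheus library: the function `relabel_replace` is hypothetical, the replacement uses Python's `\1` group syntax rather than Prometheus's `$1`, and the sample values are made up. The `__meta_kubernetes_*` names follow the convention used by Prometheus's Kubernetes service discovery.

```python
import re


def relabel_replace(labels, source_labels, regex, target_label,
                    replacement=r"\1", separator=";"):
    """Simplified 'replace' relabel step (illustrative only)."""
    # Join the source label values, then require the regex to match the whole string.
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    out = dict(labels)
    if match is not None:
        out[target_label] = match.expand(replacement)
    return out


# Hypothetical metadata as it might be attached to a discovered pod.
discovered = {
    "__meta_kubernetes_namespace": "prod",
    "__meta_kubernetes_pod_label_app": "checkout",
}

labels = relabel_replace(discovered, ["__meta_kubernetes_pod_label_app"], r"(.+)", "app")
labels = relabel_replace(labels, ["__meta_kubernetes_namespace"], r"(.+)", "namespace")

# Labels with a double-underscore prefix are internal and dropped at the end.
final_labels = {k: v for k, v in labels.items() if not k.startswith("__")}
```

If the same rules are applied on the metrics side and the logs side, both end up with `app` and `namespace` labels carrying identical values, so one selector can address both.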
Loki consists of 3 components:
- loki is the main server, responsible for storing logs and processing queries.
- promtail is the agent, responsible for gathering logs and sending them to loki.
- Grafana for the UI.
Promtail is a daemon that discovers targets, produces metadata labels, and tails log files to produce streams of logs. It uses the same service discovery and label relabelling libraries as Prometheus.
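The tailing step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Promtail's code: the functions `read_new_lines` and `tail_targets`, the polling approach, and the sample file and labels are all hypothetical. Each target is a file path paired with the labels produced for it, and each pass emits only complete lines appended since the last recorded offset.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (lines, new_offset) for complete lines appended since offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # stop at the last newline; keep partial lines for later
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def tail_targets(targets, offsets):
    """One polling pass. targets: list of (path, labels); offsets: path -> byte offset."""
    entries = []
    for path, labels in targets:
        lines, offsets[path] = read_new_lines(path, offsets.get(path, 0))
        entries.extend({"labels": labels, "line": line} for line in lines)
    return entries


# Demo with a temporary log file standing in for a discovered target.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.log")
    targets = [(path, {"job": "app", "host": "web-1"})]
    offsets = {}

    with open(path, "w") as f:
        f.write("started\nlistening on :8080\nhalf-writ")
    first = tail_targets(targets, offsets)   # two complete lines

    with open(path, "a") as f:
        f.write("ten line\n")
    second = tail_targets(targets, offsets)  # the line completed by the append
```

Every emitted entry carries the target's labels, so lines from the same file all belong to the same stream. A real agent would also have to handle file rotation and truncation, which this sketch ignores.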
Grafana 6.0 will have native support for Loki (it is already included in the latest master). The new Explore UI has custom support for querying log streams from Loki. Read about the Explore UI in Grafana and its Prometheus and Loki support here.