Deriving meaningful insights from third-party logs has always been a difficult yet necessary task. Most analysis occurs after-the-fact, when something has gone wrong. Very few tools allow real-time monitoring of logs, so SREs have become accustomed to backfilling log data into various analysis tools. Postmortem log analysis is the de facto standard, yet should it be? Why shouldn’t you be able to monitor your server logs in real-time?

As an operator of large-scale systems, we’ve often felt the pain of postmortem analysis. These experiences have led us to develop circonus-logwatch.

Flexible configuration allows you to easily pinpoint the values you want to extract and transform into metrics, changing the nature of log analysis from “we’ll do better next time” to “let’s look at what’s happening right now.” And best of all, metrics extracted with circonus-logwatch are first-class citizens in Circonus, so you can analyze and alert on them just like you would any other metric.

circonus-logwatch allows you to:

  • Track precise values from third-party logs as first-class Circonus metrics
  • Perform analytics and predictions on those values
  • Set rules and be alerted on those values
  • Store your new log-sourced metrics using the same multi-year data retention policies as the rest of your collected data

Setting Up circonus-logwatch

circonus-logwatch can be installed on most Unix-based servers and is designed to be configurable, making it easy to fit into your existing stack. It supports multiple configuration file formats, and can submit its metrics via StatsD, through a local Circonus Agent, or directly to a Circonus broker.

After installing circonus-logwatch, setup is simple: first add a main config file to define some basic settings and to specify how the metrics will be submitted (StatsD, agent, or broker). Then set up a config file for each third-party log source you want to monitor. You can monitor as many different third-party logs as you wish; simply create a new config file for each log source and place it in the log config directory. Each of these logs will be tailed by circonus-logwatch, which will process the lines as soon as the logging system writes them to the file. When a new line is written to a watched file, circonus-logwatch will take the regular expressions from the config and apply them to the new line, extracting the metric(s) from it and formatting them for submission to Circonus. If you’re collecting these new metrics via a Circonus Agent, then the new values will be submitted to Circonus every sixty seconds. If you’re collecting via direct broker submission, they will be sent every ten seconds, and if you’re collecting via StatsD they will be sent immediately. (Please see the repository for full installation instructions.)

circonus-logwatch comes with many sample log source configs, such as:

  • audit.yaml
    This example allows you to monitor system login and logout events; for instance, you could monitor successful logins compared to login attempts.
  • messages.json
    This example tracks transmit timeouts on ethernet interfaces, so you can monitor potential networking problems from that system’s perspective.
  • ssh.json
    This example monitors when users make SSH connections to that system and when they run sudo commands (for security purposes).
  • system.yaml
    This is a contrived example, tracking macOS system events; it is part of the auth system, when an app uses the user’s credentials to “unlock the user” (potentially accessing/modifying something that maybe the app shouldn’t).
  • yum.yaml
    This example allows you to track Yum package activity on the system…when packages are installed, updated, or erased.

Example Use Case: Monitoring Apache Request Latency

Let’s look at a real-world example: a web-facing system with some number of pages. We want to monitor how long it takes Apache to handle page requests such that we have a good handle on how our web server is performing.

First, we need to get the relevant metrics into Circonus. Apache has the associated information readily available, we just need to expose it by specifying the following LogFormat:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" tm:%D" combined

This is a sample of what the resulting logs should look like:

174.223.8.145 - - [18/Sep/2018:18:45:57 +0000] "GET / HTTP/2.0" 200 3266 "-" "Mozilla/5.0 (Linux; Android 7.1.1; Z839 Build/NMF26V; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/68.0.3440.85 Mobile Safari/537.36" tm:87902
104.225.9.155 - - [18/Sep/2018:18:46:00 +0000] "GET / HTTP/1.1" 200 9797 "-" "libmtev/0" tm:12651
97.105.47.162 - - [18/Sep/2018:18:47:47 +0000] "GET / HTTP/2.0" 200 9797 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36" tm:12046

Next, we install circonus-logwatch on a system with access to these logs, and configure it to submit our metrics to Circonus. To extract the relevant metrics from this format, we use the sample Apache log config below:

log_file: /var/log/httpd/access_log
metrics:
# since latency is not included in common log format, it was added to 'combined' format:
# e.g. `LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" tm:%D" combined`

# aggregate latency histogram
- match: 'tm:(?P<value>[0-9.]+)'
name: latency_µs
type: h
# latency histogram by request path
- match: ' (?P<path>/[^ ]*) HTTP.+tm:(?P<value>[0-9.]+)'
name: '{{.path}}`latency_µs'
type: h
# latency histogram by request path and specific request methods
- match: '(?P<method>(GET|POST|PUT)) (?P<path>/[^ ]*) HTTP.+tm:(?P<value>[0-9.]+)'
name: '{{.path}}`{{.method}}`latency_µs'
type: h

This configuration provides us with several different metrics. First, it aggregates the latency across all requests into a single latency_µs metric. Second, it collects a separate metric for each request path, allowing you to track latency for each individual path. Third, it collects a metric for each request method on each path, allowing us to differentiate latency between GET and PUT requests.

Now when we go into Circonus, we have many metrics with names like /my/path`latency_µs.

We continue by picking the metrics corresponding to the requests we want to monitor; for this example let’s say there are twenty. For each request path, we want to know if its latency goes above a half second (500,000 µs) and stays there for some extended period of time. To do so, we create a ruleset for each of those twenty metrics declaring “If the value is present and higher than 500000, trigger a sev 5 alert and wait 10 mins before notifying.” Very straightforward.

We’re not quite done, however. We’re interested in the bigger picture. To comprehensively monitor these requests we will now create a ruleset group. A ruleset group allows us to alert on all requests, as opposed to requests for a specific path. We add all twenty rulesets as members of the group and then add a formula with a threshold of “5 of 20.”

Finally, we can add a contact to the ruleset group, and here’s where the power of circonus-logwatch shines: we would be alerted when 25% of our watched request paths’ latencies ever rise above half a second and stay there for ten minutes. And all this is happening in real-time, based solely on our Apache logs.

circonus-logwatch ruleset group
Our ruleset group would look roughly like this

Real-World Benefits From Log Monitoring

We installed circonus-logwatch on the host serving irondb.io and configured it to watch our request handling latency. Upon doing so, we saw latencies between 250 – 350 ms per request. That’s not horrible, but we expected better. Since the site is run on WordPress, we installed the Super Cache plugin to generate and serve static pages, hoping that would improve our page request latency. As you can see in the histogram graph below, our request handling dropped to a respectable 10 – 20 ms per request after caching was enabled.

WordPress Request Handling Latency, Before and After Super Cache
WordPress Request Handling Latency, Before and After Super Cache

Start Making Better Use of Your Logs

There are many useful metrics hidden in your organization’s logs. These logs contain rich information that can help you make better business decisions, right now, in real-time. Best of all, this capability comes free with any Circonus account.

circonus-logwatch is open source and is available at https://github.com/circonus-labs/circonus-logwatch.

Get blog updates.

Keep up with the latest in telemtry data intelligence and observability.