Cassandra Query Observability with Libpcap and Protocol Observer

Opinions vary in recent online discussions regarding systems and software observability. Some state that observability is a replacement for monitoring. Others that they are parallel mechanisms, or that one is a subset of another (not to mention where tracing fits into such a hierarchy). Monitoring Weekly recently provided a helpful list of resources for an overview of this discussion, as well as some practical applications of observability. Without attempting to put forth yet another opinion, let’s take a step back and ask, what is observability?

What is Observability

Software observability wasn’t invented yesterday, and has a history going back at least 10 years, to the birth of DTrace. One can go back even further to the Apollo 11 source code to see some of the earliest implementations of software observability, and also systems monitoring from mission control. If we ask Wikipedia what Observability is, we get an answer along the lines of how well internal states of a system can be inferred from knowledge of its external outputs.

To understand this distinction between monitoring and observability, consider what a doctor does with a stethoscope. Anyone who has watched one of the many TV medical dramas, or been in a doctor’s office themselves, should be familiar with this. One uses the external outputs of the stethoscope to determine the internal state of the patient. The patient is monitored with instruments such as the stethoscope, and it is the observable traits of these instruments that allow this monitoring to occur. The doctor observes the instruments in order to monitor the patient, because this is often more effective than observing the patient directly, which might not even be possible for some of a patient’s internal states.

Observing Wirelatency and Monitoring Cassandra

So now that we have a baseline understanding of these terms, let’s dive into a practical use case that we’ve implemented here at Circonus. Our patient will be the Apache Cassandra wide column distributed data store. Our stethoscope will be the wirelatency tool, which uses the libpcap library to grab a copy of packets off of the wire before they are processed by the operating system. We are going to determine how well the internal states of Cassandra are by observing the external outputs (the query latency data).

Let’s take a quick overview of libpcap. Pcap allows one to get a copy of packets off the ethernet interface at the link layer prior to their being handled by the kernel networking code. The details of the implementation vary between operating systems, but packets essentially bypass the kernel network stack and are made available to user space. Linux uses PF_PACKET sockets to accomplish this. These are often referred to as raw sockets. BSD based systems use the Berkeley Packet Filter. For more detail, refer to the publication “Introduction to RAW-sockets” (Heuschkel, Hofmann, Hollstein, & Kuepper, 2017).

So now that we can grab packets off the wire in a computationally performant means, we can reassemble bidirectional TCP streams, decode application specific protocols, and collect telemetry. Enter the wirelatency utility that was developed in Go. The gopacket library handles the reassembly of bidirectional data from the packets provided by pcap. Wirelatency provides a TCPProtocolInterpreter interface, which allows us to define protocol specific functions in modules for HTTP, PostgreSQL, Cassandra CQL, and Kafka application service calls. The circonus-gometrics library allows us to send that telemetry upstream to Circonus for analysis.

It’s time to do some observability. First lets use `tcpdump` to take a look some actual Cassandra CQL network traffic. I used Google’s Compute Engine to setup a Cassandra 3.4 server and client node. Initially, I used an Ubuntu 17.04 host and installed Cassandra 3.11, but the server failed to startup and threw an exception. Fortunately, this was a documented issue. I could have downgraded JDK or recompiled Cassandra from source (and probably would have done so 10 years ago), but I decided to take the easy route and lit up two new hosts running RHEL. It was a cake walk comparatively to get Cassandra up and running, so I used the excellent DataStax documentation to get a simple schema up and insert some data. At this point, I was able to grab some network traffic.

fredmoyer@instance-1 wirelatency]$ sudo tcpdump -AvvvX dst port 9042 or src port 9042
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
06:55:32.586018 IP (tos 0x0, ttl 64, id 60628, offset 0, flags [DF], proto TCP (6), length 109)
instance-2.c.deft-reflection-188505.internal.38428 > instance-1.c.deft-reflection-188505.internal.9042: Flags [
P.], cksum 0xacad (correct), seq 2362125213:2362125270, ack 3947549198, win 325, options [nop,nop,TS val 435889631
ecr 435892523], length 57
0x0000: 4500 006d ecd4 4000 4006 3896 0a8e 0003 E..m..@[email protected].....
0x0010: 0a8e 0002 961c 2352 8ccb 2b9d eb4a d20e ......#R..+..J..
0x0020: 8018 0145 acad 0000 0101 080a 19fb 25df ...E..........%.
0x0030: 19fb 312b 0400 000b 0700 0000 3000 0000 ..1+........0...
0x0040: 1b73 656c 6563 7420 2a20 6672 6f6d 2063 .select.*.from.c
0x0050: 7963 6c69 7374 5f6e 616d 653b 0001 3400 yclist_name;..4.
0x0060: 0000 6400 0800 0565 2698 f00e 08 ..d....e&....
06:55:32.593339 IP (tos 0x0, ttl 64, id 43685, offset 0, flags [DF], proto TCP (6), length 170)
instance-1.c.deft-reflection-188505.internal.9042 > instance-2.c.deft-reflection-188505.internal.38428: Flags [
P.], cksum 0x15bd (incorrect -> 0x1e61), seq 1:119, ack 57, win 220, options [nop,nop,TS val 435912925 ecr 43588963
1], length 118
0x0000: 4500 00aa aaa5 4000 4006 7a88 0a8e 0002 E.....@[email protected].....
0x0010: 0a8e 0003 2352 961c eb4a d20e 8ccb 2bd6 ....#R...J....+.
0x0020: 8018 00dc 15bd 0000 0101 080a 19fb 80dd ................
0x0030: 19fb 25df 8400 000b 0800 0000 6d00 0000 ..%.........m...
0x0040: 0200 0000 0100 0000 0300 0763 7963 6c69 ...........cycli
0x0050: 6e67 000c 6379 636c 6973 745f 6e61 6d65 ng..cyclist_name
0x0060: 0002 6964 000c 0009 6669 7273 746e 616d ..id....firstnam
0x0070: 6500 0d00 086c 6173 746e 616d 6500 0d00 e....lastname...
0x0080: 0000 0100 0000 105b 6962 dd3f 904c 938f .......[ib.?.L..
0x0090: 61ea bfa4 a803 e200 0000 084d 6172 6961 a..........Maria
0x00a0: 6e6e 6500 0000 0356 4f53 nne....VOS
06:55:32.593862 IP (tos 0x0, ttl 64, id 60629, offset 0, flags [DF], proto TCP (6), length 52)
instance-2.c.deft-reflection-188505.internal.38428 > instance-1.c.deft-reflection-188505.internal.9042: Flags [
.], cksum 0x55bd (correct), seq 57, ack 119, win 325, options [nop,nop,TS val 435889639 ecr 435912925], length 0
0x0000: 4500 0034 ecd5 4000 4006 38ce 0a8e 0003 E..4..@[email protected].....
0x0010: 0a8e 0002 961c 2352 8ccb 2bd6 eb4a d284 ......#R..+..J..
0x0020: 8010 0145 55bd 0000 0101 080a 19fb 25e7 ...EU.........%.
0x0030: 19fb 80dd

Here we can observe the query issued and the result sent back from the server. And because the packets are timestamped, we can calculate the query latency, which was about 7.3 milliseconds (06:55:32.593339 – 06:55:32.586018 = 0.0073209).

Here’s the approach using `protocol-observer`, the wirelatency executable. The golang binary takes an API token, a wire protocol (in this case `cassandra_cql`), and a number of optional debugging arguments. We can see below how it tracks inbound and outbound TCP streams. It reassembles those streams, pulls out the Cassandra queries, and records the query latencies. At regular intervals, it executes an HTTP PUT the Circonus API endpoint to store the observed metrics. The metrics are recorded as log linear histograms, which means that we can store thousands of query latencies (or much more) in a very compact data structure without losing any accuracy from a statistical analysis standpoint

[fredmoyer@instance-1 wirelatency]$ sudo API_TOKEIN:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx -wire c
assandra_cql -debug_capture=true -debug_circonus=true -debug_capture_data=true -debug_cql=true -debug_measurements
=true

2018/02/14 07:25:07 [DEBUG] New(10.142.0.3->10.142.0.2, 38428->9042) -> true
2018/02/14 07:25:07 [DEBUG] establishing sessions for net:10.142.0.3->10.142.0.2
2018/02/14 07:25:07 [DEBUG] establishing dsessions for ports:38428->9042
2018/02/14 07:25:07 [DEBUG] new inbound TCP stream 10.142.0.3->10.142.0.2:38428->9042 started, paired: false
2018/02/14 07:25:07 [DEBUG] New(10.142.0.2->10.142.0.3, 9042->38428) -> false
2018/02/14 07:25:07 [DEBUG] new outbound TCP stream 10.142.0.2->10.142.0.3:9042->38428 started, paired: true
2018/02/14 07:25:08 [DEBUG] flushing all streams that haven't seen packets, pcap stats: &{PacketsReceived:3 Packets
Dropped:0 PacketsIfDropped:0}
2018/02/14 07:25:08 [DEBUG] Packaging metrics
2018/02/14 07:25:08 [DEBUG] PUT URL:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
2018/02/14 07:25:09 [DEBUG] 2 stats sent

Check out the Wirelatency repo or ask about it on the Circonus-labs slack to learn more about how it works.

Conclusion

We can get a look at the distribution of latencies observed by using a histogram to display that distribution.

This histogram shows that a lot of requests are clustered between 2 and 4 milliseconds. We can also see a much smaller mode between 25 and 30 milliseconds. This tells us that we likely have two different data access patterns going on for this example select query. It’s possible that the first mode indicates queries that returned data from a memory cache, and the second from disk. This is something that can’t be discerned just from this data collection, but we can go one step further and plot an overlay with block I/O activity via eBPF to see if there is a correlation between the higher latency query times and disk activity.

If we had just looked at average query latency with a more blunt instrument, we would have probably concluded that most queries were taking around 10 milliseconds. But looking at the distribution here, we can see that very few queries actually took that long to execute. Your ability to observe your system accurately depends on the correctness of the instruments you are using. Here we can see how libpcap and the histogram visualization give us a detailed view into how our Cassandra instance is really performing.

Observability into key Service Level Objectives is essential for the success of your enterprise, and observability is the first step for successful analysis. Only once a doctor collects their observations of the symptoms can they can begin to make their diagnosis. What kind of insights can we gain into the behavior of our systems (of the patient’s internal states) from these observations?

That’s just part of what we will explore next week, when we’ll talk more about Service Level Objectives and dive into SLO analysis in more detail.