Complex architectures, pressure to deploy faster, and demand for optimal performance have placed greater strain on monitoring teams, and as a result, a growing number of teams are looking to implement more advanced monitoring techniques. Part of the initial challenge is understanding what advanced monitoring techniques actually are. In this article, I clarify the difference between basic and advanced monitoring, with examples of how each applies to Postgres monitoring. I’ll also share a few of the rewards you’ll gain when moving up from basic to advanced monitoring.
Advanced Monitoring vs. Basic Monitoring
Basic monitoring is red light-green light monitoring: is your system on or off? Is it malfunctioning? Basic monitoring is no less important than advanced monitoring, but it addresses questions that are far easier to answer. For example, knowing whether your system is malfunctioning is easier than answering a more advanced question, such as whether you’re meeting quality of service guarantees. Basic monitoring should prioritize the systems and services that are as close as possible to where value is actually generated, which is usually the end user.
Advanced monitoring techniques become necessary when you’ve already applied basic monitoring but still don’t understand how your systems behave well enough to drive decisions in your organization.
As opposed to red light-green light basic monitoring, advanced monitoring involves observing a system to understand whether it is behaving as it should. This is an easy question to ask but a hard one to answer. In fact, answering it is often a long process of continually forming good hypotheses about how you believe the system should behave, and determining the best way to observe how it actually behaves. Once you can do that, advanced monitoring involves automating both the observation and the diagnosis of deviations from expectations.
Advanced Monitoring vs. Basic Monitoring: Postgres
To better illustrate what advanced monitoring techniques involve and how they’re different from basic monitoring, let’s use Postgres as an example.
Connectivity vs. quality of service
If you’re monitoring Postgres using basic monitoring techniques, you’re monitoring things like:
- Is Postgres running?
- Is the system out of space?
- Can the system service queries?
- Does a simple query return expected results?
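Because each basic check is a yes/no question, it can be written as a function that returns a boolean. The following is a minimal sketch, assuming a DB-API 2.0 connection object (for Postgres, one opened with a driver such as psycopg); the function names and the 10% free-space threshold are hypothetical, and `sqlite3` appears only as a local stand-in so the demo runs without a Postgres server.

```python
import shutil
import sqlite3  # stand-in for a Postgres driver, used only in the demo below
from typing import Any


def can_service_simple_query(conn: Any) -> bool:
    """Red light/green light: does a trivial query return the expected result?

    `conn` is any DB-API 2.0 connection (for Postgres, e.g. one opened with
    psycopg). Any exception, such as a closed connection, counts as "red".
    """
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        row = cur.fetchone()
        return row is not None and row[0] == 1
    except Exception:
        return False


def has_free_space(data_dir: str, min_free_fraction: float = 0.10) -> bool:
    """Red light/green light: is at least `min_free_fraction` of the disk free?

    `data_dir` would be the Postgres data directory; the 10% default is an
    assumed threshold, not a Postgres setting.
    """
    usage = shutil.disk_usage(data_dir)
    return usage.free / usage.total >= min_free_fraction


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print("query check:", "green" if can_service_simple_query(conn) else "red")
    conn.close()
    print("after close:", "green" if can_service_simple_query(conn) else "red")
    print("disk check:", "green" if has_free_space(".") else "red")
```

Each function answers exactly one on/off question. That makes basic checks easy to automate, but neither function says anything about how well the database is serving the application.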
Basic checks like these are easy on/off, healthy/unhealthy, true/false questions. When you start applying advanced monitoring techniques, things become less straightforward and more “fuzzy.” You’re now asking how you expect Postgres to behave. For example, the job of Postgres is to accept connections and answer the application’s queries. With advanced monitoring, you observe how well it does that job, tracking things like:
- Which queries are running, and which queries does the application issue?
- What is the latency profile of the queries and has it changed?
- When do we expect to run out of space?
- How saturated is the service?
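One way to answer “has the latency profile changed?” is to compare latency percentiles from a current window against a baseline window. This is a minimal sketch: in Postgres, per-query timings could come from a source such as the pg_stat_statements extension or from application-side timing, but here the samples are plain lists of milliseconds, and the choice of p50/p95/p99 and the 20% tolerance are hypothetical.

```python
from statistics import quantiles


def latency_profile(samples_ms: list[float]) -> dict[str, float]:
    """Summarize query latencies (milliseconds) as p50/p95/p99.

    Needs at least two samples. quantiles(n=100) returns 99 cut points,
    so index k - 1 holds the k-th percentile.
    """
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def profile_changed(
    baseline_ms: list[float], current_ms: list[float], tolerance: float = 0.20
) -> dict[str, bool]:
    """Flag each percentile that is more than `tolerance` above its baseline."""
    base = latency_profile(baseline_ms)
    cur = latency_profile(current_ms)
    return {name: cur[name] > base[name] * (1 + tolerance) for name in base}


if __name__ == "__main__":
    last_week = [5.0] * 90 + [40.0] * 10   # hypothetical samples
    this_week = [5.0] * 90 + [120.0] * 10  # same median, much slower tail
    print(profile_changed(last_week, this_week))
    # prints {'p50': False, 'p95': True, 'p99': True}
```

Percentiles are used here rather than a mean because a shift in the tail is often what users notice first, and an average can hide it: in the demo, the median is unchanged while p95 and p99 have tripled.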
The first part of advanced monitoring is being able to observe what’s going on and compare it to what you expect should be happening. You may, for instance, expect the workloads running on Postgres to look a certain way, and your monitoring system tells you they don’t. This isn’t necessarily bad; it just means something may be off and requires your attention. If you applied only basic monitoring and simply checked whether the connection was up or down, you likely would not have enough information to diagnose the issue. Basic monitoring of Postgres is about correct functionality, while advanced monitoring is about quality of service, performance, and behavior.
Synthetic work vs. real work
Basic monitoring tends to be synthetic, which means you perform work yourself to test some portion of the system, such as checking how much disk space is available. However, synthetic work alone can’t give you an accurate picture of how the system behaves under real usage.
Advanced monitoring sometimes includes synthetic work, but it relies much more on observing real behavior: you watch real activity happen in the system and measure it. For instance, when a user logs into an app, multiple queries may run. To ensure performance requirements are met, engineers using advanced monitoring observe measurements such as:
- How long did each query take?
- What was the size of the result set?
- Did any of the queries have errors?
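Measuring real work means wrapping the queries the application actually runs and recording the three measurements listed above. This is a minimal sketch, assuming a DB-API 2.0 connection; `observe_query` and `QueryObservation` are hypothetical names, the placeholder style differs by driver (`?` for sqlite3, `%s` for psycopg), and `sqlite3` is only a stand-in so the demo runs locally.

```python
import sqlite3  # stand-in for a Postgres driver, used only in the demo below
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class QueryObservation:
    sql: str
    duration_s: float     # how long did the query take?
    rows: int             # what was the size of the result set?
    error: Optional[str]  # did the query fail, and with what error?


def observe_query(conn: Any, sql: str, params: tuple = ()) -> QueryObservation:
    """Run one real application query and record timing, size, and errors."""
    start = time.perf_counter()
    try:
        cur = conn.cursor()
        cur.execute(sql, params)
        # description is None for statements that return no result set.
        rows = len(cur.fetchall()) if cur.description is not None else 0
        error = None
    except Exception as exc:
        rows, error = 0, f"{type(exc).__name__}: {exc}"
    return QueryObservation(sql, time.perf_counter() - start, rows, error)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('ada')")
    # Hypothetical queries a login might trigger; the second one fails.
    for sql, params in [
        ("SELECT id, name FROM users WHERE name = ?", ("ada",)),
        ("SELECT * FROM sessions WHERE user_id = ?", (1,)),
    ]:
        print(observe_query(conn, sql, params))
```

With Postgres, a failed statement inside a transaction aborts that transaction, so a production wrapper would also need to roll back before running further queries; that handling is left out of this sketch.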
While there’s value in synthetic work, most advanced monitoring relies on the observation of real work to guide analysis and decision-making.
Momentary observation vs. historical analysis
While knowing whether you currently have disk space left on Postgres may be helpful, it’s much more valuable to have historical measurements of space usage that you can model from. These give you insights such as how quickly free space is consumed and how quickly your data grows. Or consider concurrent connections to Postgres, that is, the number of connected users. This is usually limited to a relatively small number (set by the max_connections setting), and one of the classic problems is knowing whether you have enough connections. With basic monitoring, the answer is unfortunately black and white: you have enough connections until you’re down and nothing works. With advanced monitoring, you can record how many concurrent users you have every couple of seconds over time, giving you a better understanding of how close you are to running out of connections so you can adjust as needed.
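Recording the connection count every few seconds turns a black-and-white question into a measurable one. This is a minimal sketch, assuming the count comes from a periodic query against the pg_stat_activity view and the limit from max_connections (both real Postgres features); the `ConnectionHistory` class, the sampling loop that would call `record`, and the 80% threshold are hypothetical.

```python
import time
from collections import deque
from typing import Optional

# What you might sample in Postgres (assumes PostgreSQL 10+ for backend_type):
#   SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'client backend';
#   SHOW max_connections;


class ConnectionHistory:
    """Keep a bounded history of (timestamp, connection count) samples."""

    def __init__(self, max_connections: int, max_samples: int = 100_000):
        self.max_connections = max_connections
        self.samples: deque = deque(maxlen=max_samples)  # oldest samples drop off

    def record(self, count: int, ts: Optional[float] = None) -> None:
        self.samples.append((time.time() if ts is None else ts, count))

    def _require_samples(self) -> None:
        if not self.samples:
            raise ValueError("no samples recorded yet")

    def peak_utilization(self) -> float:
        """Highest observed count as a fraction of max_connections."""
        self._require_samples()
        return max(c for _, c in self.samples) / self.max_connections

    def fraction_above(self, threshold: float) -> float:
        """Share of samples whose utilization exceeded `threshold` (0..1)."""
        self._require_samples()
        over = sum(1 for _, c in self.samples if c / self.max_connections > threshold)
        return over / len(self.samples)


if __name__ == "__main__":
    history = ConnectionHistory(max_connections=100)
    for t, count in enumerate([40, 55, 70, 85, 90]):  # hypothetical samples
        history.record(count, ts=float(t))
    print(f"peak: {history.peak_utilization():.0%}")        # peak: 90%
    print(f"above 80%: {history.fraction_above(0.8):.0%}")  # above 80%: 40%
```

The bounded in-memory deque keeps the sketch self-contained; in practice the samples would more likely go to a time-series store so they survive restarts and can be queried over weeks.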
This historical data lets you identify trends and understand how the system actually behaves, so you can plan capacity accurately:
- Does your number of connected users ever get close to the limit?
- Is your number of connected users growing steadily?
If two weeks ago you had 100 users, last week 110, and this week 120, then next week will likely be higher still. Spotting trends like this in historical data lets you add capacity before you hit the limit rather than after.
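The arithmetic behind “next week will likely be higher” can be made explicit with a least-squares trend line. This is a minimal sketch: fitting a straight line is an assumption (real growth may be seasonal or accelerate), and the limit of 200 connections in the demo is hypothetical. The same functions apply to disk usage, answering “when do we expect to run out of space?”

```python
from typing import Optional


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(xs: list[float], ys: list[float], x_future: float) -> float:
    """Project the fitted trend line to a future period."""
    slope, intercept = linear_fit(xs, ys)
    return slope * x_future + intercept


def periods_until_limit(xs: list[float], ys: list[float], limit: float) -> Optional[float]:
    """Periods after the last observation until the trend reaches `limit`.

    Returns None when the trend is flat or shrinking.
    """
    slope, intercept = linear_fit(xs, ys)
    if slope <= 0:
        return None
    x_at_limit = (limit - intercept) / slope
    return max(0.0, x_at_limit - xs[-1])


if __name__ == "__main__":
    weeks, users = [0, 1, 2], [100, 110, 120]       # the example from the text
    print(forecast(weeks, users, 3))               # 130.0
    print(periods_until_limit(weeks, users, 200))  # 8.0 (hypothetical limit)
```

With the figures from the text, the trend grows by 10 users per week, so next week lands at about 130, and a hypothetical limit of 200 would be reached roughly 8 weeks after the latest observation.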
The Rewards of Applying Advanced Monitoring
Gain surprising insights about your systems
Some of the biggest advantages of advanced monitoring come not from having it, but from the journey that got you there. Engineers have expectations about how a system should work, and those expectations are often inaccurate. It’s not that the expectations are bad; rather, you may have assumed someone did something in the system or that something was connected a certain way, and it turns out that was not the case. The exercise of reconciling your expectations with a set of measurements is usually very eye-opening. You learn something about your system or something about its brokenness, and both lessons are valuable. In fact, learning something new about your system is often more valuable than finding a broken part.
Gain efficiencies and remediate faster
Applying advanced monitoring techniques often helps you diagnose inefficiencies in your systems, whether they show up for users as slow page loads or lie elsewhere, in logistics, for example. Whatever they are, you’ll gain deeper insights into your systems, learn things that challenge your expectations, and notice issues you never noticed before. You’ll also be much better prepared to remediate issues quickly, or even prevent them from happening in the first place.
Gain confidence in understanding your systems
The more complex systems get, the harder it is to solve problems. Today’s IT environments are so complex and have so many moving parts that when one component malfunctions, the failure can manifest somewhere else. Advanced monitoring gives engineers confidence in understanding the complex systems they operate. This is incredibly useful in troubleshooting, capacity planning, and failure anticipation. If you have a deep understanding of how your systems behave, you can better anticipate where or when the next failure is likely to occur.
If you don’t have good monitoring in place, you spend all of your time firefighting, which ultimately burns out your employees and takes time away from the product roadmap. Advanced monitoring will not eliminate firefighting, but your issues will be less of an emergency and you’ll have more confidence when you fix them.
Monitoring is a Journey
Many organizations recognize the benefits of more advanced monitoring techniques, but many don’t change their processes and tools because the change seems overwhelming. The key is to remember that it doesn’t have to happen overnight. Moving to more advanced monitoring is a journey, and you set the pace. As you begin that journey, you’ll immediately start to realize some of the expected benefits, and gain unexpected ones as well.