Video: Math in Big Systems

Every year the esteemed Usenix organization holds their LISA conference. LISA has transformed slowly over the years as systems, architectures, and the nature of large-scale deployments have changed, but this year represented the largest change to date.

“The format of the conference was substantially different and I believe it (changed) for the best. The topics, content, and speakers were both relevant and fantastic while keeping just enough of the UNIX neckbeard vibe to make it familiar.” – Theo Schlossnagle

This year at LISA, our CEO presented what we’ve been doing in the realm of automatic anomaly detection on high-frequency time-series data; an otherwise dry subject was capably delivered by Theo and very well received.

LISA14: Math in Big Systems by Theo Schlossnagle

Alerting on disk space the right way.

Most people that alert on disk space use an arbitrary threshold, such as “notify me when my disk is 85% full.” Most people then get alerted, spend an hour trying to delete things, and update their rule to “notify me when my disk is 86% full.” Sounds dumb, right? I’ve done it and pretty much everyone I know in operations has done it. The good news is that we didn’t do this because we are all stupid people, we did it because the tools we were using didn’t allow us to ask the questions we really want to answer. Let’s work backwards to a better disk space check.

There are occasionally reasons to set static thresholds, but most of the time we care about disk space because we need to buy more. The question then becomes, “how much advance notice do I need?” Let’s assume, for the sake of argument, that I need 4 weeks to execute on increasing storage capacity (planning for and scheduling possible system downtime, resizing a LUN, etc.). If yours is a cloudy sort of architecture, maybe you’re looking at a single day, so that this sort of change happens during a maintenance window where all necessary parties are available. After all, why would you want to be forced to act on this in an emergency?

Really, the question we’re aiming at is “will I run out of disk space within the next 4 weeks?” It turns out that this is a very simple statistical question, and with a few hints you can get an answer in short order. First we need a model of the data growth, and this is where we need a bit more information. Specifically, how much history should drive the model? This depends heavily on the usage of the system, but most systems have a fairly steady growth pattern, and you’d like to include some multiple of the period of that pattern.

Graph: Adding an Exponential Regression

To be a little more example oriented, let’s say we have a system that is growing over time and also generates logs that get deleted daily. We expect a general trend upward with a daily periodic oscillation as we accumulate log files and then wipe them out. As a rule of thumb, one week of data should be sufficient for most systems, so we should build our model off 7 days’ worth of history.

Graph: Looking 1 week back and 28 days forward

Quite simply, we should take our data over the last 7 days and generate a regression model. Then, we time shift the regression model backwards by 4 weeks (the amount of notice we’d like) and “current value” would be the model-predicted value four weeks from today. If that value is more than 100%, we need to tell someone. Easy.
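As a rough sketch of the idea (a simple linear least-squares fit, not necessarily the model Circonus uses — the graphs above also show an exponential regression), you can fit the trailing week of samples and evaluate the fitted trend 4 weeks past the most recent sample:

```python
import numpy as np

def projected_usage(timestamps, used_pct, horizon_secs=28 * 86400):
    """Fit a linear trend to (timestamp, percent-used) samples and
    return the predicted percent used `horizon_secs` past the last sample."""
    slope, intercept = np.polyfit(timestamps, used_pct, 1)
    future_t = timestamps[-1] + horizon_secs
    return slope * future_t + intercept

# Hypothetical data: daily samples over one week, growing ~1% per day
week = [i * 86400 for i in range(8)]
usage = [63.0 + i for i in range(8)]

predicted = projected_usage(week, usage)
print("projected usage in 4 weeks: %.1f%%" % predicted)
if predicted >= 100:
    print("alert: projected to run out of disk within 4 weeks")
```

For exponential growth you would fit the logarithm of usage instead; the alerting logic (compare the projected value to 100%) stays the same.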

Suffice it to say, some tools require extracting the data into Excel, or pulling it out with R or Python, to accomplish this. While those tools work well, they fail to fit the bill for monitoring, because the model and projected value must be constantly recalculated as new data arrives in order to keep the mean time to detect (MTTD) where we expect it.

While Circonus has had this feature squirreled away for many months, I’m pleased to say that the alerting UI has been refactored and it is now accessible to mere mortals (at least those mortals that use Circonus).

A New Day for Navigation and Search

Today we finally rolled out the new navigation menu and search bar that we’ve been working on for a while. We had been getting feedback on the poor usability of the old horizontal menu system and knew that a “pain point” had been reached—it was time to revisit how we treated navigation and search in Circonus.

We heard numerous times from users that our old navigation menus were difficult to use, and a recent survey we performed simply underscored that feedback. The horizontal nature of the menus made them tricky to navigate, especially when combined with the fact that they were not very tall. Also, we had outgrown them; after the recent addition of Metric Clusters and Blueprints, we were feeling cramped and were running out of room in the menu system. The last problem (which we started hearing recently from users) is that the location of the search field made it seem like a global search despite the placeholder hint text it contained. Some users who were new to Circonus hadn’t even noticed the search field; it just blended into the interface too well.

In this redesign we’ve shifted paradigms dramatically to alleviate these three problems. We’ve done away with the notion of showing all the menu all the time, and have implemented a large “sitemap” style menu. When the menu is collapsed, you see the current section name and page title beside a hamburger menu icon. This offers a large trigger area and easy-to-use menu with very few “moving parts.” The menu appears when hovering anywhere over the trigger area, making clicking unnecessary (clicking does work, however, for tablet and other touch-based users). This offers plenty of room both horizontally and vertically for future expansion, and it frees up room to the right for more page-related buttons.

our newly redesigned navigation menu

On pages which are searchable, the search bar now sits immediately beside the menu trigger area (containing the page section and title). This makes it easier for users to recognize the contextual nature of the search, and also increases the visibility of search in general. This new search bar provides a dedicated space to show any current search string in operation on the page, and also offers a “minus” button to clear it with a single click. To enter a search string or edit an existing search string, you can click the magnifying glass or click the existing search string, if present. To commit your search string after typing, simply hit enter on your keyboard.

You’ll also notice that we’ve slightly reorganized the menu structure. The main goal of this was to make things more logical; to provide a better model upon which users can base their own mental models, making it easier to navigate Circonus. As such, the sections have been renamed with verbs pertaining to the general tasks related to each section. First is “Collect,” where you’ll find pages related to collecting and organizing data with checks, metrics, metric clusters, templates, and beacons. Next is “Monitor,” where you’ll go to see your hosts’ statuses, set rules, follow up on alerts, and work with contact groups and maintenance windows. Last is “Visualize.” This is where you work with graphs, worksheets, events, and dashboards. Hopefully this will make it easier for new users to get acquainted with the Circonus workflow of collecting data, setting rules to monitor that data, and working with visualizations.

One last benefit of this new menu design is that we now have the opportunity to highlight some secondary links at the bottom of the menu (documentation links, mobile site and changelog links, as well as keyboard shortcuts help). These have been present in the site footer, but many users are unaware of their existence. We wanted to pull some of these links up into a more prominent position since they’re helpful for users.

Thank you to all of our users whose feedback helps us shape Circonus into a better and more useful tool. We couldn’t do this without you!

Blueprints – Graphing made easy

Introducing Blueprints

Today I’d like to introduce a new Circonus feature we’re calling Blueprints. Blueprints is a way to effortlessly create reusable graphs that can be used to visualize any host where the data you’re collecting is similar.

In the modern age of Internet infrastructure, our customers are often faced with managing not just one or two machines, but whole clusters of near-identical hosts. Deployed with automation tools like Chef or Puppet, or with cloud virtual machine imaging systems such as Amazon’s AMIs, these all need the kind of monitoring and visualization a powerful tool like Circonus can provide.

Circonus has long supported features such as check templates and a comprehensive API that allows easy configuration for gathering similar data across multiple similar machines. When we came up with the concept of Blueprints, we wanted to bring the same power to visualization of the data we’re storing on these multiple instances, and do it in a way that was simple and intuitive to use. Now that concept is a reality as a powerful new tool for you to use.

Within Circonus any graph can now be quickly turned into a Blueprint with just one click and by entering a catchy name:

All the configuration for the graph is gathered up into the Blueprint. Everything from the visual components (for example: colors, line style, and axis assignment) to the more technical details (the metrics being rendered, along with any formulas, derivatives, and mathematical functions applied to them) is saved in a Blueprint, so it can be applied to any future graphs you might create.

Creating a new graph from a Blueprint is a breeze. One click pops open a dialog that allows you to map the hosts that were in the original graph to any replacement hosts for which you’re already collecting similar data:

The selector intelligently offers you only the hosts that make sense for each check. Click, click and you’re done. A new graph for the new host is created in seconds, ready to further customize or share.

Having made the creation of new graphs easy, we wondered if we could do away with it entirely. And we can…with ephemeral visualizations offered on each check:

Clicking the visualize link next to each check now allows you to pick from the blueprints that you can use with this check, and instantly get a popup containing a rendering of the resulting graph. You now can have instant access to the right graph for any check you’re monitoring.

In our own internal use, we were taken by surprise at just how powerful creating dynamic visualizations for our hosts is. Blueprints not only provide us with the most up-to-date graph for each check on our system, but in times of stress they can be used to create ad-hoc graphs that we can quickly apply to any of the hosts in our system to see which is misbehaving.

We are constantly working to add powerful new features and functionality to Circonus, like Blueprints, that expand its capabilities and make your job easier.

AWS CloudWatch Support

This month we pushed native CloudWatch support – any metric that you have in CloudWatch can now be added to graphs and dashboards, and alerts can be created for it. Supported namespaces include:

Auto Scaling (AWS/AutoScaling)
AWS Billing (AWS/Billing)
Amazon DynamoDB (AWS/DynamoDB)
Amazon ElastiCache (AWS/ElastiCache)
Amazon Elastic Block Store (AWS/EBS)
Amazon Elastic Compute Cloud (AWS/EC2)
Elastic Load Balancing (AWS/ELB)
Amazon Elastic MapReduce (AWS/ElasticMapReduce)
AWS OpsWorks (AWS/OpsWorks)
Amazon Redshift (AWS/Redshift)
Amazon Relational Database Service (AWS/RDS)
Amazon Route 53 (AWS/Route53)
Amazon Simple Notification Service (AWS/SNS)
Amazon Simple Queue Service (AWS/SQS)
AWS Storage Gateway (AWS/StorageGateway)

Overview

This check monitors your Amazon Web Services (AWS) resources and the applications you run on AWS in real-time. You can use the CloudWatch Check to collect and track metrics, which are the variables you want to measure for your resources and applications.

From the CloudWatch Check, you can set alerts within Circonus to send notifications, allowing you to make changes to the resources within AWS.  For example, you can monitor the CPU usage and disk reads and writes of your Amazon Elastic Compute Cloud (Amazon EC2) instances and then use this data to determine whether you should launch additional instances to handle increased load. You can also use this data to stop under-utilized instances to save money. With the CloudWatch Check, you gain system-wide visibility into resource utilization, application performance, and operational health.

Circonus takes the AWS Region, API Key, and API Secret, then polls the endpoint (AWS) for a list of all available Namespaces, Metrics, and Dimensions that are specific to the user (AWS Region, API Key, and API Secret combination). Only those returned are displayed in the fields. The names that are displayed under each Dimension type (for example: Volume for EBS) are all instances running this Dimension type and have detailed monitoring enabled.

For the master list of Namespace, Metric, and Dimension names available, and for additional information on CloudWatch in general, see AWS’s CloudWatch documentation.

JSON Over HTTP – Data Collection Made Simple

At Circonus, one of our goals is to try to make it as easy as possible to monitor your data. One of the ways we do this is to allow data formatted in JSON to be pushed or pulled over HTTP into Circonus. Since HTTP is spoken everywhere, and JSON is understood everywhere, this allows for easy metric submission so you can collect, store, graph, and analyze everything that you care about.

The HTTPTrap check type accepts JSON payloads via HTTP PUT requests. This allows you to push data from your devices or applications directly into Circonus. This is useful for data that happens sporadically, instead of at a regular or constant interval. HTTPTraps also let you send histogram data into Circonus, so you can see the whole picture instead of one aspect of your data.

The JSON check type gets data from an HTTP endpoint at the interval you select. This allows you to make applications that expose metrics in a JSON format that can be polled regularly from Circonus. These checks allow you to specify a username/password, port, and any additional headers, which gives you security and flexibility in what you allow to connect to your hosts.
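To illustrate the pull side, all your application has to do is serve a JSON document over HTTP. This hypothetical snippet builds such a document from a few process stats (the metric names are illustrative, not a required schema); you would serve it from any web framework at the URL the JSON check is configured to poll:

```python
import json
import os
import time

START_TIME = time.time()

def metrics_document():
    """Build the JSON body a Circonus JSON check could poll.
    Metric names here are made up for the example."""
    return {
        "uptime_seconds": int(time.time() - START_TIME),
        "pid": os.getpid(),
        "load_average_1m": os.getloadavg()[0],
    }

body = json.dumps(metrics_document())
print(body)
```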

One of the major shortcomings of JSON in most languages is dealing with large numbers. Our parser works around that by allowing you to send the number as a string. This means there is no data you’re interested in that we can’t collect or accept.
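The problem is that parsers which store every JSON number as a double lose integer precision above 2^53, so a large counter is safer sent as a string and converted back on the receiving side:

```python
import json

# A 64-bit-sized counter, well beyond the 2**53 exact-integer
# limit of double-based JSON parsers
counter = 2**60 + 1

# Send the value as a string so no parser in the pipeline rounds it
payload = json.dumps({"bignum_as_string": str(counter)})

# The receiving side converts the string back to the exact integer
decoded = int(json.loads(payload)["bignum_as_string"])
print(decoded == counter)
```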

The ability to use JSON as a format for data also allows you to write your own data collector. For instance, Gollector was written by the folks at Triggit who wanted to have an agent that relied on the proc filesystem and C POSIX calls. Additionally, both Panoptimon (written in Ruby) and our very own nad agent (written in Node.js) utilize JSON to send system information. Customized agents like these allow you to adapt Circonus to your infrastructure and monitoring needs.

To show just how easy it is to format data so Circonus can read it, this is an example Python script that runs once per minute to generate some randomized data. Once you create an HTTPTrap check in Circonus, you can look at the check to get the URL that should be used in the PUT call. The example includes submitting strings, small numbers, large numbers, and a set of numbers that can be used for histogram data. Similar setups can be used in other languages and in your own custom applications.

import json
import urllib2
import time
import random

# Use the URL provided in the UI from the Circonus HTTPTrap check
httptrapurl = "https://trap.noit.circonus.net/module/httptrap/01234567-89ab-cdef-0123-456789abcdef/mys3cr3t"

while True:
    # Make up the data
    data = {
            "number": random.uniform(1.0, 2.0),
            "test": "a text string",
            "bignum_as_string": "281474976710656", 
            "container": { "key1": random.randint(1200, 1300) },
            "array": [
                random.randint(1200, 1300),
                "string",
                { "crazy": "like a fox" }
            ],
            "testingtypedict": { "_type": "L", "_value": "12398234" },
            # Set the type to "n" for histogram-enabled data
            "histogramdata": { "_type": "n", "_value": [int(1000*random.betavariate(1,3)) for i in xrange(10000)] }
    }
    jsondata = json.dumps(data)

    # Form the PUT request
    requestHeaders = {"Accept": "application/json"}
    req = urllib2.Request(httptrapurl, jsondata, headers = requestHeaders)
    req.get_method = lambda: 'PUT'
    opener = urllib2.urlopen(req)
    putresponse = json.loads(opener.read())

    # Print the data we get back to the screen so we can make sure it's working
    print putresponse
    print jsondata
    print

    # Wait a minute
    time.sleep(60)

This will show up in Circonus as:

You can refer to the Circonus User Manual for more details about the HTTPTrap check. Also, please refer to the information there to import our certificate if you see the following error while following these instructions:
urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)>

Exploring Keynote XML Data Pulse

I’ll be the first to admit that the Circonus service can be somewhat intimidating. Sometimes it is hard to puzzle out what we do and what we don’t do. Case in point: perspective-based transactional web monitoring.

Many people have asked us, given our global infrastructure, why we don’t support complex web flows from all of our global nodes and report back granular telemetry about web pages, assets, and interactivity. The short and simple answer is: someone else is better at it. It turns out they are a lot better at it.

Keynote has been providing synthetic web transaction monitoring via their global network of nodes for many years, and they have an evolved and widely adopted product offering. So, why all this talk about Keynote?

But why?

You might ask why it is important to get deep, global data about web page performance into Circonus. It’s already in Keynote, right? Their tools even support exploring that data, arguably better than Circonus does.

The reason is simple… your other critical performance data is in Circonus too. Real-time correlation, visualization, and trending can’t happen easily unless the data is available in the same toolset. Web performance is delivered by web infrastructure. Web performance powers business. Once all your performance data is in Circonus, you can tie these three macro-systems together in a cohesive view and produce actionable information quickly.

The story of how we made this possible is, as most good stories are, rife with failures.

Phase Failure: the Keynote API

For over a year, we’ve had support for extracting telemetry data from Keynote via their traditional API. For over a year, most of our customers had no idea… because it was in hidden beta. It was hidden because we struggled to make it work. Honestly, the integration was painful due to the API allowing us to pull only a single telemetry point at a time. It was so painful that we struggled to add any real value on top of the data they stored. The API is so bad (for our needs) it almost looks like Amazon Cloudwatch (a pit of hell deserving of a separate blog post).

If you look at a standard deployment of Keynote, you might find yourself pulling 200-300 measurements from 15 different locations every minute. For Circonus to pull that feed, we’d have to make 4,500 API calls per minute to Keynote for each customer! That’s not good for anyone involved.

Phase Success: the Keynote XML Data Pulse

Recently, our friends over at Keynote let us in on their new XML Data Pulse service which looks at their data more “east and west” as opposed to “north and south.” This newer interface with Keynote’s global infrastructure allows us to pull wide swaths of telemetry data into our systems in near real-time… just like Circonus wants it.

If you’re a Keynote customer and are interested in leveraging our new Data Pulse integration, please reach out to your Keynote contact and get set up with a Data Pulse agreement.

Monitoring Elasticsearch

With the much anticipated announcement of the Elasticsearch 1.0.0 release, we thought we’d mention that several of the features that you use within Circonus are powered by Elasticsearch behind the scenes.

We could never, in good conscience, run a product or service that we couldn’t extensively monitor. So, when it comes to monitoring things we say once again, “Yeah, we do that too.”

Adding Elasticsearch telemetry collection in Circonus is as easy as selecting the Elasticsearch check type and entering the node name. What comes back is a plethora of statistics from the cluster node.

{
  "cluster_name": "elasticsearch",
  "nodes": {
    "zB3lYhArQJCJgJ5szVr4uA": {
      "timestamp": 1392415145096,
      "name": "Hawkeye II",
      "transport_address": "inet[/10.8.3.13:9300]",
      "host": "client-10-8-3-13.dev.circonus.net",
      "indices": {
        "docs": {
          "count": 0,
          "deleted": 0
        },
        "store": {
          "size_in_bytes": 0,
          "throttle_time_in_millis": 0
        },
        "indexing": {
          "index_total": 0,
          "index_time_in_millis": 0,
          "index_current": 0,
          "delete_total": 0,
          "delete_time_in_millis": 0,
...

On an instance here, 382 gratuitous lines of JSON ensue, all of which we turn into metrics for trending and alerting.

We use this to track the inserts, deletes, and searches performed on each node:

We’d also like to give a shout out to the Elasticsearch crew for their successful release. As “metrics people,” we’re pleased to see that the old *_time metrics, which were not easily machine readable, have gone the way of the Dodo, and *_time_in_millis style metrics have prevailed. You made the most of the breaking 1.0.0 opportunity to break things in a good way!

Ways to Collect Systems Data in Circonus

When you decide to monitor your systems with Circonus, there are quite a few options for how to collect your metrics. We believe Circonus should be a tool that does what you need, when you need it. Circonus does not force you into a specific approach or method. Since there are so many different ways to gather telemetry via Circonus, we thought we would take a moment to outline some of the different approaches.

In addition to application-specific checks, you might like to get baseline information about things like memory, CPU, file systems, and interfaces of your servers and network equipment. We’ve listed out the main options that can be used for system performance metrics below, along with a brief description and our recommendations for each. We’ve roughly ordered these based on our best practices, but the tool that should be used depends on many variables that you’ll need to take into account.

For instance, some users may prefer to use a single agent on all of their devices, which may mean that some options won’t be available. Available plugins and ability to expand should also be considered. Some agents allow Circonus to reach out to the endpoint and gather metrics, while others require the data to be pushed (these agents mention push requirements in the description below). In some cases, the language that the agent was written in can have an effect on your decision.

Standard protocols

SNMP – SNMP is a standard that has been around for years, and allows monitoring of many types of network equipment, servers, and appliances. There is a good chance you already have SNMP configured on most of your hosts, which would significantly lower the up-front setup time. You’ll need to know the OIDs you want to monitor, but check bundle templates can make this process a little easier for you.

HTTPTrap – Circonus can accept JSON payloads via an HTTP PUT or POST request. This data is not polled regularly from the Circonus Broker, but is pushed to the Broker from the monitored target. This is the easiest way to get arbitrary data into Circonus, but you’ll have to figure out where to get the data.

Third-Party agents

collectd – Collectd is a lightweight C-based tool that has a variety of plugins available for data collection. There are two main ways to use collectd with Circonus: pushing the information from your device over UDP (similar to statsd and HTTPTraps), or via the write_http plugin.

Gollector – Gollector is a new monitoring agent that relies on the proc filesystem and C POSIX calls such as sysconf to determine your machine’s profile. This alleviates the performance penalty that some other agents incur by shelling out for their collection work.

NRPE – Circonus can utilize existing NRPE checks from your Nagios or Icinga installation. NRPE allows you to remotely call Nagios scripts to collect information. If you want to monitor a non-standard metric, there’s probably a Nagios script for it.

statsd – Similar to an HTTPTrap, statsd allows your hosts to send information to Circonus Enterprise Brokers, rather than the Broker reaching out to poll the host. One downside is that this information cannot be viewed in real time, but it can be useful for metrics that lack regular intervals of available information or are particularly high volume.

Internally-developed agents

nad – Nad is a lightweight, simply managed host agent written in Node.js. Nad is the first choice of Circonus due to its easy extensibility and its ability to work on almost any platform, including Windows, RHEL, Ubuntu, and illumos derivatives. Nad comes with enough plugins to let you monitor any of the basics, while allowing you to add your own checks to fit your environment.

Resmon – Resmon is a Perl-based agent created by OmniTI. New modules can be created quickly and easily, but must be written in Perl; that’s a make it or break it factor for many.

Windows Agent – If you’d rather not use nad on your Windows servers, there is a Windows agent that can be used to collect performance metrics from Windows servers.

Which is right for me?

The choice of agent to use depends on many factors. Current operating system, existing monitoring setup, and network layout can all have an effect on which agent you choose. You may also need to incorporate several choices in order to best monitor your environment.

That covers the main ways to get system information into Circonus. There’s plenty of other methods of getting data, such as Google Analytics, a variety of database connections, Memcached, Varnish, NewRelic, and more. A combination of these collection types can enable you to have data on every piece of your infrastructure, so you can always find the information you need.

Tags: A Long Time Coming

Ok, we know a lot of you have been asking for tags in Circonus for a long time. Well, they’re finally here! The tags feature is currently in beta, and will be released to all customers very soon. (Tags have actually been available to API users for a while, just without UI support in the application.) Let’s jump right in and I’ll give you a quick overview of how tags will work in Circonus.

First Things First: What’s Taggable?

For this initial implementation of tags, you can tag Graphs, Worksheets, Check Bundles, Templates, and Maintenance Windows. You will also see tags on some other pages, such as Alerts, Rulesets, Hosts, and Metrics, but these items aren’t taggable. The tags you see on those pages are inherited from the associated Check Bundles.

In the near future, we’ll be adding to these lists. We are planning on making Annotations and Dashboards taggable, and have some other unique ways we’re planning on using tags to help categorize and group items in the UI.

So, How Does This Work?

First, you’ll need to add tags to some items. All tags are categorized in the system, but if you don’t want to categorize your tags, that’s ok. Simply use the “uncategorized” category for all your tags and the tags will be presented solo (without category) throughout the UI. We have a couple of categories which are already created in the system, but you can create as many custom categories as you wish.

Let’s go to one of the list pages containing taggable items (e.g. the Checks list page) and look for the tags toolbar under an item (it will have an outlined tag with a plus icon). Click the “+” tag to open the “Add Tag” dialog. First choose a category or use the “+ ADD Category” option to enter a new one, then the tags dropdown will be populated with the tags under that category. Choose a tag or enter a new one by choosing the “+ ADD Tag” option, then use the “Add Tag +” button to add the tag to the item.


When the tag is added to the UI, you’ll notice right away that each tag category has its own color. There is a limited set of pre-selected colors which will be assigned to categories as they are created. These particular colors have been chosen to maximize your eye’s ability to distinguish the categories at a glance, and also because they work well under both light and dark icons. You’ll also notice that the tag you added has its own icon. There’s a set of twelve icons which will be assigned to the first twelve tags in each category; once a category has twelve tags, any further tags added to it will receive blank icons. This system of colors and icons creates fairly unique combinations that should help you recognize tags at a glance without needing to read them every time. Note: taggable items can have unlimited tags.

After you add a tag to an item, you’ll also notice that a small set of summary tags is added (usually off to the right of the item). This shows the first few tags on the item, providing a way for you to quickly scan down the page and get a glimpse of the tags that are assigned to each item on the page.


One more note about tags and categories. Although you select them separately in the UI, when using the API the categories and tags are joined with a colon (“:”) as the separator. So a tag “windows” in the category of “os” would be represented as “os:windows” in the API.
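Reading that representation back out in code just means splitting on the first colon. Here’s a hypothetical helper (the “uncategorized” fallback for bare tags is our assumption, not a documented API behavior):

```python
def split_tag(api_tag):
    """Split an API-form tag like 'os:windows' into (category, tag).
    Assumes uncategorized tags appear without a colon."""
    category, sep, tag = api_tag.partition(":")
    if not sep:
        # No colon found: treat the whole string as an uncategorized tag
        return "uncategorized", api_tag
    return category, tag

print(split_tag("os:windows"))   # category "os", tag "windows"
print(split_tag("production"))   # bare tag, assumed uncategorized
```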

Tag Filtering

The power of tags is apparent once you start using tag filters. Look in the upper right corner of the page and you’ll see an outlined tag with a funnel icon and beside it a similar menu button. These are for setting tag filters and saving tag filter sets for easy application later. Click the funnel tag to open the “Tag Filters” dialog, and click the “Add +” button to add a filter to the dialog. You may add as many filters as you wish, and in each one all you have to do is choose a category and tag from the choices (you may not enter new tags or categories here; these are simply the tags you’ve already added to the system). Use the “x” buttons at the right to remove filters, or use the “Clear” button to remove all filters and start with a clean slate.

Note: none of your changes in this dialog are applied until you click the “Apply” button. After clicking “Apply,” the page will refresh and you’ll see your newly applied filters at the top of the page. You can then use the “Tag Filters” dialog to change these filters, or you can use the menu button on the right to open the “Tag Filter Sets” dialog, where you may save & apply sets of tag filters for easy switching.


One important feature to note is the “sticky” checkbox that appears when you have one or more tag filters applied. By default (with “sticky” turned off), the tag filters you apply are only visible in the current tab. If you close the tab or open a new one, it will not retain the current tag filters. The benefit of this is that we’ve developed a system to allow you to have multiple concurrent tag filter views open side-by-side. So with the “sticky” setting off, you can open several tabs, use different tag filters in all of them, and each tab will retain its own tag filters as you navigate Circonus in that tab. If at any point you turn the “sticky” setting on, the tag filters from that tab will be applied universally and will override all the other tabs. And not only are “sticky” tag filters applied across all tabs, they’re remembered across all of your user sessions, so they will remain applied until you choose to change or remove them.


Host Grouping

One unique feature we’ve already completed is Host Grouping. Head on over to the Hosts page and open the “Layout Options” by clicking on the grid icon at the right side of the page. You’ll see a new option labeled “Group By Tag Category.” If you choose a tag category there, the page will reorganize itself. You’ll now see a subtitle for each tag in the selected category, and under each subtitle you’ll see the Hosts which have Check Bundles with that tag. Because each Host can have many tags, including more than one tag in the same category, you may see a Host appear in more than one group. At the bottom of the page you’ll also see a grouping subtitled “Not in Category.” Under this group you’ll see all the Hosts which don’t have any Check Bundles with tags in the chosen category.