Monitoring AWS with Telegraf & InfluxDB Cloud

Recently I’ve been playing with InfluxDB Cloud and Telegraf for synthetic monitoring of Amazon Web Services API endpoints, and thought I’d share my configuration notes.

What are InfluxDB and Telegraf?

InfluxDB is an open-source time-series database, and Telegraf is a software agent for sending time-series data to InfluxDB. Telegraf has hundreds of plugins to collect different types of time-series data.

One of these plugins, inputs.http_response, sends HTTP/HTTPS requests to a list of URLs and reports metrics about the responses (status code, response time, and so on) to InfluxDB.

Why monitor AWS?

Let’s suppose that you run your applications on Amazon Web Services, and you want to have a record of whether the specific AWS services you use are running or not. Sure, you can check the AWS Service Health Dashboard. And while plenty of good people work at AWS, that’s a bit like the fox guarding the henhouse.

For this reason, it’s good to do your own independent monitoring of AWS services — and for that matter, any cloud services or SaaS apps you depend on. So, here are some quick tips on how to get that going with InfluxDB Cloud and Telegraf http_response.

Telegraf configuration for synthetic monitoring

First, get Telegraf running on your machine and set up an InfluxDB Cloud instance. Here’s a step-by-step tutorial on how to do that. Next, review this overview of synthetic monitoring with Telegraf and InfluxDB.

After doing so, copy this Telegraf configuration file to your local machine and run it from your terminal with telegraf --config telegraf-synthetic-aws.conf.

Assuming all starts well, Telegraf will print its startup log messages in your terminal and keep running.

From here, log into InfluxDB Cloud, open the Data Explorer (second icon from the top), and start graphing your data.

Let’s explore some key parts of our Telegraf configuration file, telegraf-synthetic-aws.conf.

First, we tag our time series to make them easier to query in the Data Explorer. We state that these metrics come from a company called Amazon and a service called AWS. This helps if we’re also tracking metrics from, say, Google Cloud or Azure.
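A minimal sketch of that tagging, assuming Telegraf’s standard [global_tags] table. The tag keys company and service are illustrative names of my choosing and may differ from the ones in the original file:

```toml
# Tags attached to every metric this Telegraf instance emits.
# The key names "company" and "service" are illustrative assumptions.
[global_tags]
  company = "Amazon"
  service = "AWS"
```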

Since we could potentially monitor lots of endpoints, I like to conserve my data usage, so I keep the hostname out of my data streams; I don’t query on that information anyway.
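In Telegraf, leaving out the host tag is an agent-level setting. A sketch, assuming the [agent] table of a standard config with its other settings omitted:

```toml
[agent]
  # Don't add a "host" tag to each metric; this saves a little data per point.
  omit_hostname = true
```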

We need to specify that we’re writing to InfluxDB Cloud 2, which uses the InfluxDB v2 API, as opposed to InfluxDB Cloud 1 or a local InfluxDB 1.x instance:
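In the config file, that choice comes down to the output plugin: influxdb_v2 targets the v2 API that InfluxDB Cloud 2 uses, while the older influxdb plugin targets v1. A minimal sketch of the table header, whose settings get filled in over the next few steps:

```toml
# Send metrics to InfluxDB using the v2 API (InfluxDB Cloud 2).
# The older [[outputs.influxdb]] plugin would target InfluxDB 1.x instead.
[[outputs.influxdb_v2]]
```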

We state where we can find our InfluxDB Cloud instance:
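A sketch of that setting. The region-specific hostname below is only an example; copy the URL shown in your own InfluxDB Cloud account instead:

```toml
[[outputs.influxdb_v2]]
  # Base URL of your InfluxDB Cloud instance (example region; use your own).
  urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
```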

We list our secret token to authenticate Telegraf to post data to our instance of InfluxDB.
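A sketch of the token line, assuming the token is supplied through an environment variable rather than pasted into the file:

```toml
[[outputs.influxdb_v2]]
  # Telegraf substitutes the value of the INFLUX_TOKEN environment variable here.
  token = "$INFLUX_TOKEN"
```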

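$INFLUX_TOKEN means that the value comes from an environment variable of the same name. If you haven’t already set it, you can do so from your terminal (on macOS or Linux) with export INFLUX_TOKEN=<your-token>, in the same shell session you’ll start Telegraf from.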

Here’s how to find your InfluxDB Cloud token if you don’t already have it.

Anyway, back to telegraf-synthetic-aws.conf. The next few lines describe the organization and the bucket you’re writing to.
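A sketch of those two lines; my-org and my-bucket are placeholders for your own organization and bucket names:

```toml
[[outputs.influxdb_v2]]
  # Placeholders: replace with your InfluxDB Cloud organization and bucket.
  organization = "my-org"
  bucket = "my-bucket"
```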

Your organization name is at the top of every screen in InfluxDB Cloud.

And you can find your list of buckets (bucket list?) under Load Data.

Alright, now we get to the meat of the matter: the Telegraf plugin we are using, and the URLs we want to monitor:
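A minimal sketch of that section, monitoring only the main AWS URL. The method and response_timeout lines are optional additions of mine with illustrative values, not necessarily part of the original file:

```toml
# Probe each URL and record status code, response time, etc.
[[inputs.http_response]]
  ## URLs to query: each one quoted, separated by commas.
  urls = ["http://aws.amazon.com"]

  ## Optional settings (illustrative values).
  method = "GET"
  response_timeout = "5s"
```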

The line [[inputs.http_response]] tells Telegraf to run our http_response plugin. Each URL in the urls list needs to be surrounded by quotes and separated from the next by a comma.

In the example above, we’re monitoring just the main AWS URL http://aws.amazon.com. But if we wanted to monitor more, we could easily do so. For example, here’s how to monitor some EC2 endpoints:
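A sketch, assuming you care about EC2 in two regions. The hostnames follow AWS’s ec2.<region>.amazonaws.com pattern, and the regions chosen here are just examples:

```toml
[[inputs.http_response]]
  # Main AWS site plus EC2 API endpoints in two example regions.
  urls = [
    "http://aws.amazon.com",
    "https://ec2.us-east-1.amazonaws.com",
    "https://ec2.us-west-2.amazonaws.com"
  ]
```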

And here’s how to monitor some S3 endpoints:
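Similarly for S3, again with example regions following the s3.<region>.amazonaws.com pattern:

```toml
[[inputs.http_response]]
  # S3 endpoints in two example regions.
  urls = [
    "https://s3.us-east-1.amazonaws.com",
    "https://s3.us-west-2.amazonaws.com"
  ]
```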

You probably get the picture by now. Here’s a list of all AWS endpoints. Pick out the ones that you care about and plug them into the section above.

By default, the http_response plugin doesn’t follow redirects. I prefer to follow them, to make sure a redirect doesn’t lead to a 5xx error, which would mean the service is down. Here’s how to do that:
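The relevant setting is follow_redirects, which defaults to false. A sketch:

```toml
[[inputs.http_response]]
  urls = ["http://aws.amazon.com"]
  # Follow redirects and report on the final response rather than the 3xx.
  follow_redirects = true
```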

Putting all the above together, here’s the entire Telegraf configuration file:
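The author’s exact file isn’t reproduced here, but a minimal sketch assembling the settings described above might look like the following. The tag names, collection interval, InfluxDB Cloud URL, organization and bucket names, and example regions are all assumptions to replace with your own values:

```toml
# telegraf-synthetic-aws.conf (sketch): synthetic monitoring of AWS endpoints.

# Tags attached to every metric (illustrative key names).
[global_tags]
  company = "Amazon"
  service = "AWS"

[agent]
  # How often to run the inputs (illustrative value).
  interval = "60s"
  # Leave the host tag off to save data.
  omit_hostname = true

# Write to InfluxDB Cloud 2 via the v2 API.
[[outputs.influxdb_v2]]
  urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
  token = "$INFLUX_TOKEN"
  organization = "my-org"
  bucket = "my-bucket"

# Probe each endpoint and record status code, response time, etc.
[[inputs.http_response]]
  urls = [
    "http://aws.amazon.com",
    "https://ec2.us-east-1.amazonaws.com",
    "https://ec2.us-west-2.amazonaws.com",
    "https://s3.us-east-1.amazonaws.com",
    "https://s3.us-west-2.amazonaws.com"
  ]
  response_timeout = "5s"
  follow_redirects = true
```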

By this point, you should have everything you need to monitor AWS and other cloud services and SaaS apps that are available via the web.

Addendum: List of AWS endpoints

While writing this post, I was amazed by how many Amazon Web Services there now are. Even more staggering: AWS delivers its various APIs through over 2,000 HTTP endpoints.

Now, if you’re only using a few AWS endpoints, you can skip this section. But if you want to monitor AWS endpoints en masse, read on.

(Caveat: monitoring dozens or more URLs may push you beyond the InfluxDB Cloud free tier and onto a paid plan.)

Here’s a list of AWS service endpoints in plain text format I compiled from this AWS documentation page.

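Since this list will change over time, here’s how to compile a list of AWS endpoints yourself in Terminal:

curl https://docs.aws.amazon.com/general/latest/gr/rande.html | grep amazonaws | sort -u > aws-endpoints.txt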

Let’s break down what’s happening here:

  • curl downloads a web page.
  • https://docs.aws.amazon.com/general/latest/gr/rande.html is the AWS documentation page listing all AWS service endpoints.
  • Using a Unix pipe, we send that page to the grep command, which keeps only the lines containing amazonaws. Thankfully, as far as I can tell, AWS consistently uses amazonaws in the URLs of its service endpoints, which makes things easier.
  • sort sorts all lines of text alphabetically, and the -u flag shows only the unique lines so we don’t have repeats.
  • We then send the output of sort to a new file, called aws-endpoints.txt.

From here, you can use the text editor of your choice to remove any extra cruft, such as leftover HTML, from each line.
