Tuesday, October 20, 2015

Log Archiving with Logstash

A crash, a panic attack, and we rush to the logs hoping they will come to the rescue. Alas, there is so much content in these logs that finding the relevant part is itself a herculean task. It is therefore a good idea to think ahead and prepare yourself to handle such a panic better: keep all your logs (application logs, database logs, etc.) in one place, neatly organized in a tree structure. More technically, it is a good idea to archive your logs and to have a way to make sense of the information hidden in them. I was recently faced with such a situation and ended up building an ELK-based stack.

I started with syslog-ng. Syslog has been around for some time, which gave me a sense of stability and of a good community around it. Unfortunately, syslog is also a bit outdated. The next thing we considered was Logstash. Logstash, along with Elasticsearch and Kibana, forms a powerful stack popularly known as the ELK stack.

For this blog post, we will limit our scope to the Logstash part of the ELK stack. We will build an archive that collects logs from different web and database servers and stores them all in one place. Search and visual analysis are left to Elasticsearch and Kibana respectively, and hence are not part of this post. Following is the very high-level architecture I came up with:

[Figure: archiving architecture]
We are going to work with two pieces of software here:
  1. logstash-forwarder
  2. logstash
We will install logstash-forwarder on the clients (the servers running the applications that generate logs). It is an agent that ships the logs to a central archive server. Note that logstash-forwarder needs to be installed on EVERY client. Next, we will install Logstash on the archiving server. It receives the logs from every client and archives them in a format that you can define. Here is what I came up with for my archive structure.
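To make the "time first, then log source" idea concrete, here is a minimal Python sketch of how a received log line could be mapped to a file in the archive tree. The root directory `/var/log/archive`, the event fields `timestamp`, `host`, `source` and `message`, and the `YYYY/MM/DD/<host>/<file>` layout are all hypothetical choices for illustration, not something Logstash produces on its own. In Logstash itself, the equivalent layout would be expressed through the file output's `path` setting, which accepts field references such as `%{host}` and date patterns such as `%{+YYYY}`.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical archive root on the central server.
ARCHIVE_ROOT = Path("/var/log/archive")


def archive_path(event: dict, root: Path = ARCHIVE_ROOT) -> Path:
    """Map a log event to <root>/YYYY/MM/DD/<host>/<source file name>.

    The event is assumed (hypothetically) to carry:
      - "timestamp": a timezone-aware datetime of when the line was logged
      - "host": the client that shipped the line, e.g. "web-01"
      - "source": the original path of the log file on that client
    """
    ts = event["timestamp"].astimezone(timezone.utc)
    # Time first (one directory per year/month/day), then the log source.
    return (
        root
        / f"{ts:%Y}"
        / f"{ts:%m}"
        / f"{ts:%d}"
        / event["host"]
        / Path(event["source"]).name
    )


def archive_event(event: dict, root: Path = ARCHIVE_ROOT) -> Path:
    """Append the event's message as one line to its file in the archive tree."""
    target = archive_path(event, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as fh:
        fh.write(event["message"].rstrip("\n") + "\n")
    return target


if __name__ == "__main__":
    example = {
        "timestamp": datetime(2015, 10, 20, 14, 3, tzinfo=timezone.utc),
        "host": "web-01",
        "source": "/var/log/nginx/access.log",
        "message": "GET / 200",
    }
    # Only computes the path; nothing is written to disk here.
    print(archive_path(example))  # /var/log/archive/2015/10/20/web-01/access.log
```

Putting the date at the top of the tree makes it easy to drop or compress whole days at once, while the per-host directories keep each server's logs apart within a day.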

I like to archive by time first and then by log source, as depicted in the picture above. Now that the architecture and the archive format are sorted out, we next need to configure the clients and the server. DigitalOcean has done a wonderful job of explaining the installation and configuration, so I will leave you with this article. You can skip the parts involving Elasticsearch, Kibana and nginx.
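Since logstash-forwarder has to be configured on every client, it can help to generate its config from one place. logstash-forwarder reads a JSON config file; the key names below (`network`, `servers`, `ssl ca`, `timeout`, `files`, `paths`, `fields`) follow its config format as I understand it, so check them against the version you install. The server address and port, the certificate path, the log types and the file paths are all hypothetical. On the archive server, Logstash would need a matching lumberjack input listening on that port.

```python
import json

# Hypothetical address of the central archive server.
ARCHIVE_SERVER = "logs.example.com:5043"
# Hypothetical location of the server's SSL certificate, copied to every client.
SSL_CA = "/etc/pki/tls/certs/logstash-forwarder.crt"


def forwarder_config(log_type_to_paths: dict) -> str:
    """Build a logstash-forwarder JSON config for one client.

    `log_type_to_paths` maps a log type (e.g. "nginx") to the list of
    files or glob patterns on this client that are shipped under that type.
    """
    config = {
        "network": {
            "servers": [ARCHIVE_SERVER],
            "ssl ca": SSL_CA,
            "timeout": 15,
        },
        "files": [
            # Every line from these paths is tagged with a "type" field,
            # which the server can use when deciding where to archive it.
            {"paths": paths, "fields": {"type": log_type}}
            for log_type, paths in log_type_to_paths.items()
        ],
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Example: a web server shipping its nginx and application logs.
    print(forwarder_config({
        "nginx": ["/var/log/nginx/*.log"],
        "app": ["/var/log/myapp/app.log"],
    }))
```

The output of this script would be written to the forwarder's config file on each client, so adding a new server only means choosing which of its logs to ship.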

Once archiving is in place, make sure you have a way to periodically clean up the logs on the application servers, if you also keep recent logs (say, the last 1-2 days) there. For this I set up a cron job that, in my case, deletes logs older than a week.
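The cleanup job itself is not shown in the post, so here is a minimal Python stand-in for it, assuming the application writes its logs under a hypothetical `/var/log/myapp` and that a file's modification time is a good enough measure of its age. It deletes regular files older than seven days and leaves directories alone.

```python
import time
from pathlib import Path

# Hypothetical directory holding the application's local logs.
LOG_DIR = Path("/var/log/myapp")
MAX_AGE_DAYS = 7  # keep roughly one week of logs on the client


def clean_old_logs(log_dir: Path = LOG_DIR, max_age_days: int = MAX_AGE_DAYS) -> list:
    """Delete regular files under log_dir whose mtime is older than max_age_days.

    Returns the list of deleted paths.
    """
    if not log_dir.is_dir():
        return []
    cutoff = time.time() - max_age_days * 24 * 60 * 60
    deleted = []
    # Materialize the listing first so we never delete while iterating.
    for path in list(log_dir.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

Run daily from cron, this behaves much like `find /var/log/myapp -type f -mtime +7 -delete`. Keep the retention window comfortably longer than any delay in shipping, so a file is never removed before the forwarder has sent it to the archive.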
