aknudsen

From 6 hours to 6 minutes with ExecuteSQL()

Discussion created by aknudsen on Oct 26, 2017
Latest reply on Oct 26, 2017 by bigtom

Network traffic analysis using FM? Yeah, that's what I do as part of my job. As a network engineer I needed a way to analyze traffic to understand the flow, what traffic enters and leaves the network, and how much of it.

 

The goal was to take the 10's of millions lines of log and condense it down to a few thousand lines to be put into a Excel sheet with fancy pivot tables looking like this.

 

Screen Shot 2017-10-26 at 05.30.38.png

The Excel is a template and I use the MBS plugin from monkeybreadsoftware to replace a named range so I don't have to alter the pivot tables with new ranges every time I have a report due.

 

I also use the MBS plugin to import my raw data from currently 12 mySQL servers located throughout the US. By mid next year this number will grow to around 40. Since my data is either new or changed the FM.InsertOrUpdateRecord function is invaluable. Could I do it in pure FM sure, but why reinvent the wheel?

 

Anyway, the data I'm consuming is a combination of Syslog and Netflow so it has all the fields from a Syslog format where the count and bytes are updated the way Netflow logs work. The logs are use on Layer 7 traffic which means we identify not only the source and destination ports, but also the application i.e. Google-base, SSL, SKYPE etc.

 

We also know the direction of the traffic such as inbound and outbound Internet traffic. We also  what countries its going to and coming from.

 

Some traffic is allowed by default and some not, but that is irrelevant to the calculations. So my key elements here are:

 

  • Direction
  • Zones (can be multiple internet facing connections)
  • Application
  • Country
  • Byte count
  • Hit count

 

On average there is about 400 different applications used and an average of 100 - 140 countries with traffic going in 2 directions. Combine that with an average log size of 20-30 million records. You do the math

 

In my log table I had to calculated fields, one for bytes and one for hits. There are no overhead on import since my data is coming across a WAN link, and I have a 6 core Xeon server with 64GB of RAM.

 

So the old way I was doing it was to create 2 lists with ExecuteSQL(), one for country and one for application and perform a find on the raw log file. By doing this twice, once for the incoming and one for the outgoing traffic I would have my report created. Remember I already have the bytes and hits as calculations, which I could just grab from the find.

 

script.png

 

On a 28 million line log that took about 6 hours and works like this.

 

  • Take 1st item from country list.
  • Take 1st item from the application list.
  • Perform a find.
  • Loop

 

100 countries X 400 applications = 40,000 find against 28 million lines of log

 

You get the idea. This would work, but with the next network coming online, it wasn't only one outbound and one inbound connection, there were three. So now this would grow to a 18 hour exercise. Not acceptable.

 

So after some tinkering I removed all the calculations from the raw log file, and redid the logic using ExecuteSQL() only.  The bonus here is that I can easily add all the interfaces in the query at once.

 

  • Query all Countries and Applications Outbound using DISTINCT in query
  • Query all Countries and Applications Inbound using DISTINCT in query
  • Build the report file (2000 lines)
  • Perform find on the report file and query Bytes and Hits, then update
  • Perform find on the report file and query Bytes and Hits, then update

 

1000 lines Inbound  + 1000 lines Outbound = 2,000 ExecuteSQL() queries against 28 million lines of log

 

sumquery.png

Why even bring this up? Well, there are several forum postings saying that ExecuteSQL() can be really slow, and it sure can be. But if you break things up into smaller pieces it can be very fast. You milage may vary, but it is something to look into if your current operations are performing poorly.

 

So what value do I get from this report? Besides having this report autogenerated on a given date and emailed to the recipients who want it, absolutely NOTHING. This report to me is useless. It's for management who like nice charts with colors.

 

The real value for me is:

  • Reports that can autogenerate a complete firewall ruleset for all incoming and outgoing traffic
  • Detect what equipment needs to be reconfigured before the network migration can occur
  • Identify traffic that will not be allowed so it can be blocked

 

So is FM the right product for what I'm doing?

I would say yes when combined with the MBS plugin. I'm not using FM in the most traditional way.

 

Could this process have been done faster using other tools?

Sure, and I've been looking at Xojo for more specific things such as web services and small service applications to report back on equipment specific things.

 

I'll get off my soapbox, but I hope some of find this useful.

Outcomes