Marcus Ranum on security innovation and Big Data
by Mirko Zorz - Tuesday, 11 March 2014.
You're still going to have a great deal of analytic work to do (it's just that now you can do it fast!) and it's not going to somehow add magical new nuggets of information to what's already there. The data is, in fact, already there, and if you look at it and start doing the analysis right now, you can get some idea of whether or not your big data solution is going to give you any useful information in the long run.

I don't want to seem dismissive of big data, but I think a lot of the problem is that right now organizations collect all this stuff and never look at it. So with big data they may put it in one place and look at it, and discover things they should have discovered long ago. The funny part is that if they had made those discoveries long ago, they could have skinned the necessary fields out of the data as it was being collected, and pre-computed it - a much faster and easier approach all along.

What's holding up major Big Data adoption?

I think the holdups are pretty reasonable, really. Big data's value proposition is that you will uncover important and useful relationships in your data once you spend all this time, money, and effort putting it into a big data system. That sounds suspiciously vague, doesn't it? I'm guessing it's hard to get sign-off on a large-dollar deployment on the basis that some unspecified wonderful thing is most likely going to happen. And the problem is that if there is a specific use case for a particular data correlation, it's probably fairly straightforward to just do it using existing data structures. Oh, you want to trawl my customer database and see who shipped goods to the same address as another customer and map them as "friends," then send reminders when it's the "friend's" birthday? That's just a couple of queries against our existing database; no need to put everything in one great big place to figure that out.
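To make the "couple of queries" point concrete, here is a minimal sketch of the shared-address lookup as a single self-join. The `shipments` table, its `customer_id` and `address` columns, and the demo rows are all hypothetical, chosen only for illustration; a real customer database would have its own schema and would need address normalization first.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical shipments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE shipments (customer_id INTEGER, address TEXT)")
    conn.executemany(
        "INSERT INTO shipments VALUES (?, ?)",
        [
            (1, "12 Elm St"),
            (2, "12 Elm St"),
            (3, "9 Oak Ave"),
            (1, "4 Pine Rd"),
            (2, "12 Elm St"),  # repeat shipment, should not create a duplicate pair
        ],
    )
    return conn


def find_shared_address_pairs(conn):
    """Return (customer_a, customer_b, address) for distinct customers
    who shipped goods to the same address."""
    query = """
        SELECT DISTINCT a.customer_id, b.customer_id, a.address
        FROM shipments AS a
        JOIN shipments AS b
          ON a.address = b.address
         AND a.customer_id < b.customer_id  -- skip self-pairs and mirrored pairs
        ORDER BY a.customer_id, b.customer_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for pair in find_shared_address_pairs(build_demo_db()):
        print(pair)
```

The birthday reminder would then be a second query joining these pairs against whatever table holds customer birth dates, which is the sense in which no separate big data store is needed for this particular use case.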

I think there's a weird kind of catch-22 going on: when you have a good clear use case for big data, you realize you don't need big data at all, for that use case. So big data has to sell on the potential for all this undiscovered goodness.

We ran into a similar problem back in the early days of the intrusion detection system. Some of us took the approach of collecting a ton of stuff and searching for entity relationships within it, whereas others built big pattern-matching expert systems. Of course, the pattern-matching expert systems won because it's much, much faster to look for a pattern in incoming data than to trawl around in all your data trying to figure out if there are subtle relationships that haven't been noticed yet.

The pattern-matching approach doesn't require the customer to have deep expertise, just a large knowledge-base of rules and a willingness to turn off rules that produce annoying results. The search for entity relationships requires the customer to be able and willing to actually figure out the meaning of the relationship once it is discovered; something computers can't do.
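As a rough illustration of the pattern-matching side of that split, the sketch below checks each incoming line against a small rule base and lets the operator switch off a rule that produces annoying results. The `Rule` class, the `scan` function, the rule names, and the log lines are all hypothetical; this is not how any particular intrusion detection product is implemented.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One entry in a hypothetical vendor-supplied rule base."""
    name: str
    pattern: str
    enabled: bool = True

    def matches(self, line):
        return re.search(self.pattern, line) is not None


def scan(lines, rules):
    """Check each incoming line against every enabled rule.

    Returns a list of (rule_name, line) alerts in arrival order.
    """
    alerts = []
    for line in lines:
        for rule in rules:
            if rule.enabled and rule.matches(line):
                alerts.append((rule.name, line))
    return alerts


if __name__ == "__main__":
    rules = [
        Rule("failed-login", r"Failed password"),
        Rule("session-open", r"session opened"),
    ]
    log = [
        "sshd: Failed password for root from 10.0.0.5",
        "sshd: session opened for user alice",
    ]
    rules[1].enabled = False  # the customer turns off a noisy rule
    print(scan(log, rules))
```

The knowledge here lives in the patterns, written in advance; the customer's only job is deciding which rules to keep enabled.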

After a few years I realized that the division between the systems was a matter of where the knowledge about their output was applied - encoded in advance by the vendor, or applied to the results by the customer. Then it was obvious which one was going to win. So, suppose you're going to try to do big data with your system logs and whatnot: how is that going to compare with a SIEM solution that comes out of the box with a few useful dashboards and some nifty rules for summarizing some stuff?

