There's also the fact that a lot of the malware-infected customers have a worse “Internet experience” and they are likely to blame their ISPs for it. Take something like DNSChanger for example. For us, all customers infected with it have their DNS servers on the opposite side of the planet. That causes some latency issues. Also, when I read something like the “FBI’s Internet Blackout Postponed from 8 March to 9 July”, it makes me chuckle a bit. I checked our share on 7 March and we had two (2) customers reported to us that day, so “Internet Blackout” doesn't really apply to our customers.
What were the most significant challenges in setting up a system that will closely interact with users and warn them about infections? What were the major obstacles you encountered during this process?
Every new thing we manage to get running is aimed at making our workload a bit easier.
Creating a system with a group of people who had never created any systems whatsoever was a challenge in itself. Had there been a commercial solution available at the time, we would probably have taken that route, which would have been a terrible decision that we would most likely still regret. The thing we have come to realize in the last 10 years is that our system will never be finished: we constantly need new features more or less immediately in order to react to new threats or new products of our own. If we had a commercial system, we would have to wait for an update for who knows how long. We started building the system one step at a time. Some features reached production within hours, while others took years to accomplish.
To give an example, back in 2002, before we had anything, the most cumbersome task was handling, say, a spammer, when we might have 100 complaints in the abuse box in the morning. I would take a single email, check the IP address, search the inbox for that IP address and move all emails containing it to a temp folder. Because new complaints would be pouring in steadily, I would shout over my cubicle, “I took 22.214.171.124, don't touch that”. Going through the RADIUS logs, DHCP logs, or whatever was applicable for that case to determine the right customer took around 20 minutes.
Even when we had already shut the customer's connection, complaints about previously sent spam would keep coming for weeks, so we had to remember that 126.96.36.199 had already been handled. Then we had the additional problem of dynamic IP addresses. We had to go through the 20-minute trouble multiple times, only to find out that the same customer was behind many of the different addresses.
So, the first thing we did was automate the log browsing part by putting DHCP, RADIUS and other logs into a database that our system could use to resolve the customers behind the IP addresses. Then we opened up APIs to our customer management systems to get the customer information, chose credible sources of intel to automate and integrate into our system, and created a web GUI for the handling part. When we had that done, we only had a single case instead of a hundred emails with multiple IP addresses in a messy mailbox. I would have a single row in our handling system saying Customer ID12345 and all those emails would be behind that link.
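To make the idea concrete, here is a minimal sketch of how complaints could be collapsed into one case per customer once lease records are searchable. It is only an illustration and not the system described in the interview: the `Lease` and `Complaint` records, the `resolve_customer` and `group_into_cases` helpers, the second customer ID and all timestamps are assumptions made up for this example, and the IP addresses come from the ranges reserved for documentation. The key point is that a dynamic address only identifies a customer together with the time of the incident.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Lease:
    """Hypothetical record built from DHCP/RADIUS logs."""
    ip: str
    customer_id: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Complaint:
    """Hypothetical abuse report: offending IP and when it was seen."""
    ip: str
    seen_at: datetime
    subject: str


def resolve_customer(leases, ip, seen_at):
    """Return the customer who held `ip` at `seen_at`, or None if unknown."""
    for lease in leases:
        if lease.ip == ip and lease.start <= seen_at < lease.end:
            return lease.customer_id
    return None


def group_into_cases(leases, complaints):
    """Group complaints into one case per customer, regardless of IP."""
    cases = defaultdict(list)
    for complaint in complaints:
        customer = resolve_customer(leases, complaint.ip, complaint.seen_at)
        cases[customer or "unresolved"].append(complaint)
    return dict(cases)


# Illustrative data only: the same customer moves between two addresses,
# and one address is later reassigned to a different customer.
leases = [
    Lease("192.0.2.10", "ID12345", datetime(2012, 1, 1), datetime(2012, 1, 2)),
    Lease("198.51.100.7", "ID12345", datetime(2012, 1, 2), datetime(2012, 1, 3)),
    Lease("192.0.2.10", "ID67890", datetime(2012, 1, 2), datetime(2012, 1, 3)),
]
complaints = [
    Complaint("192.0.2.10", datetime(2012, 1, 1, 8, 0), "spam"),
    Complaint("198.51.100.7", datetime(2012, 1, 2, 9, 0), "spam"),
    Complaint("192.0.2.10", datetime(2012, 1, 2, 10, 0), "port scan"),
    Complaint("203.0.113.5", datetime(2012, 1, 2, 11, 0), "spam"),
]

cases = group_into_cases(leases, complaints)
for customer, items in cases.items():
    print(customer, len(items))
```

In this sketch the two spam complaints against different addresses end up in the same case for customer ID12345, while the later complaint against the reused address goes to a different customer. A real deployment would query an indexed database rather than scan a list, but the lookup key (address plus time) would be the same.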
The next step was to notify the customer and/or shut the connection. Notifying the customer was easy because we already got the customer information automatically. Shutting a customer's connection with a button was a bit trickier, but we started with the connection types where we had the most incidents.