In just the past few years, the threat landscape has significantly changed – and traditional antivirus technologies can’t keep up. Attackers generate literally tens of thousands of new malware variants every day, often distributing each variant to just a handful of users. That means that each time a new victim visits an attacker’s web site, a slightly different piece of malware is served up. How can a security vendor ever hope to discover each of these thousands of variants?
We call this problem “server-side polymorphism” and we’ve seen it grow over several years. You can see it most clearly in the chart below. Every AV vendor shows this chart: look at the explosive growth of threats! But in reality, there aren’t that many more attackers; there are just more variants of the same attack.
There are two ways to approach this problem today. First, we can build more proactive technologies. We’ve done that. We look for program behaviors, we look for suspicious characteristics of a file, and we look at things that are harder for attackers to modify, like their IP addresses and network traffic. Second, we can just react faster – if we can roll out our signatures fast enough, we can provide better protection. And we do this too – we spit out millions and millions of signatures, and roll them out within 5 minutes. In our 2010 products, we’re even putting the signatures right into the cloud just like our competition.
But we know that these two approaches, while definitely very beneficial, haven’t been enough. The attackers are relentlessly improving their techniques. So starting several years ago, we wondered – is there a new approach – a third option?
The Birth of Reputation-based Security Technology
In 2006 we had an epiphany. We wondered – could we somehow leverage the wisdom of crowds to tell us if software is good or bad? We saw how well Amazon.com’s rating system worked for books, and this seemed like it could be a viable approach for rating software as well. But there were two problems. First, most users have no idea whether the software they’re downloading is good or bad. While we all know how much we enjoyed the last book we purchased from Amazon.com, people can’t easily tell what software is actually doing under the hood. Your program might look like a cool game, but it could also be stealing your e-mail password the next time you log in. The second problem is users don’t want to be constantly asked to rate software. Sure, some may volunteer their thoughts, but most don’t want to be bothered by their security software.
Yet the approach was intriguing and it made us think. Could we harness the untapped wisdom of our hundreds of millions of users without actually prompting them? In August 2006, we stumbled upon a possible approach that promised to yield useful reputation ratings on applications without any user prompting – our holy grail!
Within several weeks, in Symantec Research Labs, we created a computer simulation to model the approach and after several more weeks of evaluation, we came to the conclusion that we had identified a viable new malware protection approach.
While we can’t discuss our exact approach, we can provide a flavor for how it works. Most importantly, our approach is dependent on the collection and submission of anonymized application data from customers who choose to participate in Norton Community Watch. This data includes application hashes and other metadata about each executable such as how it arrived on the machine, the publisher name, and the program’s name and path. The volume of the required file metadata is extremely limited – typically just tens of kilobytes of data per machine per month, and participating machines submit this data to Symantec roughly once per day.
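To make the shape of such a submission concrete, here is a minimal sketch of what one anonymized record might look like. It is not the actual Norton Community Watch format: the field names, the `build_report` helper, and the choice of SHA-256 as the hash are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FileReport:
    """One anonymized record about an executable (all field names are hypothetical)."""
    sha256: str       # content hash; the real hash algorithm is not stated in the post
    publisher: str    # publisher name
    file_name: str    # the program's name
    path: str         # where the program lives on disk
    arrived_via: str  # how the file arrived on the machine


def build_report(content: bytes, publisher: str, path: str, arrived_via: str) -> FileReport:
    """Hash the file contents and bundle the hash with the metadata."""
    digest = hashlib.sha256(content).hexdigest()
    # Normalize Windows separators so the file name can be split off the path.
    file_name = path.replace("\\", "/").rsplit("/", 1)[-1]
    return FileReport(digest, publisher, file_name, path, arrived_via)


report = build_report(b"MZ example bytes", "Example Corp",
                      "C:\\Program Files\\Example\\game.exe", "browser_download")
print(asdict(report))
```

A record like this is only a few hundred bytes, which fits the point above that the monthly volume per machine stays small.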
Each day, our back-end servers import many gigabytes of reputation telemetry data from tens of millions of customers and use this data to compute file reputations. Our process is similar to some other well-known systems, such as Netflix’s recommendations, or Google’s PageRank™ algorithm. Just as these systems derive relevancy scores for movies or web pages, our reputation engine produces security reputation ratings for every single executable file known to our tens of millions of participating users. It might sound simple, but trust me, it’s quite a bit of effort. We constructed a huge data center for continually calculating and recalculating our trust ratings. Luckily folks at Symantec know something about data centers.
This new technology is called Reputation-based security, and it computes a security rating for every file downloaded, installed, or run on every Norton Community Watch user’s machine. Every file is assigned a classification (good or bad) and a confidence level that indicates how sure we are of that classification. Unlike traditional fingerprinting, which either detects or does not detect a file, this new approach refines each file’s reputation over time as more data becomes available about our community’s usage of each file.
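The idea of a classification plus a confidence that firms up over time can be illustrated with a toy model. This is not the actual reputation algorithm, which is not disclosed here. The "signals" are abstract pieces of evidence, and the `Reputation` class and the `PRIOR_WEIGHT` constant are hypothetical.

```python
from dataclasses import dataclass

PRIOR_WEIGHT = 10  # hypothetical smoothing constant: keeps confidence low until evidence accumulates


@dataclass
class Reputation:
    good_signals: int = 0
    bad_signals: int = 0

    def observe(self, looks_good: bool) -> None:
        """Record one more piece of community evidence about the file."""
        if looks_good:
            self.good_signals += 1
        else:
            self.bad_signals += 1

    @property
    def classification(self) -> str:
        # Ties (including "no evidence yet") default to "good" with zero confidence.
        return "good" if self.good_signals >= self.bad_signals else "bad"

    @property
    def confidence(self) -> float:
        total = self.good_signals + self.bad_signals
        margin = abs(self.good_signals - self.bad_signals)
        return margin / (total + PRIOR_WEIGHT)


rep = Reputation()
for _ in range(5):
    rep.observe(looks_good=True)
early = rep.confidence          # 5 / (5 + 10)
for _ in range(85):
    rep.observe(looks_good=True)
late = rep.confidence           # 90 / (90 + 10)
print(rep.classification, round(early, 2), round(late, 2))  # good 0.33 0.9
```

The classification stays the same between the two readings, but the confidence rises as evidence accumulates, which is the contrast with a signature that simply matches or does not.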
Using Reputation-based Security Technology
In our 2009 products we started using the reputation data to improve performance. With some basic algorithms we were able to identify many programs that we could trust – and we avoid scanning these files for infections. On typical users’ computers we are able to trust 60% to 90% of all regularly used programs, drastically reducing our scan times.
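As a rough illustration of why trust shortens scans, the sketch below filters a scan queue against a set of trusted hashes. The `files_to_scan` function, the paths, and the placeholder hash strings are hypothetical, and the real product logic is certainly more involved.

```python
from typing import Dict, List, Set


def files_to_scan(file_hashes: Dict[str, str], trusted_hashes: Set[str]) -> List[str]:
    """Return the paths whose hash is not in the trusted set."""
    return [path for path, digest in file_hashes.items() if digest not in trusted_hashes]


installed = {
    "C:/apps/browser.exe": "aaa",
    "C:/apps/editor.exe": "bbb",
    "C:/apps/player.exe": "ccc",
    "C:/apps/chat.exe": "ddd",
    "C:/temp/new_tool.exe": "eee",
}
trusted = {"aaa", "bbb", "ccc", "ddd"}

to_scan = files_to_scan(installed, trusted)
print(to_scan)  # ['C:/temp/new_tool.exe']
```

In this made-up example four of five programs are skipped, or 80%, which sits inside the 60% to 90% range quoted above.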
This year we have built out the full Reputation-based security system in order to block threats. We now automatically determine the reputation of hundreds of millions of files. Norton Internet Security 2010 and Norton Antivirus 2010 are fully integrated with Reputation-based security – we check it in three different cases:
1. Blocking downloads with Download Insight. Downloads are the primary delivery mechanism for new malware, and especially for server-side polymorphism. We use Reputation-based security to evaluate all downloads made from programs like Outlook, Firefox, and Internet Explorer via our Download Insight feature (see Viral’s excellent description and video here). Our products now tell you whether your download is safe, unsafe, or somewhere in between. And – if your download isn’t safe, our products take immediate, decisive action.
There are times when we are unable to make a determination about a program, for example because it is too new. In this case we provide useful information: how old the file is, and how many users have it. Put simply, in the rare case where the new technology cannot determine a reputation, we give the user information that is easy to understand and act on. My mother could easily understand a prompt like the following: “The file you are about to use was discovered less than 4 hours ago and has been used by fewer than 10 users. We recommend against using this file.” Being risk averse, she knows that few people have used this software and that it was just released – perhaps it’s worth letting other people try the application first rather than being a guinea pig. Just as you might pass by a restaurant with few patrons or avoid a restaurant that recently opened (until it has established a good reputation), computer users will now be able to apply that same intuition to new programs.
2. Improving heuristics. The 2010 Norton products use Reputation-based security technology to enhance our heuristic technologies. Heuristics look for patterns – either in the file itself, or in its behaviors on the machine. Suspicious patterns are often easy to identify, but there is always a risk of false positives – of calling a good program bad by mistake. This new approach is a completely different way of determining if a file is good or bad. To detect threats, it looks at anonymous usage patterns across our massive user base, not at the program itself. So when we combine our heuristics and Reputation-based security technology, we get higher detection rates and we reduce our false positives.
For example, today we often cannot block a file which looks “borderline suspicious” because so many good programs may also look suspicious. We don’t want to make a mistake. But now we can detect just the bad programs by double-checking with the reputation information. If the file is also found to have a low reputation, what would otherwise have been a missed infection will now be detected and instantly quarantined. This hybrid checking is performed both in our real-time scans as well as during standard on-demand and idle time scans.
3. Checking currently running programs. Finally, Reputation-based security technology is used to check all running processes in our new Norton products. If any running program has a low reputation, we automatically quarantine it.
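The hybrid check and the fallback prompt described in the cases above can be sketched in a few lines. This is an illustration under stated assumptions, not the shipping logic: the `SUSPICIOUS` and `MALICIOUS` thresholds, the 0.0 to 1.0 heuristic score, and both function names are hypothetical.

```python
from typing import Optional

SUSPICIOUS = 0.5  # hypothetical score where a file starts to look "borderline suspicious"
MALICIOUS = 0.9   # hypothetical score where heuristics alone are confident enough to block


def hybrid_verdict(heuristic_score: float, reputation: Optional[str]) -> str:
    """Combine a heuristic score with a reputation of "good", "bad", or None (unknown)."""
    if heuristic_score >= MALICIOUS:
        return "quarantine"  # heuristics alone are decisive
    if heuristic_score >= SUSPICIOUS and reputation == "bad":
        return "quarantine"  # borderline file confirmed by a low reputation
    return "allow"           # borderline but unconfirmed: avoid a false positive


def unknown_file_prompt(age_hours: int, user_count: int) -> str:
    """Plain-language message for a file that has no reputation yet."""
    return (f"The file you are about to use was discovered less than {age_hours} hours ago "
            f"and has been used by fewer than {user_count} users. "
            "We recommend against using this file.")


print(hybrid_verdict(0.6, "bad"))   # quarantine
print(hybrid_verdict(0.6, "good"))  # allow
print(unknown_file_prompt(4, 10))
```

The second branch is the interesting one: a score of 0.6 would be too risky to act on alone, but a low reputation turns it into a detection without raising the false positive rate for well-regarded programs.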
Why It Really Matters
For many years we have played a cat and mouse game with virus authors. They create a threat, we detect it. They tweak it, and we improve our detection. Every improvement on each side begets another improvement.
Reputation-based security technology changes the game to put Norton in the driver’s seat. The reputation ratings are entirely independent of traditional virus signature detections. This gives us four important advantages.
1. The first comprehensive technology. Our Reputation-based security has ratings and metadata for every file run by every machine in the Norton community. As such, it can identify entirely new infections that would otherwise evade traditional fingerprints and heuristics. Even if an attacker targets just a single user with a new threat, it can discover and block the threat.
2. Impervious to server-side polymorphism. Because attackers are giving everyone a different version of the same threat, it is extremely difficult for traditional signatures to detect all of these versions, because so few copies exist of each. But with Reputation-based security, these threats stick out like a sore thumb to our reputation algorithms.
3. Impervious to obfuscated malware. Obfuscation techniques such as packing, polymorphism, and encryption are useless against a reputation-based approach. Such schemes have been shown to lower the malware’s reputation, precisely because the resulting files have different distribution patterns than traditional, legitimate applications.
4. Harnessing Community Watch. Traditional detection judges malware based on what one particular file is doing on one particular machine at one point in time, taking a myopic view of the world. Our reputation data is based on our entire user base, and this world view yields a far richer context.
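A small simulation shows why server-side polymorphism stands out in community-wide data. It is a toy example with made-up byte strings and a hypothetical rarity cutoff of 5 users, and it uses SHA-256 only as a stand-in for whatever file hash is actually collected.

```python
import hashlib
from collections import Counter

# Simulate downloads: one popular legitimate installer fetched by many users, and
# a polymorphic threat that serves each visitor a slightly different binary.
downloads = [b"legit-installer-v1"] * 1000
downloads += [b"malware-core" + str(visitor).encode() for visitor in range(50)]

# Count how many users hold each distinct file hash.
prevalence = Counter(hashlib.sha256(blob).hexdigest() for blob in downloads)

# Files seen by very few users are the ones a per-variant signature struggles to cover.
rare = [digest for digest, users in prevalence.items() if users < 5]
print(len(prevalence), len(rare))  # 51 50
```

Every variant ends up as a hash seen by exactly one user, while the legitimate installer is seen by a thousand. Low prevalence alone does not prove a file is bad, since new legitimate software is also rare at first, which is why the text above describes combining it with other metadata and heuristics.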
The 2010 products are just the beginning. We expect to make lots of tweaks to Reputation-based security technology over the coming years. Just as other large community-based systems improve over time (Netflix, for example), so will we. This is a whole new approach. It brings an entirely new detection method to our products. It strengthens our current technologies. And it shifts the odds in our favor and away from the attackers. Suffice it to say, we’re pretty excited to see it in our 2010 products.