Hello, my name is Kunal Karandikar and I am a Manager on Symantec’s Consumer Products Engineering team. My colleagues and I are excited about Norton Insight, a brand new feature in the 2009 Norton product line. There have been a lot of questions about how it works and what it does. In fact, several recently published articles have compared Norton Insight with other technologies that we think are quite different and do not really match up well to our new feature. We hope to clear up some of the confusion and misconceptions in this article. We welcome your feedback and questions in the comments section at the bottom of this post!
Our engineering team brainstormed over the last few years on how we could reduce the overall system performance impact caused by our security products. We knew all along that one of the core contributors to performance slowdown is file based scanning.
File based scanning is not the only detection method in modern security products, but it is a very mature technology that provides very high detection efficiency. There remains a need to keep this technology alive, yet it is imperative to innovate in the way it is used in order to minimize its impact on system performance.
Over the years we’ve continued to improve file scanning performance. These have mainly been incremental improvements, yielding around 10% to 30% better scanning efficiency with each major product release. We will continue improving these engines, which is always a good thing.
At the same time, we’ve also been looking for technologies that eliminate scanning altogether. This is where Norton Insight was born. Its initial project name was SAPHIRE, which stands for “Scan less by Avoiding Proven High Incident Recurring Entities.” To understand how it works, we first need to take a look at today’s file scanning paradigm.
The traditional Blacklist Approach
Client security software typically identifies malware by scanning files on disk against known signatures. If a signature matches, a threat is identified. This is more or less a “Blacklist approach,” which essentially means that known bad files are cataloged.
The problem, however, is that the threat landscape continually expands as thousands of new variants of threats are released daily. That means that the blacklist is updated on an ongoing basis. Once the blacklist is updated, files need to be checked over again to see if they are detected by the most current blacklist. This is because files that once scanned clean might be identified as malicious later on by a new version of the blacklist.
Normally, a file is scanned when it is created, accessed, and/or modified. If the file is accessed repeatedly, and neither the file nor the signatures have changed, the file does not need to be rescanned. Most of today’s security products use a cache to identify whether a file has already been scanned. A cache entry must be removed if the corresponding file is modified or if a new blacklist is available. The latter causes some performance overhead and system slowdown, especially if an updated blacklist is frequently available.
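The caching behavior described above can be sketched roughly as follows. This is an illustration only, not Symantec's implementation: the `ScanCache` class, its method names, and the use of a modification stamp are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ScanCache:
    """Hypothetical cache of files that scanned clean under the current blacklist."""
    blacklist_version: int = 0
    # Maps a file path to the modification stamp seen when it last scanned clean.
    clean: dict = field(default_factory=dict)

    def needs_scan(self, path, mtime):
        # A file needs scanning unless it is cached with the same modification stamp.
        return self.clean.get(path) != mtime

    def record_clean(self, path, mtime):
        self.clean[path] = mtime

    def on_file_modified(self, path):
        # A modified file loses its cache entry.
        self.clean.pop(path, None)

    def on_blacklist_update(self, new_version):
        # A new blacklist invalidates every entry, so every file is rescanned.
        if new_version != self.blacklist_version:
            self.blacklist_version = new_version
            self.clean.clear()
```

The last method is where the cost comes from: each blacklist update empties the whole cache, no matter how many of the cached files are perfectly ordinary.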
How a simple approach fails
The first straightforward attempt to solve the re-scanning dilemma was to not scan a file again once it has scanned clean, as long as it has not changed. This approach, unfortunately, reduces security because new threats are typically not detected immediately and are only detected later on by new blacklist entries. Consider a Trojan horse that once scanned clean and then, despite a new entry in the blacklist, remains on the computer just because it didn’t change and it wasn’t detected at an earlier point in time. We would not be able to recommend such an approach simply because it is NOT secure!
The White-List Approach
The blacklist approach definitely has its strengths, but as it gets updated and formerly scanned files require re-scanning, the approach has inherent performance disadvantages. Interestingly, the opposite approach resolves this disadvantage. If one were to catalog known good files, then re-scanning would be avoided. This is because once a file is identified as good, this state typically doesn’t change. For example, think about the file winword.exe in your Office directory. Millions of people use it, it never changes, yet security products still scan it.
Today’s operating systems and mainstream applications are, for the most part, finite and well known. Many systems contain almost exactly the same versions of the same binary files, and many of these files rarely, if ever, change. Meanwhile, client security software is constantly looking for potentially malicious files or unusual application behavior on a system, and this constant monitoring can cause ongoing system performance degradation. Yet most users rarely encounter any of these malicious files or applications.
Norton Insight’s approach is to identify well known files on the client system and exclude these files from ongoing scanning and monitoring.
With Norton Insight, only unknown files need to be scanned and monitored, and the known files on a typical system outnumber the unknown files. Thus, the performance impact of the security software is limited to unknown files only. One might say that we scrutinize only rare, unknown files, not the typical well known files.
The Growing List
The question remains: how can a security product know about all good applications? First of all, let’s be clear: we will never know all of them. But we can know the majority of them and reduce scanning to the remaining files that are unknown.
Of course this white-list of good files can potentially become extremely large. We certainly wouldn’t want to download to the client a huge white-list definition set in addition to the existing blacklist. Instead, we keep the list on our own servers and make it available through the Internet.
This is what we call the SAPHIRE backend. It is basically a giant database that holds the list of all known good files. Our clients query the database on a schedule. The processing runs only during idle time, when the computer is not in use, so that it doesn’t have an impact on system performance. We also query the backend the first time the Norton Insight user interface is opened. This way, our customers can, at any given time, evaluate their files and verify which ones are known good files and which of them are currently unknown.
Watching the Community – What is out there?
The Norton Community Watch feature collects security data about applications and submits it to Symantec. The data is then analyzed to determine new threats and their sources, helping Symantec provide more efficient solutions. So, when Norton customers enable the Norton Community Watch feature, the client software collects information about interesting program files on the user’s system. This is done without exposing any personally identifiable information to Symantec.
The following are deemed interesting files: running processes, modules loaded in running processes, registered drivers, registered services, browser helper objects registered with Internet Explorer, and registered startup files in the startup group or the run registry key. Basically, this group includes all files that run or can run on the system.
For each of the interesting files the software computes a SHA256 cryptographic hash. The SHA256 hash value uniquely identifies that file, and any modification to the file, regardless of how small it is, will change the SHA256 value of the file.
The client submits the file name, along with the file version information and the SHA256 hash of the file, to Symantec. Only static information about the file is submitted to Symantec. A copy of the file is not submitted. The SHA256 value uniquely identifies that file, allowing Symantec to perform statistical analysis on the presence and distribution of that particular file across all systems participating in Norton Community Watch.
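As a rough illustration of the hashing step, here is a minimal sketch using Python's standard `hashlib`. The function names and the shape of the submission record are assumptions made for the example; the actual client and its submission format are not described in this post.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_submission(path, file_name, file_version):
    """Hypothetical record holding only static information; the file itself is not included."""
    return {
        "file_name": file_name,
        "file_version": file_version,
        "sha256": sha256_of_file(path),
    }
```

Changing even a single byte of the file produces a different digest, which is what lets the hash stand in for the file in the statistical analysis.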
Statistical Analysis – What is good?
The information provided to Symantec via the Norton Community Watch feature allows Symantec to build statistical models of file distribution and file trustworthiness. The proprietary algorithms allow us to identify trustworthy files and then assign the Community Trusted rating to these files.
By sorting the millions of SHA256 values by prevalence, Symantec also analyzes the static attributes of the most common files. By analyzing the version information and file names, potential matching vendors and applications are identified. Symantec acquires original distribution media of these applications, and installs them in a clean environment where no external contamination or infection is possible. The installed binaries are then analyzed, including computing the SHA256 values of the files, and if the computed SHA256 value matches the reported SHA256 value, the cataloged application is a match for the reported file.
All binaries included in the application are thoroughly analyzed, and if all binaries are deemed safe and clean, and the vendor is considered trustworthy, Symantec assigns the Norton Trusted rating to these files.
Norton Insight Classification
Norton Insight catalogs interesting files on the system, and computes a SHA256 value for each file. A secure connection is established from the client to the Norton Insight backend system. The client provides the backend with the SHA256 value of the file and a lookup is performed in the backend database. If a match is found, the trust attributes associated with the file are returned to the client. The client then assigns the trust attributes to the file. It’s important to note that a file will lose its trust attributes if it is even slightly modified.
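The lookup can be pictured with a small sketch. Here an in-memory dictionary stands in for the backend database, and `BACKEND` and `lookup_trust` are hypothetical names; the real exchange happens over a secure connection whose protocol is not covered in this post.

```python
import hashlib

# Hypothetical stand-in for the backend database: SHA256 hex digest -> trust rating.
BACKEND = {
    hashlib.sha256(b"known good program").hexdigest(): "Community Trusted",
}


def lookup_trust(content, backend=BACKEND):
    """Return the trust rating for the given file content, or None if it is unknown."""
    digest = hashlib.sha256(content).hexdigest()
    return backend.get(digest)
```

Because the lookup key is the hash of the content, a file that has been altered in any way no longer matches its old entry and comes back as unknown.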
Scan performance profiles determine how trust values assigned to a file affect the performance impact of scanning and analyzing the file. Norton users can set performance profiles to one of the following:
Full Scan: The file is always scanned regardless of any Norton Insight attributes being present. This basically means that Norton Insight is not used to reduce scanning impact.
Standard Trust: The file is skipped if the file is trusted by Symantec, i.e. Community Trusted or Symantec Trusted. This is the default setting.
High Trust: The file is skipped if trusted by Symantec or if the file is signed, i.e. Community Trusted, Symantec Trusted, Microsoft Catalog Signed, or
Performance Improvement Points
In summary, Norton Insight reduces system performance impact in the following ways:
- By eliminating all security evaluations on trusted files, frequently accessed files, and commonly used applications running unconstrained, the overall system performance and responsiveness are greatly improved.
- By not scanning trusted drivers, services, and startup applications as they are loaded and executed during the startup sequence, startup and shutdown times are decreased.
- By not scanning trusted applications as they are launched, application startup times are shorter as well.
- By not scanning trusted files during quick or full system scans, scan times are shorter.
Other Uses for Norton Insight
We originally created the application white-list for Norton Insight, which is essentially a performance feature. However, later we found that it also has great value for other security technologies. For example, active detection heuristic technology is prone to falsely identify good software as bad based on the execution behavior of the good software looking similar to the execution behavior of bad software. To prevent false positive detections, the heuristic technology sensitivity is often decreased, but this can also result in some bad applications not being detected.
By using Norton Insight and white-listing known applications, the heuristic detection sensitivity threshold can be greatly increased because known good applications are excluded from heuristic detection, and thus the detection rate of malicious applications increases. We therefore check with Norton Insight each time a heuristic detection triggers. This helps to make sure we do not falsely identify a well known good application as a bad one.
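The interaction between heuristics and the white-list can be sketched as below. The numeric score, the threshold value, and the function name are invented for illustration; this post does not describe how the heuristic engine actually expresses its results.

```python
def heuristic_verdict(score, sha256, trusted_hashes, threshold=0.5):
    """Hypothetical decision: consult the white-list only when the heuristic triggers."""
    if score < threshold:
        # The heuristic did not trigger, so there is nothing to check.
        return "allow"
    if sha256 in trusted_hashes:
        # The heuristic triggered on a known good application: suppress the false positive.
        return "allow"
    return "block"
```

With the white-list as a safety net, the threshold can be set more aggressively than would be tolerable if every trigger resulted in a detection.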
The technology also makes commonly used “allow – deny” pop-ups unnecessary because we can easily find out automatically if it is a known good application or not. This makes our security products more intelligent and a lot less annoying.
Q: How do you know that a trusted file was modified and should not be trusted but scanned?
A: Our kernel mode device driver technology instantaneously revokes file trust attributes the moment the file is modified.
Q: How do you know that a trusted file was not modified when the product was not running, such as booting into safe mode or booting from a CD?
A: On startup, we analyze the NTFS file system, and if we determine that any changes were made that we cannot account for, all trust values of all files on that volume are revoked.
Q: How can you be sure that a Community Trusted file is safe when you did not analyze the actual file?
A: The proprietary statistical methods used to classify a file as Community Trusted use very high margins of safety to eliminate the incorrect trusting of a malicious file.
Q: What if a mistake was, in fact, made? How would you know to start scanning the file again?
A: We have implemented a revocation mechanism where clients receive a list of revoked SHA256 values via LiveUpdate. If the client has a file matching that SHA256 and is currently trusting that file, all trust is revoked, and the file is once again scanned.
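A minimal sketch of how a client might apply such a revocation list follows. The `local_trust` mapping and the function name are assumptions for illustration; the format of the list delivered via LiveUpdate is not described here.

```python
def apply_revocations(local_trust, revoked_hashes):
    """Remove trust for every locally trusted hash that appears in the revocation list.

    local_trust maps SHA256 hex digests to trust ratings and is modified in place.
    Returns the list of hashes whose trust was revoked.
    """
    revoked = [digest for digest in local_trust if digest in revoked_hashes]
    for digest in revoked:
        del local_trust[digest]
    return revoked
```

Files whose hashes are removed this way simply fall back to being unknown, so they are scanned again like any other untrusted file.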
Q: Does Norton Insight work on all types of file systems as well as on network drives?
A: No. For security reasons, only files on NTFS volumes are supported.
Q: Isn’t computing the SHA256 more expensive than just scanning the file?
A: Yes it is. But we only compute the SHA256 once and then associate it with the file. Once the file is trusted, looking up the trust value is much faster than scanning the file. If the file is not trusted, then it is scanned every time the file is accessed.
Q: How do you associate the SHA256 and trust attributes with the file? Specifically, will these cause similar problems experienced by users of a well known security product that also associated information with files by using Alternate Data Streams (ADS) and the NTFS Object IDs?
A: We store the information in a very high performance and secure product-specific database. We do not impact or change the normal file system.
Q: What is the difference between the Standard Trust and High Trust performance profiles?
A: Standard Trust only trusts files that have been validated by Symantec’s secure backend systems. High Trust also trusts files that are signed, provided the signature validates against the machine’s local certificate store.
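The three performance profiles can be summarized in a short sketch. The enum, the function name, and the `signature_valid` flag are assumptions for illustration; the rating names are the ones used earlier in this article, and signature validation itself is outside the scope of the example.

```python
from enum import Enum


class Profile(Enum):
    FULL_SCAN = "Full Scan"
    STANDARD_TRUST = "Standard Trust"
    HIGH_TRUST = "High Trust"


# Ratings assigned through Symantec's backend, as named in this article.
SYMANTEC_RATINGS = {"Community Trusted", "Symantec Trusted"}


def should_skip_scan(profile, rating=None, signature_valid=False):
    """Hypothetical decision: return True if the file may be skipped under the profile."""
    if profile is Profile.FULL_SCAN:
        # Norton Insight attributes are ignored and the file is always scanned.
        return False
    if rating in SYMANTEC_RATINGS:
        # Both Standard Trust and High Trust skip files trusted by Symantec.
        return True
    # Only High Trust additionally accepts a locally validated signature.
    return profile is Profile.HIGH_TRUST and signature_valid
```

For example, a signed file with no Symantec rating is still scanned under Standard Trust but skipped under High Trust.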
By Kunal Karandikar and Pieter Viljoen
Message Edited by kunal_symc on 09-05-2008 05:13 PM