RGcrawler Data goes Head-to-Head with Competing Security Solution: Anti-malware solutions tested for relative effectiveness

 

Background

The most common way for users to get infected by malware is by unwittingly installing a malicious program on their PCs. Aside from email viruses, which are largely a solved problem, this is the single biggest PC security threat today. To combat this threat, Robot Genius has created a security solution that consists of a desktop security client, web-crawler malware data, and a lightweight browser plug-in. These components work together to deliver the industry’s most comprehensive prevention, detection, and remediation of software threats.

The web crawler has tested the entire universe of roughly 1 million Windows executables, using an automated process that discovers the full-path URLs of all malware and malware vectors available for download. This approach is much more effective than other available anti-malware products because it offers a finer degree of granularity. For example, we might know that http://gamedownloads.com/stor1/dragracer.exe is a game that silently installs spyware on your machine. Users can then choose to block either the entire domain, http://gamedownloads.com, or just the URL of the dangerous executable, dragracer.exe.
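The difference between the two blocking choices can be illustrated with a minimal sketch. It assumes a block list holding exact URLs and a separate set of blocked domains; the helper names and the second URL (chess.exe) are hypothetical, and the exact string match stands in for whatever URL normalization a real product would apply.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Lower-cased host name of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def is_blocked(url, blocked_urls, blocked_domains):
    """True if the exact URL or its whole domain is on the block list."""
    return url in blocked_urls or host_of(url) in blocked_domains


BAD_URL = "http://gamedownloads.com/stor1/dragracer.exe"
OTHER_URL = "http://gamedownloads.com/stor1/chess.exe"  # hypothetical clean file

# URL-level policy: only the dangerous executable is blocked.
url_policy = ({BAD_URL}, set())
# Domain-level policy: everything hosted on the site is blocked.
domain_policy = (set(), {"gamedownloads.com"})

print(is_blocked(OTHER_URL, *url_policy))     # False
print(is_blocked(OTHER_URL, *domain_policy))  # True
```

The URL-level policy leaves the rest of the site reachable, which is the extra granularity the paragraph above describes.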

Building this automated system is a challenging task. For example, the machine intelligence (scripts) that automatically installs any software package on a test machine, including packages in foreign languages, takes significant time to develop.

To test the effectiveness of this automated, behavior-based approach, Robot Genius went head-to-head with a leading security vendor. This paper summarizes the results of comparison testing performed on Robot Genius’ RGcrawler data and the competitor’s solution. The Robot Genius solution maintained a false-positive rate close to zero. Results from the comparison were verified using commercial scanners from Norton, Trend, and SpyBot.

Overview

The most common way for users to get infected with malware is by voluntarily installing programs on their PCs. Aside from email viruses, this is the single biggest PC security threat. Security programs that are adept at discovering viruses have trouble finding malware, because malware often morphs, disguises itself, and otherwise makes detection difficult.

While the actual characteristics of malware applications change frequently to stay under the anti-malware giants' radar, their URLs are more constant: malware creators want the executables to be indexed by search engines, otherwise potential victims would never find them. Furthermore, malware affiliates (those who bundle their legitimate programs with malware for a fee) cannot stay up to date without a stable URL.
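To make this argument concrete, here is a minimal sketch contrasting an exact file-hash signature with a URL block-list entry. The byte strings, URL, and helper names are hypothetical, and real signature engines use more than a single whole-file hash. The sketch only shows why a rebuilt binary can slip past an exact hash match while its unchanged download URL still matches.

```python
import hashlib


def sha256(data):
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Two hypothetical builds of the same malware: the author changed a few bytes
# (for example by repacking), so the file hash differs, but the URL did not.
build_v1 = b"MZ...payload-v1"
build_v2 = b"MZ...payload-v2"
url = "http://example.com/downloads/freegame.exe"

hash_signatures = {sha256(build_v1)}  # signature recorded for the old build
url_block_list = {url}                # block-list entry recorded for the URL


def hash_match(data):
    return sha256(data) in hash_signatures


def url_match(u):
    return u in url_block_list


print(hash_match(build_v2))  # False: the new build evades the hash signature
print(url_match(url))        # True: the stable URL is still on the block list
```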

The Robot Genius web crawler technology automatically tests every Windows executable on the Internet, a universe of a few million applications ranging from screen savers to games to office applications, and builds a database of their URLs, including the characteristics of all malware and malware vectors. RGcrawler data is the product of this technology: a valuable ‘block list’ of all URLs containing malware, as well as an ‘allow list’ of safe executables. This data is simple to incorporate into firewalls, gateway devices, or search results. ISPs can even use it at the network level, either to block end users from downloading dangerous applications entirely or to warn them before the download proceeds.
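A gateway or ISP consuming the two lists might make a per-download decision along these lines. This is a minimal sketch, not the RGcrawler SDK: the `Verdict` values, the `warn_only` switch, and the allow-list entry are hypothetical, and URLs found on neither list are returned as `UNKNOWN` so that the caller's own policy decides what to do with them.

```python
from enum import Enum


class Verdict(Enum):
    BLOCK = "block"      # refuse the download
    WARN = "warn"        # let the user proceed after a warning
    ALLOW = "allow"      # known-safe executable
    UNKNOWN = "unknown"  # on neither list; policy left to the caller


def decide(url, block_list, allow_list, warn_only=False):
    """Classify one download request against the block and allow lists."""
    if url in block_list:
        # An ISP can refuse bad downloads outright or only warn, per its policy.
        return Verdict.WARN if warn_only else Verdict.BLOCK
    if url in allow_list:
        return Verdict.ALLOW
    return Verdict.UNKNOWN


BLOCK_LIST = {"http://gamedownloads.com/stor1/dragracer.exe"}
ALLOW_LIST = {"http://example.com/tools/editor.exe"}  # hypothetical entry

print(decide("http://gamedownloads.com/stor1/dragracer.exe", BLOCK_LIST, ALLOW_LIST).value)
# block
```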

Test Description

Robot Genius recently conducted a head-to-head test with one of the three leading desktop security companies, a public company with malware labs in several countries employing hundreds of engineers. Instead of automated technology, this company relied on its large teams of engineers, more than a decade of experience in desktop security, and a large accumulated database.

Both companies tested the same list of 8,000 unique URLs, each pointing to an executable program available for download. The determinations were then double-checked using scanners from Trend, Norton, and SpyBot.

Results


Robot Genius’ automated system processed the list of 8,000 unique URLs in a few hours. The crawler technology caught twice as many pieces of malware as the competitor and had a far lower false-positive rate*.

By contrast, the public company had a very high false-positive rate: about half of the programs it classified as malware were not malicious, although some of them may have been linked from sites that host actual malware.
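The "about half" figure is the share of *flagged* programs that turned out to be benign (often called the false discovery rate). This differs from the share of *all* benign programs that were wrongly flagged. A minimal sketch computes both from one vendor's flagged set and a reference set of truly malicious URLs, such as the scanner cross-check might provide. The ten toy URLs and the `rates` helper are hypothetical and are not the test data.

```python
def rates(flagged, truly_malicious, all_urls):
    """Detection counts and error rates for one vendor's verdicts."""
    true_pos = len(flagged & truly_malicious)
    false_pos = len(flagged - truly_malicious)
    benign = all_urls - truly_malicious
    return {
        "detected": true_pos,
        # Share of flagged programs that were actually benign.
        "false_discovery_rate": false_pos / len(flagged) if flagged else 0.0,
        # Share of benign programs that were wrongly flagged.
        "false_positive_rate": false_pos / len(benign) if benign else 0.0,
    }


all_urls = {f"u{i}" for i in range(10)}
truly_malicious = {"u0", "u1", "u2", "u3"}
flagged = {"u0", "u1", "u4", "u5"}  # two real hits, two benign programs

r = rates(flagged, truly_malicious, all_urls)
print(r["false_discovery_rate"])           # 0.5
print(round(r["false_positive_rate"], 2))  # 0.33
```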

Technical Information

Because the Robot Genius web crawler farm tests every Windows executable on the Web, it knows which ones are malware or malware vectors. There are about 75,000 bad executables in a universe of a few million Windows executables (5 percent).

The total universe of executables is several terabytes of data. Robot Genius’ servers download and store these executables locally.
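The paper does not say how the downloaded executables are stored. One common way to keep several terabytes of files manageable is content-addressed storage: each file is saved under its SHA-256 digest, so the same binary downloaded from many URLs is stored only once. The sketch below assumes that design; `store_executable` and the two-character directory fan-out are hypothetical choices.

```python
import hashlib
import tempfile
from pathlib import Path


def store_executable(data, root):
    """Store bytes under their SHA-256 digest; identical files are kept once."""
    digest = hashlib.sha256(data).hexdigest()
    # Fan out by the first two hex characters to keep directories small.
    path = root / digest[:2] / digest
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # The same binary fetched from two different URLs.
    first = store_executable(b"MZ example bytes", root)
    second = store_executable(b"MZ example bytes", root)
    stored_files = [p for p in root.rglob("*") if p.is_file()]
    print(first == second, len(stored_files))  # True 1
```

A separate table can then map each URL to the digest it served, so the crawler's URL data stays intact while the binaries are deduplicated.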

Each executable is installed on a test machine running proprietary security software (Spyberus). Installing an executable automatically on a PC requires some machine intelligence: the system has to know how to click through the EULA and the install options in pop-up windows. The system is nearly 100% effective in English and effective to varying degrees in other major languages. Output to clients is normalized to UTF-8 by default (note: URLs encoded in GB 18030 and Big5 are not fully supported at this time).
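The click-through step can be pictured as choosing which button to press in each installer dialog. The sketch below is a hypothetical, English-only rule list and is not how Spyberus works. It picks a button by matching its label against patterns in priority order and strips the `&` that Windows uses to mark keyboard mnemonics (as in "&Next >"). Labels in other languages would need their own rules, which is one reason effectiveness varies by language.

```python
import re
from typing import List, Optional

# Hypothetical English-only rules, in priority order.
CLICK_RULES = [
    re.compile(r"\bI (agree|accept)\b", re.IGNORECASE),  # EULA acceptance
    re.compile(r"^(next|continue)\b", re.IGNORECASE),    # advance a wizard page
    re.compile(r"^install\b", re.IGNORECASE),            # start installation
    re.compile(r"^(finish|close|done)\b", re.IGNORECASE),  # final page
]


def choose_button(labels: List[str]) -> Optional[str]:
    """Return the label of the button to click, or None if no rule matches."""
    for pattern in CLICK_RULES:
        for label in labels:
            # Drop Windows mnemonic markers before matching.
            if pattern.search(label.replace("&", "")):
                return label
    return None


print(choose_button(["< Back", "Next >", "Cancel"]))  # Next >
```

A dialog where no rule matches (`None`) is the kind of case that would need a new rule or a manual look.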

Various behaviors logged by Spyberus result in a program being flagged as malware. These include DLL injection (process hijacking), installing a keylogger or sniffer, and installing a driver. The system also evaluates the effectiveness of the uninstaller registered with the program.
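The rules in the paragraph above can be sketched as a function over an install log. The event names are hypothetical, because the paper lists the behaviors but not how Spyberus records them. Uninstaller effectiveness is approximated here as "anything still on disk after uninstall that was not there before install," which is an assumption about how it might be measured.

```python
# Hypothetical event names mapped to human-readable reasons.
FLAGGED_BEHAVIORS = {
    "dll_injection": "DLL injection (process hijacking)",
    "keylogger_installed": "installs a keylogger",
    "sniffer_installed": "installs a network sniffer",
    "driver_installed": "installs a driver",
}


def assess(events, snapshot_before, snapshot_after_uninstall):
    """Return the reasons to flag a program; an empty list means no flags."""
    # dict.fromkeys de-duplicates events while keeping their logged order.
    reasons = [FLAGGED_BEHAVIORS[e] for e in dict.fromkeys(events)
               if e in FLAGGED_BEHAVIORS]
    # Files present after uninstall that did not exist before install.
    leftovers = snapshot_after_uninstall - snapshot_before
    if leftovers:
        reasons.append(f"uninstaller left {len(leftovers)} item(s) behind")
    return reasons


before = {"C:/Windows"}
after = {"C:/Windows", "C:/Windows/System32/drivers/x.sys"}
print(assess(["file_write", "driver_installed"], before, after))
# ['installs a driver', 'uninstaller left 1 item(s) behind']
```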

The output data describing what each package does during installation is stored in the database. Because only about 5% of all executables are malware, the remaining marginal cases are few enough to be checked by hand without making the process heavily dependent on human intervention.
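A minimal sketch of such a database uses Python's built-in `sqlite3`. The table name, the three verdict values, and the behavior strings are all hypothetical. The query at the end pulls out only the marginal cases, so human review is limited to that small queue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE install_result (
    url       TEXT PRIMARY KEY,
    verdict   TEXT NOT NULL CHECK (verdict IN ('bad', 'good', 'marginal')),
    behaviors TEXT NOT NULL  -- comma-separated behavior names
);
""")

rows = [
    ("http://example.com/a.exe", "bad", "driver_installed"),
    ("http://example.com/b.exe", "good", ""),
    ("http://example.com/c.exe", "marginal", "registry_run_key"),
]
conn.executemany("INSERT INTO install_result VALUES (?, ?, ?)", rows)

# Only the marginal cases are queued for a human reviewer.
review_queue = [url for (url,) in conn.execute(
    "SELECT url FROM install_result WHERE verdict = 'marginal' ORDER BY url")]
print(review_queue)  # ['http://example.com/c.exe']
```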

Link analysis identifies all web sites (domains) that link to dangerous executables, adding a layer of metadata to each site. This metadata uses a per-domain-name approach to classify sites as bad, good, or otherwise.
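Per-domain link analysis can be sketched as grouping the crawled links by the domain of the linking page. This is a hypothetical rule set, not Robot Genius' method. A domain is labeled bad if it links to at least `bad_threshold` known-bad executables, good if every executable it links to is on the allow list, and unknown otherwise. The threshold, labels, and example domains are assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def classify_domains(links, bad_exes, good_exes, bad_threshold=1):
    """links: (page_url, executable_url) pairs found by the crawler."""
    linked = defaultdict(set)
    for page_url, exe_url in links:
        domain = (urlsplit(page_url).hostname or "").lower()
        linked[domain].add(exe_url)
    labels = {}
    for domain, exes in linked.items():
        if len(exes & bad_exes) >= bad_threshold:
            labels[domain] = "bad"
        elif exes <= good_exes:  # every linked executable is known-safe
            labels[domain] = "good"
        else:
            labels[domain] = "unknown"
    return labels


links = [
    ("http://warez.example/index.html", "http://cdn.example/dragracer.exe"),
    ("http://games.example/list.html", "http://cdn.example/chess.exe"),
    ("http://mixed.example/", "http://cdn.example/new.exe"),
]
bad = {"http://cdn.example/dragracer.exe"}
good = {"http://cdn.example/chess.exe"}
print(classify_domains(links, bad, good))
# {'warez.example': 'bad', 'games.example': 'good', 'mixed.example': 'unknown'}
```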

Clients of RGcrawler data can access the list through a secure, caching-enabled SDK implemented in C++. The data stream ranges from lightweight results, such as bare URLs or domain names, to fine-grained details, such as the specific reasons a program was declared bad. Users can retrieve common data points, such as a program’s name, URL, or domain, without the overhead of retrieving the full record. Thanks to freely available standards for manipulating XML, such as XSL (the eXtensible Stylesheet Language), users can convert this data into plain text, other existing formats, and even other forms of XML.
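The contrast between the lightweight and detailed views can be illustrated with Python's standard `xml.etree.ElementTree`. The XML layout below is hypothetical, because the paper does not publish the RGcrawler schema. This sketch does not reproduce the C++ SDK or its caching, and it still parses the whole document. It only shows the two levels of detail and a conversion to plain text, the kind of job an XSL stylesheet could also do. XSLT itself is not in Python's standard library; a third-party library such as lxml provides it.

```python
import xml.etree.ElementTree as ET

# Hypothetical record format, for illustration only.
FEED = """<entries>
  <entry verdict="bad">
    <url>http://gamedownloads.com/stor1/dragracer.exe</url>
    <domain>gamedownloads.com</domain>
    <name>dragracer.exe</name>
    <reasons><reason>installs spyware in the background</reason></reasons>
  </entry>
</entries>"""

root = ET.fromstring(FEED)


def lightweight(root):
    """URL and domain only: the common data points most clients need."""
    return [(e.findtext("url"), e.findtext("domain")) for e in root.iter("entry")]


def detailed(root):
    """Full view, including the reasons a program was declared bad."""
    return [
        {"url": e.findtext("url"), "verdict": e.get("verdict"),
         "reasons": [r.text for r in e.iter("reason")]}
        for e in root.iter("entry")
    ]


# Convert to plain text, one blocked URL per line.
as_text = "\n".join(url for url, _ in lightweight(root))
print(as_text)  # http://gamedownloads.com/stor1/dragracer.exe
```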

Conclusions

RGcrawler proved more effective at detecting malware on the web than the manual method used by the large security company. Robot Genius’ unique behavior-based anti-malware technology was better at separating actual malware executables from non-threatening programs on web sites that link to malware. RGcrawler data also identified the exact location of all malware found by reporting the full URL path to each malicious executable. In this comparison, RGcrawler data produced far fewer false positives than the competitor’s solution.

* The rate was essentially zero. The two false positives indicated in the graph below were due to a bug that has since been fixed.