Thursday, April 22, 2010

McAfee's DAT Debacle: When Your Security Software Causes Harm

On Wednesday, April 21st, McAfee released DAT file 5958 (its virus definitions file) with a faulty detection for the w32/wecorl.a virus. The detection wrongly flagged svchost.exe on Windows XP machines as an infected process, and VirusScan took its normal actions, which in most cases meant quarantine or deletion. Per McAfee's description of the problem, this was a heuristics issue that only appeared once the DAT was deployed, which should make you wonder how their testing is done.

At 12:05 PM EST, McAfee sent out an urgent email stating:

"McAfee is aware of a w32/wecorl.a false positive with the 5958 DAT file April 21 at 2:00pm (GMT +1). McAfee advises NOT to download this DAT. Please disable pull tasks and update tasks.

Information updates will be sent every 90 minutes to keep you advised."

Administrators who quickly pulled the DAT using McAfee's ePolicy Orchestrator (ePO) management tool were somewhat protected, as only systems that had checked in since the DAT's release were affected. Users who had not updated while the bad DAT was available were similarly safe. In many organizations that use ePO, pulling the DAT was reasonably easy. For organizations whose systems update directly, it would not have been as simple, and in either case, many systems did pick up the bad DAT while it was available.

For enterprise users, this meant that any machine that received the DAT before McAfee's 12:05 PM EST email was likely taken offline. If the system rebooted, it would typically no longer have a working network connection, and thus could not be repaired remotely. Symptoms included blue screens and DCOM errors, as well as shutdown messages.

Less than 90 minutes later, as promised, McAfee sent out a second email with more detail, identifying Windows XP Service Pack 3 systems as the ones affected and noting that the DAT had been pulled. It also provided links to McAfee's knowledge base, which, unfortunately, was already starting to perform poorly under the load.

An extra.dat (McAfee's name for off-cycle, non-mainstream update files) was made available shortly afterward, and a general email via McAfee's Support Notification Service (SNS) went out just before 4 PM EST. That email pointed to an issues page for the DAT with further detail.

At 8:40 PM EST, McAfee published both a recommended and an alternate remediation procedure for the problems caused by the DAT. Both procedures were posted on the issues page, making it a convenient central reference.

In total, less than 9 hours elapsed between McAfee's first warning email and the published remediation procedures. McAfee is unlikely to release a count of affected machines worldwide, but my own experience suggests the number is sizable. For most organizations, today has been a day of remediation, often by hand, because machines that rebooted could not be reached remotely with their networking broken.

One of the biggest lessons learned is that McAfee's relatively new SNS is a must-subscribe for McAfee users and admins. Another is that ePO can greatly improve your chances of getting ahead of a widespread update problem, given sufficient notice.

For McAfee, the lessons will not be easily forgotten; better testing, among other practices, is likely to be on their list. It is equally clear that McAfee has learned something since the infamous DAT that flagged Microsoft Office files as infected: their 90-minute notifications and quick response show that.

Will this change how your organization views antivirus updates? Is immediate deployment worth the danger? I'm sure I will be having serious discussions with local PC support staff about the relative risks of a 24-hour delay versus the possibility of a bad DAT, and I'm not sure I have as concrete an answer as I would have had a week ago, particularly given the growing percentage of malware that evades mainstream AV.

1 comment:

Alastair Revell said...

I think it will take a lot of time for McAfee to recover from the damage to their reputation.

You might be interested in my commentary on the McAfee issues, as well as the multiple posts by colleagues on The Consultancy Blog as events unfolded.

Alastair Revell
Managing Consultant
Revell Research Systems