Math on Malware 

The behavior of networks has been studied for a long time, but this knowledge is now more relevant than ever. In a 1998 research paper on computer viruses,1 Steve White concluded that, in the 10 years prior to the paper’s publication, antivirus technology had been successful against known viruses, but some significant problems remained for further investigation. One of these problems was that the then-current model of how computer viruses spread did not seem to match what was observed in practice.

As this article will show, the malware2 problem is already serious, and the situation is likely to deteriorate further. The purpose of this article is to use insights from network theory to discuss how the malware problem can be reduced. With a simple network model, the impact of the following commonly used security measures on the spread of malware can be evaluated mathematically:

  • Antivirus (AV) software
  • Incident and change management procedures
  • Security knowledge and awareness
  • Conditions for working from home
  • A periodic reset of software
  • Implementation of different software compartments

Types of Networks

A network is a set of nodes that may be interconnected. Such networks are sometimes also called “graphs.” Research may focus on the properties of individual nodes, but this knowledge loses its value when the network contains many nodes, as the Internet does. The examination of a large network mainly yields statistical properties that help to understand and predict its behavior.

There are different types of networks. Insights into the behavior of computer networks (e.g., links on web pages), biological networks (e.g., predator-prey relationships) and social networks (e.g., calling patterns) may also apply to technological networks such as the Internet, with its billions of nodes (i.e., servers, clients, routers). The Internet is designed to be robust against random node failures. However, it is vulnerable if nodes are attacked in descending order of their number of links to other nodes.

E-mail, peer-to-peer (P2P) computing and web browsing form a social network. The size of a social network is difficult to estimate, but the concept of “six degrees of separation”3 (everyone is linked to everyone else within a small number of steps), also known as the “small-world effect,” indicates that it is highly interconnected.

Malware can spread via both the Internet and social networks. An Internet worm can infect an online server or workstation without any user interaction, whereas a user can unintentionally infect a computer by downloading and opening an infected file.

Figure 1: Different Process Models

Using the results of previous research,4, 5 three simple network models are compared. These models are not entirely realistic because they all assume that an infection is “evenly divided” over the network, when, in fact, the topology of the network determines which nodes can transmit malware to other nodes (the upcoming section Injection of Malware provides more details on this). The three network models are:

  1. Percolation theory: This model is mainly used for capacity calculation and designates nodes and links as either “free” (failed) or “busy” (operational). This model is not appropriate for malware because a computer can be infected by multiple exploits simultaneously and yet remain operational.
  2. The Susceptible, Infected, Recovered (SIR) model: The simplest model for the spread of a disease is based on three states (susceptible, infected and recovered) that a single node passes through in sequence. This model can describe the spread of a zero-day computer virus if the exploited vulnerability is patched after the infection and the virus is removed. Despite patching, however, software always retains vulnerabilities. Because a computer can be infected more than once by the same malware, or simultaneously by different malware, the SIR model is less suitable for describing the spread of malware.
  3. The Susceptible, Infected, Susceptible (SIS) model: This model has two states, susceptible and infected (see figure 1). Not all diseases confer immunity on survivors, so a node can be reinfected after it has recovered. This applies, for example, to tuberculosis and to malware that exploits unpatched vulnerabilities. Therefore, the SIS model was chosen to investigate the spread of malware in this article.

Description of the SIS Model

The SIS model divides the population into two parts: infected (i) and the rest (s), which are susceptible to this infection.

The SIS model indicates that the infection initially grows slowly because few infected computers are available to transmit it. In the final phase, growth slows again as the infection approaches its maximum, because an infected node becomes less and less likely to contact an uninfected node. Infection growth is therefore proportional to the product of the infected and susceptible fractions (i · s).

The number of infected computers is reduced by the detection and removal of malware. This decrease is proportional to the number of infected computers (i). The following formulas describe the SIS model.

Formula 1:  ∂i / ∂t = βis - γi, i + s = 1

The expression (∂i / ∂t) represents the increase of the infection (∂i) in time interval (∂t). The contamination factor (β) is the probability per contact that an infected node can infect a susceptible node, and it also reflects the effectiveness of the deployed preventive security measures, e.g., automated patching and secure software configuration.

The recovery rate of an infected node (γ), i.e., the probability per unit of time that its malware is detected and removed, also determines the average infection duration (D = 1 / γ) and indicates the effectiveness of the detective and corrective measures.

The solution of formula 1 is the logistic function6 or S-curve (see figure 3).
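
The S-curve can be checked numerically. The following Python sketch integrates formula 1 with simple forward-Euler steps and compares the result with the closed-form logistic solution i(t) = K / (1 + (K / i0 - 1) e^(-(β - γ)t)), where K = 1 - γ / β. The step size, the starting fraction i0 and the function names are illustrative assumptions; β and γ are the corporate values from endnote 16, and the time unit is assumed to be one month, consistent with the definition of μ later in the article.

```python
import math

def sis_euler(beta, gamma, i0, t_end, dt=0.01):
    """Integrate formula 1, di/dt = beta*i*s - gamma*i with s = 1 - i,
    using forward-Euler steps; returns the infected fraction at t_end."""
    i = i0
    for _ in range(int(round(t_end / dt))):
        s = 1.0 - i
        i += dt * (beta * i * s - gamma * i)
    return i

def sis_logistic(beta, gamma, i0, t):
    """Closed-form logistic solution of formula 1 (valid for beta != gamma)."""
    k = 1.0 - gamma / beta      # steady-state infected fraction
    r = beta - gamma            # initial growth rate
    return k / (1.0 + (k / i0 - 1.0) * math.exp(-r * t))

# Corporate-scenario parameters (endnote 16), time unit assumed to be one month.
beta, gamma, i0 = 0.52, 0.435, 0.001
for t in (0, 24, 48, 96, 192):
    print(t, round(sis_euler(beta, gamma, i0, t), 4), round(sis_logistic(beta, gamma, i0, t), 4))
```

Both columns trace the same S-curve: slow start, steep middle phase and a plateau at K.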

An important indicator is the basic reproduction number, R0 (= β / γ), the expected number of new infections caused by a single infection. Substituting R0 into formula 1 and setting (∂i / ∂t) to zero for the final phase yields the maximum fraction of infected computers (imax):

Formula 2:  $i_{max} = 1 - 1 / R_0 = 1 - \gamma / \beta$

Research on the SIS model shows that some risk of infection always remains, regardless of the value of (β). In the steady state, the force of the infection (F = β · imax) is at its maximum and equal to (β - γ). When the product (R0 · s) is smaller than one, the infection dies out; when it is greater than one, the infection in the population grows.
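
The steady-state quantities and the threshold condition follow directly from β and γ. A minimal Python sketch, using the corporate values from endnote 16 for illustration (the function names are hypothetical): at equilibrium s = 1 / R0, so the product R0 · s settles at exactly one, and the force of the infection equals β - γ.

```python
def steady_state(beta, gamma):
    """Return R0, i_max (formula 2) and the force of infection F = beta * i_max."""
    r0 = beta / gamma
    i_max = max(0.0, 1.0 - 1.0 / r0)   # no endemic level when R0 <= 1
    return r0, i_max, beta * i_max

def grows(beta, gamma, s):
    """Threshold test from the text: the infection grows when R0 * s > 1."""
    return (beta / gamma) * s > 1.0

r0, i_max, force = steady_state(0.52, 0.435)
print(f"R0 = {r0:.3f}, i_max = {i_max:.3f}, F = {force:.3f}")
# R0 = 1.195, i_max = 0.163, F = 0.085
```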

The Malware Problem

The battle between cybercriminals and security vendors is at full throttle. Recent studies show that, even with up-to-date malware signatures, the detection rate of AV software for new malware has dropped over time to approximately 40 percent.7, 8 Due to the backlog of signature updates and targeted attacks, AV software generally detects only malware that is older than four weeks. Because all AV products lag about equally far behind new malware, deploying multiple virus scanners simultaneously improves malware detection only marginally, and more virus scanners also produce more potential false positives.

Although modern AV software can sometimes detect malware for which no signature exists yet, the added value of these heuristic techniques is limited: AV software cannot afford many false positives, because the average user soon begins to ignore the warnings. Moreover, both closed-source software and malware are often wrapped in encrypted zip files, which makes malware detection much more difficult. As early as 2006, AV software vendor Kaspersky reported, “We’re losing this game. There are just too many criminals active on the Internet underground, in China, Latin America, right here in Russia. We have to work all day and all night just to keep up.”9

The rapid production and deployment of patches is an absolute necessity, but the need for patches also shows that software development is not yet mature. Software quality can be expressed as the number of errors per 10,000 lines of code. As computing capacity increases, more complex applications with tens of millions of lines of code are developed and used. At the same time, products must go to market faster, leaving less time for testing. Even after many patches, enough vulnerabilities remain in software for malware to exploit.

Sometimes software companies have such a backlog in developing patches that so-called zero-day exploits can circulate for months before the vulnerability is patched.10 To make things worse, there are indications that cybercriminals can reverse engineer patches into malware. For instance, given a patch that repairs a buffer overflow, it takes about 30 seconds to generate a malicious input file that triggers the buffer overflow on unpatched computers.11 This puts slow-patching organizations even more at risk. Worse still, some organizations delay the deployment of patches so long that their computers can be infected by malware exploiting vulnerabilities for which patches were issued long ago. Good change management procedures therefore have a positive effect on security.

The malware problem continues to grow rapidly. For instance, Symantec created 2,895,802 new malicious code signatures in 2009. This represents 51 percent of all malicious code signatures ever created by Symantec.12 The number of new exploits can be that large because “one-click” virus kits are readily available on the Internet for little or no money and because the same malware can be encrypted with unique keys. Because so much malware is in circulation, a computer may already be infected by various exploits before the infection is noticed. If the disinfection does not remove all malware, the value of (γ) is reduced. Incident management procedures should, therefore, rely on a proven incident response plan. This improves the effectiveness of a disinfection because there is no need to improvise under stressful conditions.13 Optimal incident and change management procedures are, therefore, reflected in fewer malware incidents with less impact.

Cybercriminals earn more money when (β) is high and (γ) is low (see figure 2), because more computers are then infected (imax) and the infections last longer. Yet it can also be advantageous for cybercriminals to let infections grow slowly and unnoticed, because fast-growing malware infections appear on the radar of AV software vendors. By varying the contamination rate (not every contact between malware and a susceptible computer leads to an infection), three scenarios have been defined; they are discussed in the following section (see figure 3).

Figure 3

Different Scenarios

The “corporate” scenario is based on available statistics on malware infections in organizations. The parameters of the SIS model (β, γ) were calculated from the measured effectiveness of AV software against new malware and from two consecutive annual surveys on cybercrime.14, 15, 16 Because the scope of this research is limited to organizations, this scenario is not representative of the whole population.

The “practice” scenario reflects the common practice to infect as many computers as possible in a short time.17 For this scenario, few reliable statistics are available. Large-scale infections do appear on the radar of AV software vendors, but that does not mean that the malware can be rapidly eliminated. The experience with the Conficker worm has made that clear.18

In the unlikely “cyberwarfare” scenario, the chosen (fictional) parameters are very low, so it takes a long time to infect many computers.19 This scenario can become reality only if the exploited vulnerabilities can be abused over a long period, which requires either knowledge of closed source code or logic bombs planted in computer systems. To avoid detection, the malware must not spread widely, so that the abused vulnerabilities are not discovered by system users, other cybercriminals, AV software vendors or software manufacturers.
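
The three scenarios can be compared with the logistic solution of formula 1. This Python sketch takes (β, γ) from endnotes 16, 17 and 19, assumes a time unit of one month and a hypothetical seed of 0.1 percent infected computers, and computes how long each infection needs to reach half of its steady-state level. The function name time_to_half is illustrative.

```python
import math

# (beta, gamma) per month from endnotes 16, 17 and 19.
SCENARIOS = {
    "corporate": (0.52, 0.435),
    "practice": (0.9, 0.435),
    "cyberwarfare": (0.1, 0.04),
}

def time_to_half(beta, gamma, i0=0.001):
    """Months until the logistic solution of formula 1 reaches half of i_max.

    Solves K / (1 + (K/i0 - 1) e^(-r t)) = K/2 for t, with K = 1 - gamma/beta
    and r = beta - gamma; requires i0 < K."""
    k = 1.0 - gamma / beta
    return math.log(k / i0 - 1.0) / (beta - gamma)

for name, (beta, gamma) in SCENARIOS.items():
    print(f"{name:12s} i_max = {1 - gamma / beta:.2f}  "
          f"half-way after {time_to_half(beta, gamma):.0f} months")
```

Under these assumptions the practice scenario reaches half of its plateau in roughly a year, the corporate scenario in about five years and the cyberwarfare scenario in close to nine, even though the cyberwarfare plateau (0.60) is the highest of the three.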

Periodic Reset of All Software

Network theory predicts that, when the nodes with the most links are disabled, the function of the network deteriorates rapidly. Thus, the proliferation of spam and malware is best reduced by tackling its sources. However, disabling these sources is difficult because cybercriminals often place malware servers outside their home country, which impedes access, and routinely use rotating web servers to control their botnets.

Although malware sources are difficult to control, it remains possible to periodically reinstall clean software on computers, which replaces infected computers with uninfected ones. The security improvement of replacing all the software can be determined by adjusting the SIS model. Let (μ) be the average fraction of the population on which clean software is installed per month. Each month, reinstallation takes a fraction (μi + μs) of the computers out of circulation and returns them as susceptible, so the net outflow from the infected group is (μi). Adding this outflow to formula 1 gives:

Formula 3:  ∂i / ∂t = βis - i(γ + μ)

Although this measure reduces the factor (R0) to β / (γ + μ), the security improvement is small if (μ) is much smaller than (γ), as in the replacement of PC hardware, usually every four years. In the “corporate” scenario mentioned previously, the steady-state level of malware infections drops almost to zero if clean software packages are installed once a year.20 Resetting all computer software is a measure to include in the incident response plan. However, such labor-intensive operations are efficient only when automated.
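
The effect of (μ) can be evaluated directly from the steady state of formula 3, i_max = 1 - (γ + μ) / β. This Python sketch uses the corporate values from endnote 16 and a monthly time unit; the reset intervals (four-yearly hardware replacement, yearly reinstallation) are the two cases named in the text, and the function name is illustrative.

```python
def steady_state_with_reset(beta, gamma, mu):
    """Steady state of formula 3: di/dt = beta*i*s - i*(gamma + mu), s = 1 - i.

    Periodic reinstallation acts like extra recovery, so R0 becomes beta / (gamma + mu)."""
    return max(0.0, 1.0 - (gamma + mu) / beta)

beta, gamma = 0.52, 0.435             # corporate scenario, per month (endnote 16)
for label, mu in [("no reset", 0.0),
                  ("new PC every 4 years", 1 / 48),
                  ("clean reinstall every year", 1 / 12)]:
    print(f"{label:28s} i_max = {steady_state_with_reset(beta, gamma, mu):.4f}")
```

A four-yearly reset lowers the steady state only from about 0.16 to about 0.12, whereas a yearly reset brings it down to about 0.003, in line with endnote 20.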

Knowledge and Security at Home

Public trust (e.g., in the protection of privacy) and operational IT systems are vital for productivity. Less impact from malware means less economic damage and more profit. Employees cause many incidents. Employees’ personal computers at home are often linked directly to business computers via e-mail and Universal Serial Bus (USB) drives. For example, if employees edit business documents on infected personal computers (PCs) at home, the information being edited could be disclosed.

The population of computer users can be divided into two parts: one with sufficient security knowledge and the other with little security knowledge. Because the SIS model becomes complex for heterogeneous populations, the quantitative analysis here is incomplete.21 Generally, the group of inexperienced users is larger than the group of security experts, and some business computers (e.g., those of small and medium-sized enterprises [SMEs]) are insufficiently protected.22

On average, computer users know little about security. For security experts, the risk of infection by malware (β) is lower and the probability of a successful disinfection (γ) is higher than for users with little security knowledge, because experts generally work more safely and have better technical security. However, when inexperienced users suffer more frequent and longer-lasting malware infections, this also affects the computers of security experts and of enterprises using the same software, because malware can be exchanged between users.
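
Because the full heterogeneous analysis is beyond this article, the following Python sketch is only a qualitative illustration, not the analysis of endnote 21. It assumes a two-group SIS model with homogeneous mixing, in which both groups are exposed to the same population-wide infected fraction; the group shares and all (β, γ) values are hypothetical. Improving only the inexperienced group (lower β, higher γ) also lowers the steady-state infection level of the experts.

```python
def two_group_sis(params, weights, i0=0.01, dt=0.1, t_end=500.0):
    """Forward-Euler integration of a two-group SIS model with homogeneous mixing.

    params:  {group: (beta, gamma)}; weights: {group: share of the population}.
    Every group is exposed to the population-wide infected fraction
    lam = sum(w * i), so infections in one group raise the risk for the other.
    Returns the infected fraction within each group at t_end."""
    i = {g: i0 for g in params}
    for _ in range(int(round(t_end / dt))):
        lam = sum(weights[g] * i[g] for g in params)
        # The comprehension reads the previous state before i is rebound.
        i = {g: i[g] + dt * (params[g][0] * (1 - i[g]) * lam - params[g][1] * i[g])
             for g in params}
    return i

weights = {"experts": 0.2, "novices": 0.8}                  # hypothetical shares
before = two_group_sis({"experts": (0.3, 0.6), "novices": (0.6, 0.3)}, weights)
after = two_group_sis({"experts": (0.3, 0.6), "novices": (0.5, 0.35)}, weights)
print(f"experts infected: {before['experts']:.3f} -> {after['experts']:.3f}")
```

With these made-up numbers the experts’ own parameters never change, yet their infected fraction roughly halves once the novices become more careful.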

When the security knowledge and awareness of inexperienced users improve, the impact of malware decreases significantly for the entire population, especially when combined with the reinstallation of clean software described previously. This does not mean that everyone has to become a security expert. With a periodic security lecture for personnel that explains what should and should not be done, including how to secure home PCs, employees quickly learn to use the Internet more safely. For example, an important rule of thumb is not to start using new software immediately. If, four weeks after the download, the updated AV software still does not find malware in the (quarantined) downloads, it is far more likely that the downloads are actually free from malware. Additionally, some enterprises impose rules for working at home and provide employees with business software and security software free of charge. Enterprises that select freeware or open-source software as standard products avoid the extra license costs of private use.
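
The four-week quarantine rule of thumb can be expressed as a simple policy check. In this hypothetical Python sketch, the function may_release and its parameters are illustrative assumptions; the date of the last clean scan stands in for the age of the signatures used, which real AV products report in their own way.

```python
from datetime import date, timedelta

QUARANTINE = timedelta(weeks=4)   # rule of thumb from the text

def may_release(downloaded_on: date, last_clean_scan_on: date, today: date) -> bool:
    """Hypothetical policy check for a quarantined download.

    Release only if the file has waited at least four weeks AND the most recent
    clean scan took place at least four weeks after the download (a proxy for
    signatures that are recent enough to cover the malware of that period)."""
    waited = today - downloaded_on >= QUARANTINE
    rescanned_late = last_clean_scan_on - downloaded_on >= QUARANTINE
    return waited and rescanned_late

print(may_release(date(2011, 1, 3), date(2011, 2, 1), date(2011, 2, 1)))   # True
print(may_release(date(2011, 1, 3), date(2011, 1, 4), date(2011, 2, 1)))   # False
```

The second call fails because the only clean scan happened the day after the download, before the signatures could have caught up with new malware.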

Experts can assess the effectiveness and efficiency of the implemented security measures. If security is properly designed and implemented, inexperienced users cannot easily infect their PCs. If employees know why their access rights are limited and why business software is white-listed, and if the lessons learned from incidents are widely communicated, support is created and security awareness improves. Even so, the malware risk remains greatest for security experts and enterprises that use market-leading software.

Software Compartments

All software contains vulnerabilities, and computers that use the same software share the same vulnerabilities. For malware, all computers using the same software form a separate population. For example, Windows PCs form a software compartment separate from the Mac OS and Linux compartments. While software compartments may be linked by common code for hardware drivers and network functions, in practice Windows malware is unlikely to infect a Mac. This is because the compartments share little source code, and the code they do share has often been rigorously reviewed to eliminate vulnerabilities.

The larger the population, the more attractive it is for cybercriminals to develop exploits that misuse the vulnerabilities in that population. To maximize their profits, cybercriminals target their malware at the (generally used) software with the largest market share.23 Although it is possible to write malware for a Mac or a Linux PC, Windows malware is much more profitable at the same cost because of Windows’ higher market share.

Monopolies have generally proven to be vulnerable, and software monopolies are no exception: computers running the monopoly software have the greatest probability of being infected by malware. The economic appeal of malware can therefore be reduced by creating more software diversity. To enable this, enterprises must abandon the idea that the exchange of information depends on everyone using the same software. Instead, enterprises must dare to rely on data standards to break vendor lock-in. The use of open standards also ensures that data in electronic archives can still be processed in the future.

The SIS model can predict the effect on the spread of malware if the software population is made more diverse. Suppose that (q) is the fraction of the population that becomes immune to the malware targeting the market-leading software by migrating to alternative software. As long as (q) is too small to break the software monopoly, the vast majority of malware continues to target the market-leading software used by the rest of the population (1 - q). Replacing (i + s = 1) with (i + s + q = 1) in formula 1 reduces the infection rate:

Formula 4:  ∂i / ∂t = βis - γi = βi(1 - i - q) - γi

With more software diversity, malware focused on market-leading software spreads more slowly because there are fewer fertile contacts, as expressed by the term (βiq). This gives the software industry more time to respond to new malware. The steady-state number of infections from formula 2 is also reduced:
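
The slower spread can be made concrete by integrating formula 4. This Python sketch uses the practice parameters from endnote 17, a monthly time unit, a hypothetical seed of 0.1 percent infected computers and a hypothetical threshold of 10 percent; it reports how long the infection needs to reach that threshold with and without a 30 percent share of alternative software. The function name is illustrative.

```python
def months_to_reach(beta, gamma, q, target, i0=0.001, dt=0.01, t_max=1000.0):
    """Forward-Euler integration of formula 4, di/dt = beta*i*(1 - i - q) - gamma*i.

    Returns the first time at which the infected fraction reaches `target`,
    or None if it never does within t_max."""
    i, t = i0, 0.0
    while t < t_max:
        if i >= target:
            return t
        i += dt * (beta * i * (1.0 - i - q) - gamma * i)
        t += dt
    return None

beta, gamma = 0.9, 0.435                  # practice scenario (endnote 17)
for q in (0.0, 0.3):
    print(q, months_to_reach(beta, gamma, q, target=0.1))
```

Under these assumptions the 10 percent mark is reached after roughly 10 months without diversity and after roughly 27 months with q = 0.3; with q = 0.5 it is never reached, because the steady state itself drops below 10 percent.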

Formula 5:  $i_{max} = 1 - q - 1 / R_0 = 1 - q - \gamma / \beta$

Therefore, the number of infections in the steady state is reduced by (q), the same amount as the reduction in susceptible computers. If (q) is greater than or equal to the original (imax) of a malware variant, this malware dies out. Even if (q) is smaller than (imax), the new value of (imax) is lower than a merely proportional decline would give, i.e., lower than (1 - q) times the original (imax). The effect of software compartments is shown graphically in figure 4.
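
Both claims can be checked numerically from the steady state of formula 4, i_max = 1 - q - γ / β. This Python sketch uses the practice parameters from endnote 17; the values of q are arbitrary examples and the function name is illustrative.

```python
def i_max_diverse(beta, gamma, q):
    """Steady state of formula 4: i_max = 1 - q - gamma / beta (floored at 0)."""
    return max(0.0, 1.0 - q - gamma / beta)

beta, gamma = 0.9, 0.435                   # practice scenario (endnote 17)
base = i_max_diverse(beta, gamma, 0.0)     # single compartment: about 0.517
for q in (0.1, 0.3, 0.5, 0.6):
    reduced = i_max_diverse(beta, gamma, q)
    proportional = (1.0 - q) * base        # what a merely proportional decline would give
    print(f"q = {q}: i_max = {reduced:.3f} (proportional decline: {proportional:.3f})")
```

Because the original i_max is below one, subtracting q always removes more than the proportional share q · i_max; once q exceeds about 0.517 the steady state drops to zero.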

Figure 4

Therefore, the force of the infection (Fmax = β · imax = β - γ - βq) decreases by (βq) for the whole population. With only one software compartment, Fmax = (β - γ); increasing software diversity therefore reduces the force of the infection even within the susceptible (1 - q) subpopulation:

Formula 6:  $F_{max} / (1 - q) = \beta - \gamma / (1 - q) < \beta - \gamma$

Suppose that the population is divided into two software compartments, in which software A has an 80 percent market share and software B has 20 percent. For malware targeting compartment B, (q = 0.8). In other words, an infection that focuses on compartment B cannot easily spread and will most likely die out quickly. Computers using software B are then infected mainly by injection of malware and hardly by mutual contacts. Within this model, this result contradicts the often-heard statement that switching away from market-leading software does not increase security. Only in the unlikely event that the new software reaches the market share of the old market leader will the malware situation be more or less the same. The more software diversity a population has, the more the propagation of malware is disrupted.
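
The 80/20 example can be worked out with formula 5. In this Python sketch the practice parameters (endnote 17) are an assumption chosen for illustration; for malware aimed at one compartment, q is the market share of the other compartment.

```python
def i_max_diverse(beta, gamma, q):
    """Formula 5: i_max = 1 - q - gamma / beta, floored at 0 (infection dies out)."""
    return max(0.0, 1.0 - q - gamma / beta)

beta, gamma = 0.9, 0.435          # practice scenario (endnote 17)
share = {"A": 0.8, "B": 0.2}     # market shares from the example
results = {}
for target, own_share in share.items():
    q = 1.0 - own_share           # computers immune to malware aimed at this compartment
    results[target] = i_max_diverse(beta, gamma, q)
    print(target, round(results[target], 3))
# A 0.317
# B 0.0
```

Malware aimed at A still settles at roughly 32 percent of all computers, while malware aimed at B has no endemic level and spreads only through continued injection.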

This calculation also shows the effect of more software standardization. Replacing the term (-q) with (+q) in formulas 5 and 6 shows that (imax) and the force of the infection (F) will increase. In the real world, this means that populations that continue to use market-leading software will suffer more damage from cybercrime than populations that switch to other standard products. However, the adoption of open standards is a necessary precondition for eliminating vendor lock-in.

Injection of Malware

The SIS model assumes that infections are spread evenly across the network, but this is the case only once the malware has been spreading for some time. The model therefore cannot describe the injection of new malware into the population, for which the topology of the network is essential. This includes the distribution of worms and of malware from web servers (drive-by exploits). The more links a malware source has, the greater the probability that the infection can spread. Suppose that (k) nodes have a direct relationship with a malware source. If (n) is the total number of susceptible computers, the probability (p) that the malware sources can transfer a new exploit (j) is:

Formula 7:  $p_j = \beta_j \sum_m k_m / n$

Here (k_m) is the number of connections of malware source node (m), and (β_j) is the contamination factor of exploit (j). The index (m) indicates that cybercriminals can use multiple sources simultaneously to spread exploit (j).

The probability that a malware source can infect a susceptible node also equals the expected proportion of the population that the source can infect directly. The proportion of the population that is not infected is the product (designated by ∏ over the c exploits in formula 8) of the probabilities that each individual exploit fails to infect a susceptible computer.

Suppose that (c) new exploits are released every month. The proportion of the population that is directly infected by one or more of these new exploits is (again taking the complementary probability):

Formula 8:  $\partial i_c / \partial t = 1 - \prod_{j=1}^{c} \left(1 - \beta_j \sum_m k_m / n\right)$

Formula 8 can be simplified to (βcv) under the following assumptions:

  • (β) is roughly the same for all exploits (the practice scenario is chosen).
  • The spreading factor (v = ∑ km / n, summed over all malware sources m) is the same for every exploit, and v << 1; strictly, the linear form also requires βcv << 1.
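
How good the simplification is can be checked by comparing formula 8 (with equal β and v for every exploit) against the linear form βcv. In this Python sketch β is the practice value, c is roughly Symantec’s monthly signature count for 2009, and the values of v are hypothetical; math.log1p and math.expm1 keep the exact expression precise when βv is tiny.

```python
import math

def injected_fraction_exact(beta, c, v):
    """Formula 8 with equal beta and v for all c exploits: 1 - (1 - beta*v)**c."""
    return -math.expm1(c * math.log1p(-beta * v))

def injected_fraction_linear(beta, c, v):
    """The simplification beta*c*v used in the text."""
    return beta * c * v

beta, c = 0.9, 240_000                   # practice beta; approximate monthly count
for v in (1e-9, 1e-8, 1e-6, 1e-5):       # hypothetical spreading factors
    print(v, injected_fraction_exact(beta, c, v), injected_fraction_linear(beta, c, v))
```

For small βcv the two agree closely; at v = 1e-5 the linear form exceeds one, which is impossible for a proportion, while the exact form saturates below one.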

To infect as many computers as possible in a short time, cybercriminals link their malware to popular (hacked) web sites, which have a high value of ∑ km. Although the approximately 240,000 new malicious code signatures per month that Symantec created in 2009 (a proxy for c) are few compared to the billions of Internet nodes (n), this strategy is the best way to infect many computers quickly.

It is also possible to spread malware in a two-stage process, using infected computers in a botnet to send spam messages containing malware. With 100 billion malware messages per day,24 if only one out of every 100,000 recipients of a spam message looks at the “offer,”25 about 1 million computers can be infected per day, and the volume of spam continues to rise.

The effect of diversification is that only the (1 - q) part of the population using market-leading software is susceptible to an injected exploit. Combining this with formula 4, the overall growth of the infections due to the spread of existing infections and new malware becomes:

Formula 9:  ∂i / ∂t = βi(1 - i - q) - γi + βcv(1 - q)

Thus, software diversification makes sense for both the initial infection and the spread of existing malware, because (q) appears in both the injection term and the spreading term of formula 9.
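
Setting formula 9 to zero gives a quadratic in i whose positive root is the steady state. This Python sketch compares that closed form with a forward-Euler simulation from an initially clean population; β and γ are the practice values (endnote 17), the injection rate cv = 0.002 per month is a hypothetical value, and the function names are illustrative.

```python
import math

def steady_state_formula9(beta, gamma, q, cv):
    """Positive root of beta*i*(1 - i - q) - gamma*i + beta*cv*(1 - q) = 0."""
    a = beta * (1.0 - q) - gamma
    b = beta * cv * (1.0 - q)
    return (a + math.sqrt(a * a + 4.0 * beta * b)) / (2.0 * beta)

def simulate_formula9(beta, gamma, q, cv, i0=0.0, dt=0.01, t_end=300.0):
    """Forward-Euler integration of formula 9 from an initially clean population."""
    i = i0
    for _ in range(round(t_end / dt)):
        i += dt * (beta * i * (1.0 - i - q) - gamma * i + beta * cv * (1.0 - q))
    return i

beta, gamma, cv = 0.9, 0.435, 0.002    # practice scenario; cv is hypothetical
for q in (0.0, 0.3, 0.8):
    print(q, round(steady_state_formula9(beta, gamma, q, cv), 4),
          round(simulate_formula9(beta, gamma, q, cv), 4))
```

Under these assumptions the steady state falls from about 52 percent (q = 0) to about 22 percent (q = 0.3). With q = 0.8 the spreading term alone would let the infection die out, but continued injection keeps a small residual level of roughly 0.14 percent.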


Conclusion

The malware risk is expected to increase in the near future, especially when cybercriminals routinely start generating malware automatically from software patches. Such practices will put organizations that want to test patches before deploying them under great pressure. Even with optimal security measures, not all malware infections can be prevented. Because malware infections cause mounting economic damage to society, no one can afford to ignore security measures that are effective against cybercrime.

Within the limitations noted, network theory and the relatively simple SIS model provide new insights into the effectiveness of security measures. The small-world effect is a double-edged sword: any enemy in the digital world is just a few clicks away.26 Every infected home computer of an employee is just one step away from the enterprise’s critical information systems. At the very least, enterprises need a proven incident response plan.

Illegal software often contains malware. When employees regularly work at home, enterprises should impose rules of conduct, because employees can infect business computers by sending e-mail from home or by using USB drives. Enterprises reduce the risk of malware by improving employees’ knowledge through mandatory security training and by providing business and security software free of charge. Enterprises that standardize on freeware or open-source software avoid the extra license costs this entails.

In the corporate scenario, enterprises can reduce the current risk of malware almost to zero by resetting the software on each computer once a year. It is also advisable to block nonstandard software on business computers using a white list. Improving the security knowledge of inexperienced users reduces the risk of infection for the whole population, including security experts.

A software monopoly maximizes the economic return of malware. Of course, companies can still choose standard software, but from a cybercrime perspective, it is undesirable that all companies use the same software. Increasing the use of nonmarket-leading software reduces the risk of infection because the number of fertile contacts for malware decreases. A sufficiently high percentage of computers with alternative software can significantly reduce malware infections. This alternative software should preferably use open standards to ensure interoperability, avoid vendor lock-in and provide sustainable access to archived information.

Author’s Note

The author would like to thank Robert Kooij, Ph.D., of the Electrical Engineering, Mathematics and Computer Science faculty at the Delft University of Technology (The Netherlands) for his support.


Endnotes

1 White, Steve R.; “Open Problems in Computer Virus Research,” Virus Bulletin Conference, Munich, Germany, October 1998,
2 In this article, the terms “exploits” and “malware” (computer viruses, worms and spyware) are considered synonyms; however, please note that an exploit abuses a vulnerability in software and usually instructs the computer to download and install malware.
3 Kuperman, Marcelo; Guillermo Abramson; “Small World Effect in an Epidemiological Model,” Physical Review Letters, vol. 86, no. 13, 2001,
4 Newman, M. E. J.; “The Structure and Function of Complex Networks,”
5 Van Mieghem, Piet; Jasmina Omic; Robert Kooij; “Virus Spread in Networks,”
6 See
7 Staniford, Stuart; “Do Antivirus Products Detect Bots?,” FireEye Malware Intelligence Lab, 20 November 2008, http://
8 AV-Comparatives e.V., “Anti-virus Comparative—Proactive/Retrospective Test—February/May 2010,”
9 Naraine, Ryan; “The Zero-day Dilemma,”, 24 January 2007,,1759,2087034,00.asp
10 Leyden, John; “MS Knew of Aurora Exploit Four Months Before Google Attacks,” The Register, 22 January 2010,
11 Brumley, David; Pongsin Poosankam; Dawn Song; Jiang Zheng; “Automatic Patch-based Exploit Generation Is Possible: Techniques and Implications,”
12 Symantec, Global Internet Security Threat Report Trends for 2009,
13 Computable, “Incident management badly needed” (in Dutch),
14 Ernst & Young, “Results ICT Barometer on Cybercrime” (in Dutch), February 2010,
15 Ernst & Young, “Results ICT Barometer on IT-security and Cybercrime” (in Dutch), 28 January 2009,
16 From the AV software research in endnote 8, γ = 0.435; calculated from the (∂i/∂t) in endnote 14, β = 0.52
17 From the AV software research in endnote 8, γ = 0.435; an estimation of β = 0.9
18 The Rendon Group, “Conficker Working Group: Lessons Learned,” USA, June 2010,
19 Fictional parameters for the “cyberwarfare” scenario: β = 0.1 and γ = 0.04
20 See endnote 16; with annual reinstallation, γ + μ = 0.435 + 1/12 ≈ 0.518 and β = 0.52, so the steady state of malware infections drops from 0.16 to 0.003.
21 Omic, Jasmina; Robert E. Kooij; Piet Van Mieghem; “Heterogeneous Protection in Regular and Complete Bi-partite Networks (Work in Progress),” International Federation for Information Processing, 2009,
22 This statement is based on the professional experiences of the author.
23 Computable, “Engage cyber crime: Divide and Conquer,” (in Dutch), 21 February 2008,
24 Trend Micro, “TrendLabs Global Threat Trends 1H 2010,” Philippines, 2010,
25 Specter, Michael; “Annals of Technology—Damn Spam: The Losing War on Junk E-mail,” The New Yorker, 6 August 2007,
26 Murphy’s Law of Combat No. 1:  If the enemy is in range, so are you.

Henk-Jan van der Molen
is a freelance teacher of business intelligence, information security and change management at the Wageningen University (The Netherlands). He can be reached at

The ISACA Journal is published by ISACA. Membership in the association, a voluntary organization serving IT governance professionals, entitles one to receive an annual subscription to the ISACA Journal.

Opinions expressed in the ISACA Journal represent the views of the authors and advertisers. They may differ from policies and official statements of ISACA and/or the IT Governance Institute® and their committees, and from opinions endorsed by authors’ employers, or the editors of this Journal. ISACA Journal does not attest to the originality of authors’ content.

© 2011 ISACA. All rights reserved.

Instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. For other copying, reprint or republication, permission must be obtained in writing from the association. Where necessary, permission is granted by the copyright owners for those registered with the Copyright Clearance Center (CCC), 27 Congress St., Salem, MA 01970, to photocopy articles owned by ISACA, for a flat fee of US $2.50 per article plus 25¢ per page. Send payment to the CCC stating the ISSN (1526-7407), date, volume, and first and last page number of each article. Copying for other than personal use or internal reference, or of articles or columns not owned by the association without express permission of the association or the copyright owner is expressly prohibited.