ISACA Journal
Volume 1, 2016 


Actionable Security Intelligence From Big, Midsize and Small Data 

C. Warren Axelrod, Ph.D., CISM, CISSP 

Information security professionals continue to struggle with acquiring and understanding the most relevant and useful data in order to anticipate threats, guard against attacks and determine forensically what happened after a hack occurs. However, despite such efforts, data breaches, such as those recently perpetrated against Target and the US Government’s Office of Personnel Management (OPM), are getting much bigger in scope; more damaging in their consequences; more difficult to monitor, analyze and address; and more costly to resolve.1 Is there anything on the horizon that might make security metrics more effective? The answer: Quite possibly.

What Has Changed?

Over the past five or so years, big data has burst onto the scene along with highly efficient tools for big data storage and analysis. The sources of big data are many, ranging from search data captured by the likes of Google and reported to advertisers to enable targeted marketing, to transaction data from Amazon and other major online retailers and auction sites that identify products that might interest customers, and network traffic monitored by telecommunications companies and used for billing, analysis and other applications.2

This mother lode of data, the development of high-efficiency tools, including open-source products such as software framework Apache Hadoop, and the dramatic drops in the costs of processing and storage, have led to an environment that can be exploited for many purposes—in this case, cybersecurity. Never before has so much information been available to security and risk professionals. In addition, the rapidly evolving creative and innovative benefits from this data and analysis bonanza are being realized.

Nevertheless, the results of big data analysis, while clearly adding considerably to the resources already available to fend off cyberattacks, are by no means the whole story. There is a growing realization among big data analysts of a need for so-called “small data,” which derive from surveys and interviews of subject matter experts, alerts, reports, internal and external audits and assessments, and the like.3

Furthermore, the myriad traditional security metrics, which this article calls “midsize data” and which are obtained from logs of network and host intrusion detection systems (IDSs) and intrusion prevention systems (IPSs), applications, network and system firewalls, and the like, must not be forgotten. Data from these sources are aggregated, correlated and analyzed by increasingly sophisticated security information and event management (SIEM) systems. Midsize data can bring further meaning to big data and may also be put into context by small data.

This article shows the synergistic effects of combining big, midsize and small data. It further suggests how one might aggregate, correlate and analyze data to fully understand the business and operational environment that an organization’s computer systems and networks support. From such an understanding emanates focused actions that should be taken to ensure a higher degree of information security.

It should be noted, however, that the collective power of big, midsize and small data does not preclude the need for incorporating value and uncertainty into the mix.4 These characteristics are obtained via small data processes, such as the Operationally Critical Threat, Asset, and Vulnerability Evaluation (OCTAVE) approach to handling risk.5

Big, Midsize and Small Data

The differentiation among big, midsize and small data appears, at first look, to be fairly straightforward. Nevertheless, complexities are introduced when combining analyses of various data sources since they frequently are not in compatible formats. Some tools operate on both structured and unstructured data and others do not. Some unstructured information can add to the analyses of structured data, such as providing business context for network traffic, whereas in other situations, unstructured data yield few benefits because they are not in formats that are mutually compatible.

There are some groundbreaking efforts being made to standardize data structures across certain big and midsize data collection programs to provide consistency and improve ease of use. A notable example is the Soltra approach created by the US financial services industry. Soltra uses open standards, including Structured Threat Information eXpression (STIX) and Trusted Automated eXchange of Indicator Information (TAXII),6 to be able to interoperate with established security tools using the same standards.7
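Soltra itself exchanged STIX 1.x XML documents over TAXII; purely as an illustration of what a standardized, machine-readable threat indicator looks like, the following sketch builds a minimal indicator using the JSON shape of the later STIX 2.x serialization. Field names follow STIX 2.x, but the pattern and description values are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone

def make_indicator(pattern, description):
    """Build a minimal STIX 2.x-style indicator object.

    Field names follow the STIX 2.x JSON serialization; Soltra itself
    exchanged STIX 1.x XML over TAXII, so this is illustrative only.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "id": "indicator--" + str(uuid.uuid4()),
        "created": now,
        "modified": now,
        "description": description,
        "pattern": pattern,
        "valid_from": now,
    }

indicator = make_indicator(
    "[ipv4-addr:value = '203.0.113.99']",  # documentation-range IP
    "C2 address reported by a peer institution",
)
print(json.dumps(indicator, indent=2))
```

Because every participant emits the same structure, a receiving organization can route such objects directly into its own security tools without bespoke parsing, which is the interoperability benefit the standards aim for.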

Big Data
Data in this category are usually collected in real time in enormous volumes from web traffic and internal traffic.8 The data are then often analyzed rigorously using increasingly sophisticated tools. It is common in today’s world to run massive amounts of data through analytic engines and announce interesting relationships, often regardless of whether or not there is any predictable cause and effect.

Judging from the claims of some security product and service vendors, one might think that if only one were to gather enough data, then one should be able to anticipate all applicable threats, exploits and attacks sufficiently well in advance so as to take preventive or defensive action as the analysis of the data would suggest. Clearly this utopian view is not reality. Some considerable expertise is still needed to interpret results and put them in context. Notwithstanding their limitations, however, big data analyses contribute significantly to better understanding of security posture and environments and may well provide an edge that information security professionals have been seeking for decades, as supported by a number of publications and articles.9

Another important aspect of the big data revolution has been the proliferation of highly efficient analytical tools designed to rapidly sift through huge volumes of structured and unstructured data. These tools can run against data obtained from external sources and internally generated data or a combination of both. Whether or not predictive analysis can derive patterns to enable forecasting future events remains to be seen, but the prospect is encouraging for a profession that usually deals with attacks after the fact rather than proactively.

Midsize Data
Data collected by network and host IDSs and IPSs, network and application firewalls, and application instrumentation can run typically gigabytes, but even terabytes, per day. Yet, such data are still considered to be significantly smaller than the terabytes and petabytes typical of big data, which is why the term “midsize” is used when referring to data relating specifically to the organization and collected by internally deployed security products or third-party services. These data are typically aggregated, correlated, analyzed and displayed by SIEM systems, which also issue alerts when suspicious, unusual or unexpected activities occur.
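The aggregate-correlate-alert pipeline that a SIEM performs can be sketched in miniature. The event format, the `failed_login` event type and the threshold below are all assumptions for illustration; a real SIEM would normalize many log formats and apply far richer correlation rules.

```python
from collections import defaultdict

FAILED_LOGIN_THRESHOLD = 5  # alert when a single source exceeds this

def correlate(events):
    """Correlate normalized events from IDS/firewall/application logs
    by source address and flag suspected brute-force activity.

    `events` is a list of dicts with hypothetical fields:
    {"source": "10.0.0.5", "type": "failed_login", "host": "app01"}.
    """
    failures = defaultdict(int)
    for e in events:
        if e["type"] == "failed_login":
            failures[e["source"]] += 1
    return [
        {"alert": "possible brute force", "source": src, "count": n}
        for src, n in failures.items()
        if n >= FAILED_LOGIN_THRESHOLD
    ]

events = (
    [{"source": "10.0.0.5", "type": "failed_login", "host": "app01"}] * 6
    + [{"source": "10.0.0.9", "type": "failed_login", "host": "app02"}] * 2
)
alerts = correlate(events)
print(alerts)  # one alert for 10.0.0.5, none for 10.0.0.9
```

The value added over raw logs is exactly this correlation step: six scattered failures across hosts become one actionable alert tied to a single source.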

Even though SIEM data can result in very large databases, which can be analyzed in real time, they provide only part of the picture. There is a strong, largely unmet need to generate and analyze data about activities within applications and system software. This area is, unfortunately, underserved in many organizations, as suggested by so many reports of companies, government agencies, academic institutions and others not knowing exactly who and what was affected by a data breach and when the breach might have occurred. The key here is to incorporate security data collection capabilities into software early in the development life cycle. This will largely avoid the expensive reworking of applications that postdevelopment instrumentation incurs.10
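Building security data collection into software during development can be as lightweight as wrapping sensitive operations so that each call emits a structured event a SIEM can consume. The decorator and event fields below are a hypothetical sketch, not a prescribed schema.

```python
import functools
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
security_log = logging.getLogger("appsec")

def audited(action):
    """Decorator that emits a structured security event for each call.

    Field names here are illustrative; a real deployment would match
    the schema its SIEM expects.
    """
    def wrap(fn):
        @functools.wraps(fn)
        def inner(user, *args, **kwargs):
            event = {
                "ts": datetime.now(timezone.utc).isoformat(),
                "action": action,
                "user": user,
                "function": fn.__name__,
            }
            try:
                result = fn(user, *args, **kwargs)
                event["outcome"] = "success"
                return result
            except Exception:
                event["outcome"] = "failure"
                raise
            finally:
                security_log.info(json.dumps(event))
        return inner
    return wrap

@audited("read_customer_record")
def get_customer(user, customer_id):
    return {"id": customer_id}  # placeholder business logic

print(get_customer("alice", 42))
```

Instrumentation of this kind, added while the code is being written, is what makes the later forensic question "who accessed what, and when?" answerable without reworking the application.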

Small Data
Big and midsize data need to be supplemented with small data to better understand their meaning and context. The information value of small data is derived mainly from analysts being able to put other data analyses into perspective. By surveying or interviewing business managers, users, customers, business partners, suppliers, and other internal areas, such as legal, compliance, and marketing, one can get a better sense of what is important to each area and how its activities vary with new business, seasonal factors and the like.

As an example, network- and host-based IDSs and IPSs monitor message volumes and traffic characteristics. In the case of IPSs, the systems also respond to unusual activities and block specific messages. Often the IT and information security areas are not fully aware of business changes and might, as a result, interpret a sudden surge in traffic volume as a cyberattack or nefarious insider activities, for example, when in fact the increase might be due solely to taking on a large new customer. Similarly, a big drop in volume might be due to the loss of a major customer. The prospect of any such major business shifts needs to be conveyed to the security staff, which, in turn, need to pass on the information in advance to those responsible for IDSs, IPSs and other monitoring systems. No amount of analysis of historical data would anticipate such changes, so it makes sense to regularly communicate such changes in the form of small data inquiries and reports.
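The small-data feedback loop described above can be sketched as a volume check that consults a registry of expected business changes before raising an alarm. The z-score threshold, dates and volumes are arbitrary illustrative choices.

```python
from statistics import mean, stdev

# Advisories from business units ("small data"): dates on which a
# volume shift is expected, e.g. onboarding a large new customer.
expected_changes = {"2016-01-11": "new customer XYZ goes live"}

def check_volume(history, today_volume, today):
    """Flag today's traffic as anomalous unless an advisory explains it.

    `history` is a list of recent daily message counts; the z-score
    threshold of 3 is an arbitrary illustrative choice.
    """
    mu, sigma = mean(history), stdev(history)
    if sigma == 0 or abs(today_volume - mu) / sigma < 3:
        return "normal"
    if today in expected_changes:
        return "expected: " + expected_changes[today]
    return "anomaly: investigate possible attack or data loss"

history = [1000, 1020, 980, 1010, 990, 1005, 995]
print(check_volume(history, 5000, "2016-01-11"))  # explained by advisory
print(check_volume(history, 5000, "2016-01-12"))  # genuine anomaly
```

The same fivefold surge produces opposite conclusions depending on whether the business units communicated the change in advance, which is the article's point: no amount of historical analysis substitutes for that advisory.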

It is important to have accurate and timely reporting systems in place that inform security staff of events and potential changes that might affect any and all aspects of business operations, IT and information security. Such advisories might include system failures, lost data media, unusual activities and system upgrades.

Another example of small data collection is implementing a method for determining information security risk, such as OCTAVE.11 In this approach, all involved departments, which often means the entire enterprise, are asked very specific questions about their assessment of security risk relating to information systems. The responses are evaluated and an overall security risk posture is established.

Putting it All Together

As shown in figure 1, big, midsize and small data are collected and preliminary analyses are performed using a variety of tools relevant to each source of data.

The results can then be pooled to provide an overall view of information security threats, attacks and vulnerabilities potentially affecting an organization’s networks, systems and applications, leading to situational awareness. The organization then must respond to the possibility of attacks by patching vulnerabilities and updating security tools to protect against known attacks.

When data collection, analysis and reporting are performed in real time, systems will likely issue instantaneous alerts, and immediate responses are then usually required. However, many analyses are done in batch mode, taking weeks or even months before results are issued. Responses to these reports are tactical and strategic, whereas reactions to real-time incident reports are operational.

It is important that analysts have a holistic view of an organization’s business functions and IT systems so that their responses do not result in unintended consequences. For example, a sudden increase in transaction volume might be due to a denial-of-service (DoS) attack or may result from taking on a big new customer. In such a case, the incident response team needs to know in advance about significant changes in the business so that they interpret changes appropriately.

Figure 2 shows the various sources of data and corresponding analyses for both batch and real time (stream) and illustrates how results might be consolidated and actions taken.

In general, big data are collected in real time, typically running into the millions of transactions per second for large organizations.12 Big data are usually analyzed in batch mode, but increasingly, tools are becoming available for real-time analysis.13 Midsize data are typically collected, analyzed, reported and acted upon in real time. Small data are usually collected and analyzed over longer periods, such as days, weeks or months.

Characteristics of Data Types

Each category of data, whether big, midsize or small, has its own particular characteristics and capabilities and produces different results. It should be noted that the field of security intelligence is very dynamic and new tools and methods are continually being introduced. In general, but clearly not in all cases, innovative big data capabilities are appearing at the fastest rate, with moderate evolution for midsize data methods and procedures, and small data showing relatively little change. While small data methods and procedures are generally well established, their rate of adoption is slower than hoped. Similarly, the adoption of approaches for midsize data is not as rapid as needed, particularly when it comes to providing security information from within applications.

Figure 3 summarizes the characteristics, capabilities and information produced by data type. It is provided as guidance and is by no means comprehensive, particularly with respect to capabilities, since new uses of these data types for developing better security intelligence are being created daily.

Real-world Cases

The following is a series of example cases.14

Big Data
Long before big data as such were on the radar, telecommunications companies and ISPs had been gathering and analyzing huge amounts of data regarding activities on their networks. Shortly after the structured query language (SQL) Slammer computer worm struck in January 2003, one ISP was able to show retrospectively how the implementer of the worm had made several trial runs against a particular port before launching the main attack. In this case, the analysis was after the fact, but the goal has been to develop predictive capabilities for teasing out questionable activities from the huge amounts of collected traffic data.
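The retrospective analysis described here amounts to searching historical flow records for low-volume probes against the targeted port (Slammer exploited SQL Server over UDP port 1434) that preceded the main attack. The flow-record format below is a hypothetical simplification of what an ISP would actually retain.

```python
def find_precursor_probes(flows, port, attack_start):
    """Return sources that probed `port` before `attack_start`.

    `flows` is a list of (timestamp, source, dest_port) tuples drawn
    from historical traffic records; the format is hypothetical.
    """
    return sorted({
        src for ts, src, dport in flows
        if dport == port and ts < attack_start
    })

flows = [
    (100, "198.51.100.7", 1434),  # trial run
    (250, "198.51.100.7", 1434),  # trial run
    (300, "203.0.113.4", 80),     # unrelated web traffic
    (500, "198.51.100.7", 1434),  # main attack begins
]
print(find_precursor_probes(flows, 1434, attack_start=500))
# ['198.51.100.7']
```

Turning this after-the-fact query into a standing, predictive detector, flagging unusual probe patterns before the main event, is precisely the goal the article describes.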

A current example of using big data to provide organizations with immediate notifications of threats is the Soltra Edge initiative sponsored by the Financial Services Information Sharing and Analysis Center (FS-ISAC) and the Depository Trust and Clearing Corporation (DTCC). The goal of Soltra is to “deliver software automation and services that collect, distill and speed the transfer of threat intelligence from a myriad of sources to help safeguard against cyber attacks.”15

Midsize Data
As one company began to implement security information management (SIM) technology (SIM was later superseded, at least in name, by SIEM), its operation of, and support for, firewalls was moving from information security to network operations since the technology had become mainstream. IDSs and SIM systems were still managed by information security staff, but soon thereafter IDSs transitioned to operations. Then, when IPSs were introduced, the technology was considered too dangerous to move into day-to-day operations, since a single misstep in supporting IPSs could rapidly lead to major business catastrophes. An IPS vendor illustrated the point with the example of a company that implemented an IPS on a Friday evening. The system, which monitored traffic patterns in order to set up prevention rules, observed and established its baseline volume pattern on typically low-volume weekend traffic. On Monday morning, when everyone came to work, the IPS saw a sharp jump in volume and proceeded to close down new traffic, cutting off all employees and connected customers.
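Why a weekend-trained baseline fails on Monday can be shown with a few numbers. The threshold rule and volumes below are arbitrary assumptions that mimic a naive volume-based prevention rule, not any particular vendor's algorithm.

```python
from statistics import mean

def baseline_limit(training_volumes, factor=3):
    """Naive IPS-style rule: block when traffic exceeds `factor` times
    the mean observed during training. Numbers are illustrative."""
    return factor * mean(training_volumes)

weekend = [100, 120, 90, 110]   # messages/min sampled Sat-Sun
limit = baseline_limit(weekend)  # 3 * 105 = 315

monday_volume = 2000             # a normal weekday load
print(monday_volume > limit)     # True: legitimate traffic gets blocked
```

The training window, not the rule, is the flaw: a baseline established on unrepresentative data will misclassify perfectly normal behavior, which is why such systems need context about business rhythms before enforcement is switched on.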

Furthermore, early SIM systems did not have user-friendly interfaces, and interpreting events required significant effort by subject matter experts. However, it was found that it was helpful to obtain information from business units regularly so that teams expected and were prepared for major anticipated changes.

Small Data
The types of data that comprise small data and the means of collecting them are many and diverse. While some measure of automation can be invoked to assist in collecting and analyzing survey data, the quantity of data collected is usually small and the number of persons surveyed few. While periodic reporting of security metrics may, at some level, also be automated, there is often a need for manual data entry. Consequently, there are limits to what one might ask for and the quality and quantity of the data received in response. Such data are often unstructured, although some are assigned specific formats ahead of time as dictated by the products being used.

One such approach is the OCTAVE method, which was developed by Carnegie Mellon University’s (Pittsburgh, Pennsylvania, USA) Software Engineering Institute. One definition of OCTAVE is “a security framework for determining risk level and planning defenses against cyber assaults.”16 Since its initial development in 2001 for the US Department of Defense (DoD), OCTAVE has broadened its scope to include the private sector and has undergone a number of changes and enhancements. It has been published in several versions, including the most recent version, OCTAVE Allegro, which is based on OCTAVE Original and OCTAVE-S. Today, OCTAVE-S is used for smaller organizations, while OCTAVE Allegro is used for large organizations with multilevel structures.

OCTAVE defines three phases:17

  1. Build asset-based threat profiles.
  2. Identify infrastructure vulnerabilities.
  3. Develop security strategy and plans.

Clearly, being able to develop a security strategy requires more than the responses to OCTAVE questions, which is why an overall program requires additional analyses of data from other sources.
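The three OCTAVE phases could be tracked with a simple data model in which each asset's profile accumulates threats, vulnerabilities and mitigations as the phases proceed. The structure and field names below are hypothetical illustrations, not part of the OCTAVE method itself.

```python
from dataclasses import dataclass, field

@dataclass
class AssetProfile:
    """Phase 1: asset-based threat profile (fields are illustrative)."""
    asset: str
    owner: str
    threats: list = field(default_factory=list)          # phase 1
    vulnerabilities: list = field(default_factory=list)  # phase 2
    mitigations: list = field(default_factory=list)      # phase 3

profile = AssetProfile(asset="customer database", owner="operations")
profile.threats.append("insider data theft")             # phase 1
profile.vulnerabilities.append("unpatched DBMS host")    # phase 2
profile.mitigations.append("apply vendor patch; enable DB audit logs")  # phase 3
print(profile.asset, len(profile.mitigations))
```

Keeping the three phases attached to the same asset record makes it easy to see which identified threats still lack a planned mitigation once phase 3 is reached.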

The management of an OCTAVE project at a smaller financial organization found significant benefits in being able to approach all areas of the company with a prepared set of questions. In some organizations, it may be more effective to have outside parties conduct the surveys, since internal department staff are often more responsive to outside consultants.

Supplementary Information
Many organizations report security metrics, such as threats from big data sources and numbers of attempted and successful intrusions from network and system monitoring products (firewalls, IDSs, etc.). In some cases, third-party services will match threats against a company’s particular infrastructure and software and hardware products in use so that organizations are provided with information relevant to their environment. In other situations, one has to perform the matching oneself. In either case, a full, accurate and up-to-date inventory is needed. While there are automated systems that run through an organization’s systems and networks and pick up details of software installed, including version numbers to determine whether patching is current, there is usually a considerable ongoing manual effort involved in making sure that all resources have been covered.
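When the matching must be done in-house, it reduces to joining the threat feed against the inventory. The advisory and inventory fields below are hypothetical; real feeds would carry CVE identifiers and version ranges, and the inventory would come from an automated discovery tool.

```python
def relevant_threats(advisories, inventory):
    """Match external threat advisories against an asset inventory.

    `advisories` and `inventory` use hypothetical fields; real feeds
    would carry CVE identifiers and richer version information.
    """
    hits = []
    for adv in advisories:
        for host in inventory:
            if (adv["product"] == host["product"]
                    and host["version"] in adv["affected_versions"]):
                hits.append({"host": host["name"], "advisory": adv["id"]})
    return hits

advisories = [
    {"id": "ADV-001", "product": "webserver",
     "affected_versions": ["2.2", "2.4"]},
]
inventory = [
    {"name": "web01", "product": "webserver", "version": "2.4"},
    {"name": "web02", "product": "webserver", "version": "2.6"},
]
print(relevant_threats(advisories, inventory))
# [{'host': 'web01', 'advisory': 'ADV-001'}]
```

Note that the quality of the output is bounded by the inventory: a host missing from the list generates no match, which is why the article stresses the ongoing manual effort of keeping the inventory complete.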

Even when the above “scoreboards” are presented to management, auditors and regulators, they often require supplementary information without which the numbers have little meaning. For example, if a report shows that 80 percent of installations of a particular software product have been updated with the latest patch, it is still necessary to know whether the remaining 20 percent includes systems that are critical to the functioning of the organization.18 If so, there may be considerable risk exposure until all systems are patched or otherwise fixed.
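The arithmetic behind that caveat can be made concrete by weighting each system by a criticality score, a hypothetical scheme sketched here with arbitrary weights. A raw 80 percent patch rate can conceal most of the actual exposure when the one unpatched system is the critical one.

```python
def weighted_patch_coverage(systems):
    """Patch coverage weighted by system criticality (weights assumed).

    `systems` is a list of (patched, criticality_weight) pairs; a raw
    count-based percentage can hide an unpatched critical system.
    """
    total = sum(w for _, w in systems)
    patched = sum(w for ok, w in systems if ok)
    return patched / total

# Five systems: four patched low-criticality, one unpatched critical.
systems = [(True, 1), (True, 1), (True, 1), (True, 1), (False, 10)]
raw = sum(ok for ok, _ in systems) / len(systems)
print(f"raw coverage: {raw:.0%}")                                    # 80%
print(f"weighted coverage: {weighted_patch_coverage(systems):.0%}")  # 29%
```

Presenting both figures side by side gives decision makers the criticality context the article argues a bare percentage lacks.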

If the reported security metrics do not include descriptions of their criticality and their purpose within the organization, then decision makers will not have sufficient information with respect to the value of the systems involved and the importance of making sure that the applications and the data are adequately protected.

Here is a summary of actions to be taken in order to fully benefit from the rich inventory of security-related data that is increasing rapidly as new sources and tools are discovered:

  • Determine appropriate information security metrics for decision making and action.
  • Determine the sources from which data supporting decision making and action might be extracted.
  • Socialize benefits of collecting and analyzing data and reporting and reacting to metrics.
  • Introduce policies and procedures that formalize data collection, analysis and reporting.
  • Obtain senior-level management commitment to apply sufficient resources to build the necessary capabilities for data collection, analysis, reporting and response.
  • Formalize the interactions within and among the various constituencies (i.e., information security, risk management, application development, quality assurance, operations, vendor management, internal audit, business continuity).


The proliferation of enormous data sources and advanced analytical tools has led the world to the brink of major breakthroughs for determining threats, predicting and avoiding attacks, and detecting and responding to breaches. The full benefits of these innovations will not automatically accrue in the cybersecurity world. They require work in selecting and integrating known data and expressing results in terms that are understandable and actionable for decision makers.

This article examines the potential and suggests approaches that will help realize synergies among big, midsize and small data analyses and result in capabilities that will, for the first time, give defenders a chance to overtake the rapidly rising abilities of attackers.


1 For an authoritative account of the characteristics of recent data breaches, see Verizon, 2015 Data Breach Investigations Report,
2 According to IBM, 2.5 x 10^15 bytes of data are being created daily. This number is increasing rapidly in that 90 percent of all data were created in the prior two years, as described in Bringing Big Data to the Enterprise,
3 Peysakhovich, Alex; Seth Stevens-Davidowitz; “How Not to Drown in Numbers,” Sunday Review, The New York Times, 2 May 2015,
4 Axelrod, C. Warren; “Accounting for Value and Uncertainty in Security Metrics,” ISACA Journal, vol. 6, 2008
5 Caralli, Richard A.; James F. Stevens; Lisa R. Young; William R. Wilson; Introducing OCTAVE Allegro: Improving the Information Security Risk Assessment Process, Technical Report CMU/SEI-2007-TR-012 ESC-TR-2007-012, Software Engineering Institute, May 2007,
6 Depository Trust and Clearing Corporation, “FS-ISAC and DTCC Announce Soltra, a Strategic Partnership to Improve Cyber Security Capabilities and Resilience of Critical Infrastructure Organizations Worldwide,” press release, 24 September 2014,
7 Tripwire, “Soltra Edge and Tripwire Enterprise,”
8 In the case of Hewlett-Packard, for example, the enterprise reportedly generated one trillion events per day, or about 12 million events per second in 2013, as noted in section 4.2 in CSA Big Data Working Group; Big Data Analytics for Security Intelligence, Cloud Security Alliance, September 2013,
9 CSA Big Data Working Group, Big Data Taxonomy, Cloud Security Alliance, September 2014
IBM, Extending Security Intelligence with Big Data Solutions: Leveraging Big Data Technologies to Uncover Actionable Insights into Modern, Advanced Data Threats, Thought Leadership white paper, IBM Software, January 2013
Marko, Kurt; “Big Data: Cyber Security’s Silver Bullet? Intel Makes the Case,” 9 November 2014
Ponemon Institute, Big Data Analytics in Cyber Defense, Ponemon Institute Research Report, Sponsored by Teradata, February 2013
Teradata, Big Data Analytics: A New Way Forward for Optimizing Cyber Defense, November 2013
10 Axelrod, C. Warren; “Creating Data from Applications for Detecting Stealth Attacks,” STSC CrossTalk: The Journal of Defense Software Engineering, September/October 2011
11 Op cit, Caralli
12 Op cit, CSA, 2011
13 Ibid.
14 These cases are examples from the author’s personal experiences.
15 Op cit, Depository Trust and Clearing Corporation
16 TechTarget, “Definition: OCTAVE,”
17 Ibid.
18 “Criticality” has several definitions with respect to computer systems. In regulated industries, such as financial services, systems that are needed to comply with legal and regulatory requirements are highly critical, as are those systems that are needed to maintain the continuous operation of the organization in both normal circumstances and contingency mode. Real-time systems (such as trading systems) are time-critical, since even a short outage can incur significant monetary and reputation costs and bring on the wrath of regulators. Batch systems may not be as time critical as online systems, but their compromise (failure, malfunction, loss of data integrity) may affect timely and accurate processing of data and can hugely impact the continued survival of the organization from financial and operational perspectives.

C. Warren Axelrod, Ph.D., CISM, CISSP, is a senior consultant with Delta Risk LLC, specializing in cybersecurity, risk management and business resiliency. Previously, he was the business information security officer and chief privacy officer for US Trust. He was a founding member of the Financial Services Information Sharing and Analysis Center and represented financial services cybersecurity interests in the US National Information Center during the Y2K date rollover. He testified before the US Congress in 2001 on cybersecurity. His most recent book is Engineering Safe and Secure Software Systems. Previously he published Outsourcing Information Security and was coordinating editor of Enterprise Information Security and Privacy.



Opinions expressed in the ISACA Journal represent the views of the authors and advertisers. They may differ from policies and official statements of ISACA and from opinions endorsed by authors’ employers or the editors of the Journal. The ISACA Journal does not attest to the originality of authors’ content.