Data classification, or taxonomy, is a practice related to almost all domains of human knowledge, from mathematics to biology, marketing and finance. It directly affects how organizations understand business processes at the most elementary level (i.e., the manipulation of individual pieces of information to achieve expected results), allows economic value to be assigned to intangible data and enables a structured approach to data management. Data classification is fundamental to asset management, risk assessment and the strategic use of security controls within the IT infrastructure of any organization. Without an understanding of the different classes of data according to their assessed risk and value, it is impossible to allocate resources effectively to ensure the continuity of business operations.
For years, military systems have used variations of a five-level classification scheme (unclassified, sensitive-but-unclassified, confidential, secret and top secret). This structure was later adapted to the corporate environment, using words such as private, sensitive, critical and confidential, among others. However, these words alone do not provide sufficient distinction among levels of classification and can lead to misinterpretation and confusion.
To be effective, the classification scheme should clearly articulate the association between the data and their supporting business processes. Once meaningful terminology is employed in the classification scheme, a secondary capability naturally evolves: the mapping and expression of security characteristics such as ownership, liability and control of data. What distinguishes this approach from the traditional models is that the security characteristics flow directly from the business process rather than being derived from military or other unrelated or historical criteria.
A customized approach to data classification based on business and security requirements must start with a high-level business impact analysis (BIA). It is extremely important that organizations recognize critical business information. The BIA may be performed by using assessment questionnaires or interviewing key users, or both, and will identify the business processes, structured data (e.g., database tables) and unstructured information (e.g., e-mails, reports, spreadsheets) that are most likely to impact the organization’s ability to function if disrupted, compromised or lost.
Generally, the outcomes of a BIA identify the most critical business processes for the organization, based on business operations, revenue streams or the ability to deliver a service (e.g., government functions). Once these processes are known, two fundamental elements can be traced: the information systems and infrastructure components that support those business processes, and the major classes of business data that are generated and manipulated during regular business operations (e.g., financial data, strategic plans). More complex organizations, or organizations operating across several industries (e.g., retailing plus financial services or media plus entertainment), will identify more data classes, and some classes can be broken down into subclasses (e.g., financial data can be mapped to sales, manufacturing, supply chain or others).
From this point, senior executives must be assigned ownership of business information, and they must report to the board or the CEO. Ownership can be traced from the revenue streams that are supported by the business information (i.e., control points can be established by mapping who profits from the use of information1) or from the delivery of mission-critical services (e.g., patient health and privacy in healthcare, public safety in law enforcement). In complex organizations, business information can be generated by one business unit and then combined and manipulated by several others over its life cycle. Ultimately, ownership should be assigned to the business unit with primary control over the creation, usage and disposal of the business information, even if other units also use it during regular operations.
For each data class discovered in the BIA exercise, business and security requirements must be identified and documented, with guidance from the information owners and other stakeholders (e.g., legal department, security, human resources). These three elements (data classes, ownership and requirements) enable the structuring of business information and more efficient, effective and reliable data management.
Once business data have been properly classified and requirements have been documented, standards and procedures must be built around the requirements and enforced by the use of internal controls. The result is the gradual evolution of IT processes toward regulatory compliance and alignment with the business strategy. Figure 1 summarizes the data classification process.
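As a rough illustration of the three outputs described above (data classes, ownership and documented requirements), the Python sketch below keeps a hypothetical in-memory register and flags any class or subclass that still lacks an owner or documented requirements. The `DataClass` type, the `find_gaps` function and the sample entries are illustrative assumptions, not structures defined by COBIT or by any particular BIA method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataClass:
    """A data class or subclass identified during the BIA (hypothetical model)."""
    name: str
    owner: str | None = None  # accountable business unit or senior executive
    supporting_systems: list[str] = field(default_factory=list)
    requirements: dict[str, str] = field(default_factory=dict)  # criterion -> requirement
    subclasses: list[DataClass] = field(default_factory=list)


def find_gaps(classes: list[DataClass]) -> list[str]:
    """Walk classes and their subclasses depth-first and report missing
    ownership or missing documented requirements."""
    gaps: list[str] = []
    for dc in classes:
        if not dc.owner:
            gaps.append(f"{dc.name}: no owner assigned")
        if not dc.requirements:
            gaps.append(f"{dc.name}: no documented requirements")
        gaps.extend(find_gaps(dc.subclasses))
    return gaps


# Hypothetical register: the parent class is complete, its subclass is not.
finance = DataClass(
    name="financial data",
    owner="CFO",
    supporting_systems=["general ledger"],
    requirements={"confidentiality": "need-to-know access only"},
    subclasses=[DataClass(name="sales")],
)

print(find_gaps([finance]))
# → ['sales: no owner assigned', 'sales: no documented requirements']
```

A check of this kind could be rerun after each BIA review, so that no class reaches the stage of building standards and procedures without an accountable owner.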
Ownership of Data, Control and Liability
The Control Objectives for Information and related Technology (COBIT) framework recommends, in DS5.8, that “Management should implement procedures to ensure that all data are classified in terms of sensitivity by a formal and explicit decision by the data owner according to the data classification scheme.” Although IT is usually vested with the responsibility of maintaining the various business applications, business information does not belong to IT.2 In many organizations, this leaves it unclear who owns business information. The complexity of business information systems and networks, coupled with long-standing cultural barriers between technical and nontechnical professionals, creates problems for senior executives who cannot fully understand how the IT infrastructure that holds critical business information works.
How can the board be assured that IT is applying the right security controls, monitoring risk and exercising due diligence? The role of chief security officer (CSO) was created in part to act as a liaison between senior executives who are familiar with financial and operational risk and the IT staff members who control virtual private networks (VPNs), firewalls, encryption and routers. However, business information does not belong to the CSO. The fact remains that business information must belong to whoever is ultimately responsible for the business process from a revenue standpoint. Custody of supporting information systems can be delegated to IT, but data ownership cannot. As COBIT control objective DS5.8 states, “Owners should determine disposition and sharing of data, as well as whether and when programs and files are to be maintained, archived or deleted. Evidence of owner approval and data disposition should be maintained.”
Many other COBIT control objectives (mostly in DS5 Ensure systems security, DS11 Manage data and DS13 Manage operations) refer to controls that must be applied according to the assessed data sensitivity and criticality levels, and approved and audited by the data owner.
The Sarbanes-Oxley Act also requires a clear process for defining data ownership in organizations preparing to comply with section 404 requirements. In addition to defining data ownership, the organization must:
- Formalize the delegation of the safeguarding of information to data custodians in a documented process
- Continuously review and document who has access to business information
- Ensure that access controls and monitoring take into consideration the principle of segregation of duties (security and administration roles)
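The segregation-of-duties item above lends itself to a simple automated review. The sketch below assumes a hypothetical export of role assignments (user to set of roles) and flags anyone holding both a security role and an administration role; the role names and users are illustrative placeholders, not taken from any particular product.

```python
from __future__ import annotations

# Hypothetical role names; substitute the roles defined in the actual access control system.
SECURITY_ROLES = {"security_admin", "log_reviewer"}
ADMIN_ROLES = {"sysadmin", "dba"}


def sod_conflicts(assignments: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return each user who holds at least one security role and at least one
    administration role, mapped to the conflicting roles."""
    conflicts: dict[str, set[str]] = {}
    for user, roles in assignments.items():
        security = roles & SECURITY_ROLES
        admin = roles & ADMIN_ROLES
        if security and admin:
            conflicts[user] = security | admin
    return conflicts


# Hypothetical export of current role assignments (user -> roles).
assignments = {
    "alice": {"dba"},
    "bob": {"sysadmin", "log_reviewer"},  # administers systems and reviews their logs
    "carol": {"security_admin"},
}

print(sod_conflicts(assignments))  # bob is the only conflict
```

Keeping the output of each periodic run also produces the kind of documented access review that the second item calls for.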
Security and Business Requirements for Classifying Data
When classifying data, criteria based on business and security requirements must be determined by a high-level BIA. Information owners can use the BIA outcomes to provide a clear map of key business processes and establish criteria that might include the following:
- Access and authentication—One of the most difficult exercises when defining access requirements is to understand exactly who has a clear need to use the information during regular business operations, who needs access only for support and maintenance purposes, and who will periodically audit operations to prevent fraud and security incidents and to detect performance anomalies. COBIT recommends that the information owner, who is ultimately accountable for whatever happens to the information, be supported by a formal approval and authorization mechanism.3 Unless unlimited anonymous access is allowed, such a mechanism is required for any business information system. Specific requirements will depend on how distributed each information system is, its user profiles and the selected access control solution. A centralized system (e.g., a single repository controlling global access, with a single decision point) is desirable but not always viable from a business standpoint, while a decentralized model (e.g., local groups granting access and synchronizing with each other periodically) requires uniform approval criteria and validation procedures among the authorizing parties to ensure consistency. When building a data classification scheme, the attributes from this criterion include:
  - Access approval and review process
  - System architecture—Distributed/centralized
  - User profile—User groups, requirements, trust level (e.g., third party vs. internal)
  - Environmental risk—Integration with other networks, applications, databases and users who require access and might pose new risks
- Confidentiality—Once sensitive information has been identified, it is necessary to determine where it is stored (databases, backup systems, business continuity sites), manipulated (applications) and transmitted (network segments). Attributes from this criterion include:
  - Where information exists—Stored, transmitted, manipulated
  - Legal requirements for confidentiality
- Privacy—These requirements can be more complex than just providing confidentiality. In several countries, regulation dictates that when decisions are made based on information received from a database, the individual who is affected by such decisions should have the right to examine the database and correct or amend any information that is incorrect or misleading.4 Controls to provide such capability include identification and authentication of the affected user, access controls, integrity controls (to ensure that the correction or amendment is within accepted limits), auditing and logging, and a monitoring and alert system to warn the affected user that his/her information is about to be used. Attributes from this criterion include:
  - Regulatory requirements concerning private information
  - Conditions in which the user must be warned to review his/her own data
  - Limitations on what the user can do when correcting data
- Availability—This set of requirements defines the expected uptime for the information, the recovery time objective (how long the organization can wait for recovery in case of an incident) and the recovery point objective (how much information can be sacrificed in a disaster). Ideally, all systems should be available 100 percent of the time, but availability costs money. While corporate web sites require near 100 percent uptime and redundant systems, many are not updated frequently; therefore, they might not require the same backup priority as the e-commerce transactions database. These two systems illustrate very different availability requirements. Attributes from this criterion include:
  - Times when system must be available
  - Desired annual uptime/downtime tolerance
- Ownership and distribution—Copyrighted information must be protected against unauthorized copying and distribution, which is not ensured by conventional controls based on confidentiality, integrity and availability.5 Watermarking and encryption technologies have not yet matured to provide satisfactory protection against piracy. Implementation of these technologies is still very expensive, and in some cases they can be circumvented by removing or distorting the watermark on protected information. Depending on the security requirements concerning the distribution and possession of critical information, such information should be treated as a separate class of data. Attributes from this criterion include:
  - Approval and billing requirements (integration with other systems)
  - Expiration of access rights
  - Distribution methods
  - Requirements for authenticity and copy control
- Integrity—This criterion ensures that data are protected from unauthorized changes. Integrity requirements might include protection of information during storage (e.g., hashing and matching web site files to avoid defacement and hijacking attacks), transit (e.g., IPSec VPNs using the Authentication Header or Encapsulating Security Payload protocols to ensure packet integrity),6 manipulation (e.g., system checks on input data to prevent fraud and ensure accuracy) and data compression during backup and transit (e.g., “lossy” or flawed data compression algorithms might cause information loss or corruption in copyrighted or critical data, such as high-precision images or encrypted files).7 Attributes from this criterion include:
  - Change control requirements (approval, review, auditing)
  - Need for automated monitoring and detection of unauthorized changes
  - Authenticity and accuracy requirements
- Data retention—Depending on the required retention period and sensitivity of data to be maintained, the organization must also preserve specific versions of software, hardware, authentication credentials and encryption keys to ensure the ability to access stored data (both media and format readability) throughout the retention period, and to safeguard against loss due to future technology change. This might include e-mail messages from former employees; personal records; all information used during a Sarbanes-Oxley-related audit, including financial reports and financial report elements; processes affecting financial reports; and all documentation of risks (e.g., assessments and the outcomes, conclusions and follow-up activities of risk assessments). The Sarbanes-Oxley Act and the rules issued by the US Securities and Exchange Commission (SEC) require auditors to maintain, for seven years after the conclusion of the audit, all “records relevant to the audit or review, including workpapers and other documents that form the basis of the audit or review, and memoranda, correspondence, communications, other documents, and records (including electronic records), which (1) are created, sent or received in connection with the audit or review, and (2) contain conclusions, opinions, analyses, or financial data related to the audit or review.” In addition, pursuant to rule 12b-11(d) under the Exchange Act, an organization must keep all manually signed documents filed with or furnished to the SEC (including the certifications) for five years. Attributes from this criterion include:
  - Regulatory and business requirements for data retention periods
  - Specific requirements for archiving and recovering the information
- Auditability—This is related to keeping track of all access, authorizations, changes and transactions that might pose a risk to any of the previously mentioned requirements. Attributes from this criterion include:
  - Retention time for logs and files
  - Required level of detail of logged transactions
  - Monitoring and correlation of raw data, detection of anomalies and malicious activity
  - Regulatory requirements for auditing and control
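The integrity criterion above mentions hashing and matching web site files to detect defacement, and lists automated detection of unauthorized changes as an attribute. A minimal sketch of that idea (commonly called file integrity monitoring) uses SHA-256 from Python's standard `hashlib` to record a baseline of file digests and later report files that were added, removed or modified. The function names and the throwaway demonstration directory are assumptions for illustration; in practice the baseline itself would have to be stored where an attacker cannot alter it.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """List files added, removed or modified since the trusted baseline was taken."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(
            p for p in baseline.keys() & current.keys() if baseline[p] != current[p]
        ),
    }


# Demonstration on a temporary directory standing in for a web site's document root.
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    (site / "index.html").write_text("<h1>Welcome</h1>")
    (site / "about.html").write_text("<p>About us</p>")
    baseline = snapshot(site)  # taken while the content is known to be good

    (site / "index.html").write_text("<h1>Defaced</h1>")  # simulated defacement
    (site / "about.html").unlink()
    (site / "extra.html").write_text("<p>Injected page</p>")
    report = compare(baseline, snapshot(site))

print(report)
# → {'added': ['extra.html'], 'removed': ['about.html'], 'modified': ['index.html']}
```

The comparison only shows that something changed; deciding whether a change was authorized still depends on the change control requirements (approval, review, auditing) recorded for the data class.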
A Sample Data Classification Scheme
Once high-level business processes have been identified, ownership has been assigned to business information and a set of control requirements has been established, information classes can be mapped and detailed further. The process of identifying and categorizing business information is more important than the specific control requirements shown in the example below. Different organizations will select different sets of control requirements based on their own perspective of business processes and priorities, and alignment with industry standards such as ISO 17799, the COBIT framework and others. Figure 2 shows a sample data class called service operations information and some of its control requirements.
Although only a subset of the control requirements described in the previous section was selected for clarity, a more thorough discovery exercise will identify several sets of control requirements for each information class or subclass (e.g., although audit reports and bank statements may both belong to the finance and accounting information class, each subclass has distinct regulatory requirements and should be treated accordingly).
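To make the subclass point concrete, the sketch below lets a subclass inherit its parent class's control requirements and override or add only those that differ. The retention periods reuse the two figures cited under the data retention criterion (seven years for audit workpapers kept by auditors, five years for manually signed documents filed with the SEC); the class and subclass names, the owner and the confidentiality value are hypothetical.

```python
from __future__ import annotations

# Requirements shared by every subclass of a hypothetical finance and accounting class.
FINANCE_AND_ACCOUNTING = {
    "owner": "CFO",
    "confidentiality": "need-to-know access only",
}

# Each subclass states only what it adds to, or changes in, the parent class.
SUBCLASS_OVERRIDES = {
    "audit workpapers": {"retention_years": 7},
    "manually signed SEC filings": {"retention_years": 5},
}


def effective_requirements(subclass: str) -> dict[str, object]:
    """Merge a subclass's overrides on top of the parent class's requirements;
    keys from the later mapping win, so subclass values take precedence."""
    return {**FINANCE_AND_ACCOUNTING, **SUBCLASS_OVERRIDES[subclass]}


for name in SUBCLASS_OVERRIDES:
    print(name, effective_requirements(name))
```

Keeping overrides explicit makes it easy to see, during a review, exactly where a subclass departs from its parent class.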
Impact of Data Classification on IT Processes
Data classification is the cornerstone of the strategy behind several security practices and initiatives during the information life cycle. If the exercise of mapping business and security requirements is conducted properly, IT staff members will have a clear understanding of how information owners require their data to be handled during daily operations. It will then be the responsibility of the IT staff members to apply the proper security controls, such as encryption, authentication and logging. In organizations where ownership is not the decisive factor, controls are applied on an ad hoc basis, following trends guided by biased media.
The following high-level IT processes will be directly affected by different requirements from business-driven classes of data and can be related to COBIT control guidelines (see figure 3):
- Software development—For applications that handle data with special requirements, security should be considered at every stage of the software development life cycle. A detailed assessment of security risks must be conducted during the specifications stage, along with architecture and component integration reviews. This can be partly achieved by following the Systems Security Engineering Capability Maturity Model (SSE-CMM), its metrics and the SSE-CMM Appraisal Method (SSAM),9 in conjunction with threat-modeling scenarios and risk-based decisions on the software project. The documentation of requirements, specifications and architectural decisions, based on the identification of potential threats and attack scenarios (including attack trees and probable attack vectors),10 as well as formal consideration of mitigation controls in the proposed architecture and design, should be mandatory on software projects that will handle critical data.
- Incident response—Incidents can be categorized directly according to the information class that was compromised, in an almost one-to-one relationship. Escalation and communication procedures depend on the level of impact on business data, as do decisions on whether to involve law enforcement and the media. For each data category, different incident scenarios can be developed; while some classes have similar response procedures, others have unique requirements (e.g., strategic information leaked to competitors or copyrighted media accessed without authorization).11
- Access control and authentication—Where information systems have similar requirements for access controls and authentication, it may be advantageous to adopt some type of single sign-on (SSO) solution. Specifically, if users typically operate the same applications every day, and these applications share the same environment, work with the same classes of data and are therefore subject to the same business risks, an SSO solution might be considered. SSO solutions should be carefully analyzed before being deployed on systems that handle different classes of data, because such a deployment places data with dissimilar security requirements behind a single authentication solution.
  - System architecture impacts access and authentication controls (e.g., a centralized directory controlling access over a large WAN, or decentralized authentication repositories trying to synchronize and replicate over a distributed or extremely segmented network); new availability and confidentiality requirements will probably be discovered as more research is done.
  - Approval cycles determine how much trust can be granted to subjects who require access to business information, and how much control and accounting are required before access is granted to different user profiles. Business processes need to be reviewed because most were not designed with security in mind. The tradeoff between security and efficiency almost always tilts toward the latter (e.g., how much information the help desk is willing to give out to solve a problem, how much information a sales representative can access to close a deal, how exactly the approval process works when access to sensitive information is granted).
- Archival and recovery capabilities—For each information class and its supporting systems, the desired uptime, recovery point and time objectives, and assessed impact of disrupted operations determine how much effort and resources must be spent on archiving and redundant technologies (mirroring, storage area networks, replicated databases, transaction logging, clustering, fail-over capabilities, etc.) to support business systems.
- Disaster recovery and business continuity—The same RTO, RPO and uptime requirements established for the archival and recovery processes define which information systems must be supported by plain backups; a cold site (a facility with power and cooling where a computing system can be installed with some time and effort); a combination of a more expensive, temporary hot site (a fully prepared redundant facility), used while the company’s cold site is being prepared to host emergency operations; or a fully manned redundant hot site. The availability requirements determine how ready the site must be (e.g., real-time remote update, business relocation strategy), while the access control, auditability and confidentiality requirements determine how many physical and logical controls must be in place to protect data during a crisis. An old attack stratagem is to create an emergency condition and wait until all systems switch to redundant (and usually much less secure) mode, when decision makers are thinking in terms of availability and returning to business as soon as possible; this is when data are most vulnerable to unauthorized access and interference. Figure 4 shows the natural tradeoff among security requirements, desired availability and the cost of a disaster recovery and business continuity strategy. Figure 5 shows how different data classes can be mapped to security and availability requirements. This high-level map supports the establishment of strategic priorities and considerations for each data class during disaster recovery and business continuity scenarios.
- Outsourcing and third-party management—Organizations must maintain a high level of diligence when outsourced staff or other third parties work with critical information. Specific controls must be applied to ensure that the access levels of third-party agents are strictly controlled and that formal authorization and authentication methods are enforced. Necessary technical controls include enhanced logging and monitoring of activities, separate user groups, restricted log-on times and short expiration dates for all guest accounts. Administrative controls must consider the details of legal agreements between consultant and client, nondisclosure agreements (NDAs), security training, and a clear and formalized understanding of information ownership (especially in software development contracts). ISO 17799:2000 provides a comprehensive list of requirements to be included in contracts with third parties in section 4.2.2, “Security requirements in third-party contracts.”
- Auditing, event logging and monitoring—All classes of critical information must have their own requirements for protection, management and retention. Auditing capabilities must allow the data owner (or delegates) to attest that the requirements are being met and to detect deviations and anomalies. The results of the BIA show data and business owners exactly which assets need to be monitored. Once the assets and the associated security requirements are established, IT can determine the specific information that must be collected. In the example displayed in Figure 2, the service operations information data class was identified as critical to the business, and the organization determined a set of security and business requirements. Business and information owners must now discuss the current situation with IT and establish a set of recommended controls to be implemented and reviewed to properly monitor this information. Figure 6 shows a set of proposed security controls according to the sample requirements identified in Figure 2, based on the ISO 17799:2000 standard and the COBIT 3rd Edition framework.
- Physical security and zoning—Because different classes of data are often handled in the same facility, there should be a separation between physical areas (e.g., floors) that house staff and equipment used on research and development projects, HR and payment systems, or financial systems. The same applies to printers, photocopiers and fax machines used by different sectors with distinct security requirements. Whenever such equipment is shared between high- and low-security zones, security is jeopardized, as no authentication is required to read a paper report left in the printer’s tray. Other scenarios that may jeopardize security are shared public areas, meeting rooms with network jacks, shared paper recycling bins, third parties working without supervision in sensitive areas, and laptops used at an employee’s home on weekends and at the organization’s premises during working hours.17
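The archival, recovery and continuity items above all start from per-class uptime, RPO and RTO targets. Two calculations recur when turning those targets into technology choices: converting an uptime percentage into an annual downtime budget, and checking whether a backup schedule can meet a class's RPO (as the RPO footnote notes, overnight backups cannot satisfy an RPO shorter than 24 hours). The sketch below shows both; the data classes and their targets are hypothetical, and RTO, cost and the security of the recovery site would still need to be weighed separately.

```python
from __future__ import annotations

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def annual_downtime_hours(uptime_percent: float) -> float:
    """Maximum downtime per year permitted by an uptime target."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    """An incident just before the next backup loses up to one full backup
    interval of data, so the interval must not exceed the RPO."""
    return backup_interval_hours <= rpo_hours


# Hypothetical targets; real values come from the BIA and the data owners.
targets = {
    "e-commerce transactions": {"uptime": 99.9, "rpo_hours": 1},
    "corporate web site content": {"uptime": 99.99, "rpo_hours": 72},
}

NIGHTLY_BACKUP_HOURS = 24
for name, t in targets.items():
    budget = annual_downtime_hours(t["uptime"])
    ok = meets_rpo(NIGHTLY_BACKUP_HOURS, t["rpo_hours"])
    print(f"{name}: {budget:.2f} h/yr downtime budget; nightly backup meets RPO: {ok}")
# → e-commerce transactions: 8.76 h/yr downtime budget; nightly backup meets RPO: False
# → corporate web site content: 0.88 h/yr downtime budget; nightly backup meets RPO: True
```

The result mirrors the availability discussion earlier: a web site can demand very high uptime yet tolerate a nightly backup, while a transactions database with a one-hour RPO needs something closer to transaction logging or replication.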
The fundamental objective in classifying and protecting data should be based on the reasons why the data are important to the business in the first place. IT does not have this knowledge in most cases. Therefore, each line of business must identify which pieces of information are critical to its business processes. From a security standpoint, it does not really matter what names are assigned to data as long as data sets are established that provide more meaning to business operations. The advantage of employing classification terminology that reflects the business operation will be evident in communicating ownership and integrating the security requirements into each business process.
The key success factor in a data classification scheme is that classes of information are properly defined and related to process owners, easily communicated to all stakeholders, and clearly convey a business value to the organization, while expressing the need for hard, technical internal controls that IT understands.
1 Wylder, John; Strategic Information Security, Auerbach, 2003
2 Additional references to the differences between information ownership and stewardship can be found in the Information Security Management Handbook, Fifth Edition, by Tipton and Krause, and in Information Classification: A Corporate Implementation Guide, by Jim Appleyard.
3 COBIT, DS5 Ensure systems security, control objective 8, control practice 5
4 Freeman, Edward H.; “When Technology and Privacy Collide,” Information Security Management Handbook
5 Schwartau, Winn; “Mad as Hell IV: Security Basics for Ma & Pa,” http://searchsecurity.techtarget.com/columnItem/0,294698,sid14_gci1096261,00.html
6 Foundstone and Microsoft, “Using Microsoft Windows IPSec to Help Secure an Internal Corporate Network Server,” www.foundstone.com/resources/whitepapers/Foundstone_IPSec_W2K_XP.pdf
7 Ladino, Jeffrey N.; “Data Compression Algorithms,” www.ccs.neu.edu/groups/honors-program/freshsem/19951996/jnl22/jeff.html
8 HP Laboratories Palo Alto, “Self-Aware Services: Using Bayesian Networks for Detecting Anomalies in Internet-based Services,” www.hpl.hp.com/techreports/2001/HPL-2001-23R1.pdf
9 SSE-CMM, www.sse-cmm.org/index.html
10 Schneier, Bruce; “Attack Trees—Modeling Security Threats,” www.schneier.com/paper-attacktrees-ddj-ft.html
11 NIST SP800-61 offers a method for determining incident criticality. See Computer Security Incident Handling Guide, http://csrc.nist.gov/publications/nistpubs/800-61/sp800-61.pdf.
12 Encryption can be enforced on stored (including backups) and transmitted information.
13 All access to the information (except public) must be denied by default and provided on a need-to-know basis upon formal authorization from the information owner.
14 The recovery point objective (RPO) is the point in time, relative to an event compromising data, to which data must be restored without affecting business operations (e.g., overnight backups provide an RPO of 24 hours; if the organization cannot afford to lose 24 hours of data, another backup solution must be implemented to meet the RPO).
15 The recovery time objective (RTO) is the time period after an event compromising data by which business functions need to be restored. Different business functions may have different RTOs (e.g., the RTO for the payroll function may be two weeks, whereas the RTO for sales order processing may be two days).
16 PCI guidelines for protection of cardholder information
17 ISO/IEC 17799:2000, section 7—“Physical and Environmental Security”
Rafael Etges, CISA, CISSP
is a senior information security advisor at Assurent Secure Technologies (www.assurent.com), a leading information security engineering and consulting group. He has worked on governance and risk assessments as a consultant for major telecommunications and financial groups in Canada and Latin America.
is a senior member of the delivery team at Assurent Secure Technologies and plays a key role in overseeing technical security assessment and risk assessment initiatives in global enterprise environments.
Information Systems Control Journal, formerly the IS Audit & Control Journal, is published by ISACA. Membership in the association, a voluntary organization of persons interested in information systems (IS) auditing, control and security, entitles one to receive an annual subscription to the Information Systems Control Journal.
Opinions expressed in the Information Systems Control Journal represent the views of the authors and advertisers. They may differ from policies and official statements of the Information Systems Audit and Control Association and/or the IT Governance Institute® and their committees, and from opinions endorsed by authors' employers, or the editors of this Journal. Information Systems Control Journal does not attest to the originality of authors' content.
Instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. For other copying, reprint or republication, permission must be obtained in writing from the association. Where necessary, permission is granted by the copyright owners for those registered with the Copyright Clearance Center (CCC), 27 Congress St., Salem, Mass. 01970, to photocopy articles owned by the Information Systems Audit and Control Association Inc., for a flat fee of US $2.50 per article plus 25¢ per page. Send payment to the CCC stating the ISSN (1526-7407), date, volume, and first and last page number of each article. Copying for other than personal use or internal reference, or of articles or columns not owned by the association without express permission of the association or the copyright owner is expressly prohibited.