Data warehouses and their related reported capabilities deliver immense strategic value. Data warehouses amass and organize vast volumes of detailed information from systems throughout the company. These massive data sets support data mining analytics that generate insightful reports for evaluating market segments, productivity, profits, sales and other elements critical to organizational success. This capability enables managers to identify emerging opportunities or underlying causes for concern.
An international retailer, for example, operates thousands of stores, and each store stocks thousands of different items. Cultural factors affect overall sales at various locations, as well as sales of individual items. A variety of other factors, including climate, local customer demographics, and nearby pedestrian and vehicle traffic, also influence sales.
Data warehouses capture enormous amounts of sales data from all of these locations. Using data mining tools that gather and analyze the most relevant information, data warehouses can generate concise reports and graphs that depict the overall sales trends and patterns within different countries, regions or cities; merchandize items popular at all stores; and allow for differing sales characteristics among diverse locations.
These reports drive decisions in marketing, purchasing, distribution and other functions, and enable the retailer to effectively respond to local customer preferences while retaining strong brand identity and enhancing overall efficiency and company performance.
Reliance on such critical high-level reports, however, also highlights the importance of evaluating and monitoring data warehouses themselves. The best data mining and analytical functions are useless without accurate, reliable underlying data. Data warehouses contain critical and proprietary information. Acute potential risks accompany vulnerabilities to data manipulation or unauthorized access. Data warehouses need vast amounts of relevant data to produce valid reports. That requires continually replacing outdated data with newer information. Any changes in upstream data sources or the data warehouse itself require thorough analysis, testing and documentation.
Recognizing the fundamental characteristics of an effective data warehouse helps companies understand aspects that require evaluation during its development and ongoing operations.
Data Warehouse Characteristics
Data warehouses produce information for decision-making purposes, rather than for executing operational processes. They combine data from multiple upstream systems, which enables managers to standardize data structures and provides a means for analysis across multiple systems.
Data warehouse reports typically take a companywide view and address issues that affect the entire organization. While a data mart—a data warehouse with a narrower scope—focuses on a particular department, business unit or market segment, its reports take a much broader perspective than those produced by an operational application or database.
Data mining tools for extracting and analyzing supporting information consume considerable processing power. Locating the data warehouse on a separate server lets managers generate reports without straining the IT systems used for operational purposes.
Segregating the data warehouse serves other purposes, too. Data warehouses provide a centralized location for storing historical data. This frees storage capacity on production servers and lets managers archive valuable data that reside on legacy systems or hardware scheduled for retirement. Maintaining the data warehouse on a separate system also enables auditors to more clearly define their evaluation scopes.
Data warehouses store historical information in read-only format. Since the data are used for analytical functions, they do not need to be transformed to support live operational processes. The read-only format also protects crucial information from data manipulation.
Data warehouses deploy “extract, transform, load” (ETL) technology to harvest data from upstream systems. This involves data mapping, as well as scrubbing for inaccurate, repetitive or inconsistent data entries. While data warehouses require immense data sets to generate useful reports, information continually becomes obsolete and loses its importance as newer data enter the warehouse. To reduce IT storage costs and sustain optimal performance, secure disposal policies provide direction for removing older, unnecessary data.
Throughout the data warehouse’s operations, defined user provisions and restrictions assure that various levels of data access and reporting capabilities remain limited to individuals whose organizational responsibilities require such access.
Though smaller in scope, data marts merit the same audit considerations during development. Such examination increases the likelihood that they will perform effectively, efficiently and securely.
Auditing Data Warehouse Development
Proper development provides a foundation for effective data warehouse operations and reduces future potential risk. Development audits should address data warehouse planning, architecture, implementation, operations and creation of any data marts.
During planning, senior management support provides strong evidence of sufficient resources and high priority for data warehouse development. With any IT system, operational difficulties and potential risks arise when actual uses and users do not align with original intentions. Participation in planning from all departments that supply data and expect to benefit from the data warehouse’s reporting capabilities helps mitigate such an audit concern.
Project management methodology imposes discipline and documentation on development efforts. Such methodology reduces risks associated with ad hoc planning and undocumented activity. A development audit evaluates whether planning incorporates sound methodology.
System architecture concerns address whether to develop a two-layer or three-layer data warehouse. A two-layer architecture encompasses the upstream data sources as well as derived, read-only historical data. The three-layer design includes an additional data reconciliation layer for scrubbing data and resolving inconsistencies that arise when various systems contribute identical data entered in different formats. Such inconsistencies often arise between legacy and newer systems used for the same operational purposes.
Development audits examine whether the proposed architecture is appropriate for the intended purpose. For a small organization or a business unit developing a data mart, assimilating data from disparate systems or resolving inconsistencies that arise from different data sources may not be crucial issues. Two-layer architecture may work fine for such entities.
The data integrity risks associated with that architecture, however, compound for much larger businesses seeking enterprise-level reports based on data from a diverse range of systems and sources. The additional layer for scrubbing and reconciliation is necessary, with test reports evaluated to ensure that data conversions take place as expected.
Development audits also evaluate the data warehouse’s intended online analytical processing (OLAP) capabilities. These capabilities drive data mining, querying and reporting functions, and enable managers to review charts, graphs, tables and other visual summations of company performance. Proposed OLAP capabilities must meet reporting expectations and accommodate the data warehouse’s scope and capacity.
Data warehouse development evaluations address current data storage and processing requirements as well as anticipated future needs. Audits also assess needs for data mirroring and other means for recovering data following a disaster.
Development audits require examining proposed security measures, too. Planned access and reporting privileges must correspond to defined individual work roles, responsibilities and user provisions. Safely storing and retrieving critical, confidential information require establishing appropriate data encryption procedures. Applying read-only restriction to historical information prevents data manipulation. Additional security policies ensure that such information receives secure and proper disposal once it outlives its purpose.
Auditing an Existing Data Warehouse
While auditing during development reduces the potential for difficulties, all data warehouses require periodic auditing. Such auditing evaluates adherence to existing policies and controls, and examines the effectiveness of these policies and controls. This examination illuminates vulnerabilities requiring mitigation. Audit areas include security, data integrity, change management and testing of reports.
Data-security auditing begins with identifying and classifying the types of information in the data warehouse. Data warehouses may hold sensitive corporate financial data, Social Security numbers, national ID numbers, bank account numbers, other personal information for customers or employees, or patient treatment records for healthcare providers. In addition to evaluating overall security measures, compliance with laws and regulations such as the US Health Insurance Portability and Accountability Act, US Gramm-Leach-Bliley Act, US Sarbanes-Oxley Act of 2002, Basel II and other similar standards may require particular controls for specific types of data.
Data classification and their level of significance determine which individuals require access and their appropriate levels of access, based on work-related duties and responsibilities. A particular business unit, function, department, location or segment of customers defines access parameters. Access restrictions may also be based on rows, columns or tables for significant data sets. Similar considerations apply to setting report-function limitations.
Upstream systems supply information to the data warehouse, and these data transfers are crucial data integrity concerns. Software systems and data warehouses developed by the same vendor, such as SAP or PeopleSoft, incorporate tighter data integration and reduce opportunities for errors or inconsistencies to arise.
A custom data warehouse that consolidates data from a range of software system sources, however, presents much greater potential for errors or data inconsistencies. Two disparate systems, for example, may deploy differing state abbreviations—for example, based on US Postal Service or the Associated Press styles—for address fields. Parts codes supplied by two systems may use slightly different alphanumeric sequences for those field entries. Legacy system limitations may restrict lengths of other entries in upstream sources.
Such situations call attention to the importance of evaluating the data mapping processes and mapping engine used to transfer and scrub data. Critical issues include whether data mapping occurs before or after data enter the warehouse, what restrictions and maintenance apply to mappings, whether or not data are encrypted before they are sent to the warehouse, and management of the data engine.
IT elements exist in dynamic environments subject to continual change, and change management practices must address data warehouse operations. Changes in upstream systems affect data and transfers to the data warehouse, and must be acknowledged. The data construct within the data warehouse must be documented so that changes to mappings or tables are properly understood. Established policies restricting modifications of reports ensure that users receive valid, accurate assessments.
A data warehouse exists to generate reports, and audits must identify the most critical reports and the most crucial report attributes. A parallel testing environment enables auditors to test how critical reports behave under various conditions, and whether or not they generate the expected results.
When performed regularly, such audits ensure that data warehouses remain secure, deliver accurate reports and add continual value to organizational decision making.
Technological advances enable data warehouses to continually store greater amounts of information. Based on such immense volumes of data, data mining tools let managers generate reports that provide more accurate, more expansive assessments of organizational performance. These reports enhance decision-making skills, benefiting the entire organization.
Understanding the fundamental characteristics of a data warehouse and auditing its development apply a proactive approach toward addressing potential difficulties that may arise once the system has been implemented.
Continually auditing existing data warehouses highlights vulnerabilities requiring mitigation, as well as opportunities for improving the operation. Auditor attention also emphasizes the importance of regularly heeding established policies and processes for data warehouse use.
With such monitoring and oversight, organizations realize the benefits of data warehouse reporting capabilities and the enhanced performance potential that accompanies continual technological advancement.
Levine, M.; Performing a Data Warehouse Audit, Audit Serve Inc., 2006
Browder Edwards, K.; G. Lumpkin.; Security and the Data Warehouse; An Oracle White Paper, Oracle Corp., April 2005
Keay, E.; J. Tedesco.; “Keeping Your Data Warehouse Data Accurate,” DM Direct Newsletter, 7 January 2005
Gonzales, M.; The Data Quality Audit, Intelligent Enterprise, July 2004
Brink, D.; Auditing the Development of a Data Warehouse, ISACA Hong Kong Chapter, 24 January 2001, www.isaca.org.hk/document/cisa_slide/Auditing%20Data%20 Warehouse/
Warigon, S.; “Data Warehouse Control and Security,” Association of College and University Auditors LEDGER, vol. 41, no. 2, April 1997, p. 3-7
Miszczyk, J.; N. Harris.; P. Kocinski; J. Stice; K. Unger; DB2/400: Mastering Data Warehousing Functions, IBM Corporation, International Technical Support Organization, September 1998
Brian Thomas, CISA, CISSP
is the IT advisory partner in the Houston, Texas, USA, office of Weaver and Tidwell LLP (www.weaverandtidwell.com). Thomas can be contacted at +1.713.800.1050.
Information Systems Control Journal, formerly the IS Audit & Control Journal, is published by the ISACA. Membership in the association, a voluntary organization of persons interested in information systems (IS) auditing, control and security, entitles one to receive an annual subscription to the Information Systems Control Journal.
Opinions expressed in the Information Systems Control Journal represent the views of the authors and advertisers. They may differ from policies and official statements of the Information Systems Audit and Control Association and/or the IT Governance Institute® and their committees, and from opinions endorsed by authors' employers, or the editors of this Journal. Information Systems Control Journal does not attest to the originality of authors' content.
Instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. For other copying, reprint or republication, permission must be obtained in writing from the association. Where necessary, permission is granted by the copyright owners for those registered with the Copyright Clearance Center (CCC), 27 Congress St., Salem, Mass. 01970, to photocopy articles owned by the Information Systems Audit and Control Association Inc., for a flat fee of US $2.50 per article plus 25¢ per page. Send payment to the CCC stating the ISSN (1526-7407), date, volume, and first and last page number of each article. Copying for other than personal use or internal reference, or of articles or columns not owned by the association without express permission of the association or the copyright owner is expressly prohibited.