Data warehouses are playing an increasingly significant role in the financial reporting, marketing, operation management and executive decision making of today’s business world. Data warehouses are designed to be a onestop centralized business intelligence facility where corporate data standards are enforced. Through friendly user interfaces, business professionals utilize data warehouses as a data mining tool to provide answers to what-if analyses and generate subject-oriented reports. Companies have invested heavily in data warehouses to take advantage of their capabilities to drill up, drill down and drill across massive amounts of data and generate summary reports at the enterprise level. Yet, a lot of IT auditors tend to neglect the singularities of data warehouses due to a lack of expertise and awareness. Some even go to the extreme of excluding data warehouses from the normal scope of database audits.
This article intends to share some knowledge of the special strengths of data warehouse technology to inform IT auditors and controllers of the unique areas in data warehouse control.
Data Warehouse vs. Online Transaction Processing
Regular business transaction processing systems are designed to process single item transactions. In contrast to online transaction processing (OLTP) systems, data warehouses store enormous volumes of static historical data to enable summary-level analysis. Large amounts of computing resources would be involved in a regular data warehouse transaction. For example, to find out the bestselling product in the past six months, a search of all sales records in the last six months would be needed. Such an operation would require the use of more intensive computing resources than any single item transaction. As a consequence, schema designs, data storage management and computer performance management of data warehouses differ drastically from that of OLTP systems (figure 1).
Compared to a regular OLTP system, data warehouses enjoy special advantages in scalability, availability and manageability. A data warehouse “can handle very large data sets using parallel execution and partitioning, as well as offering support for advanced hardware and clustering.”1 These technologies also reduce the time that data are unavailable and ease the complexity and efforts in database management. Due to the differentiating strengths between data warehouse and transaction processing systems, organizations have abandoned the practice of mixing both kinds of databases within one single environment.
Special Considerations for Data Warehouse Control
The unique features and special advantages of data warehouses require special controls to ensure efficient operation and management of this powerful business intelligence tool.
Business users’ involvement in the requirement specification and testing should be emphasized throughout the software development life cycle of any data warehousing project. Like all IT projects, the success of a data warehouse depends primarily on the soundness of its logical design and its ability to meet the business requirements. Therefore, schema designs of data warehouses should be specified and reviewed by business users to meet the reporting needs of the enterprise. IT professionals and business users need to coordinate very closely in defining the business requirements of data warehouses. Executive support is often desired to provide funding and managerial guidance for efficient coordination.
In addition to participating in logical designs, business users of a data warehouse should provide their share of input on the physical configurations for the system to achieve the best performance. Partitioning enables in-database data mining that does not require data movement between the database and an external mining server, thereby eliminating redundancy, improving efficient data storage and processing, and maintaining data security. Business user involvement in designing partitions of tables and indices by time, geographical locations and subject areas should be encouraged in the beginning stage of all data warehousing projects. Auditors should pay attention to whether controls have been designed so that data warehouses are configured properly and tuned periodically to ensure system availability and fast execution of user requests.
Because data warehouse queries tend to consume vast amounts of computing resources, controls should exist so that ad hoc queries from data warehouse users are analyzed before being submitted for processing. Analyzing the query would inform the user how much resources would be needed, how long the query is estimated to run and what would be the optimal route to run the query. Such knowledge would prevent the user from taking over large amounts of resources by accident and enable the user to schedule a long running job at less busy hours of a working day.
Given the static nature and the large volume of data in data warehouses, computer operations management needs to be geared to meet the special needs of data warehouses. Because a predominant amount of historical data is static, it would not make much sense to perform full database backup for data warehouses as frequently as one would for transaction processing databases. It is of greater value to back up the physical and logical schema of the data warehouse than to back up both data and the database structures. Because data warehouse jobs tend to take a long time to complete, e-mail alerts of job status are valuable. It is also recommended for database administrators to perform a daily review of jobs to detect and diagnose long-running jobs.
Since input data of data warehouses are collected from multiple sources, a major part of the data warehouse work is dedicated to maintaining the data accuracy and completeness in the Extract, Transform and Load (ETL) process. Periodic reconciliation should be conducted between the source data and the data in the data warehouse. Comparing total record counts, maximum and minimum values, averages, and medians between the original data sources and the destinations in the data warehouses are some effective approaches to ensure the accuracy and completeness of the ETL process. Some large enterprises have designated teams to perform data cleansing and data quality control for data warehouses. Their work can be a good point of reference for auditors in assessing the controls in data warehouse ETL processes.
Data warehouse projects often require a hefty investment of time and money. Business management depends on the knowledge that data warehouses offer to make crucial decisions on many critical aspects of the business organization. The idiosyncrasies of data warehouses, in their purpose, design, operation and data requirements, call for special considerations beyond the regular database controls. These special considerations cover nearly all aspects of the development life cycle of a data warehousing project, ranging from user involvement, executive support, close coordination between IT and business users, sound logical and physical design, to efficient backup and job management. IT auditors and business management would benefit from understanding the special controls in data warehouses to effectively identify issues and provide value to enterprises that run data warehouses.
Kimball, Ralph; Margy Ross; The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2002
Dodge, Gary; Tim Gorman; Essential Oracle8i Data Warehousing, 2000
1 Oracle Online Documentation, Data Warehouse Guide, Oracle Corp.
Ying Shi, CISA, OCP
is a manager at the Chicago, Illinois, USA, office of KPMG IT Advisory. Prior to joining KPMG in 2005, he worked at Octagon Research solutions, TargetRx and Merck-Medco. In addition to database administration and development, Shi has experience in many other IT functions, including process analysis, documentation, software testing design and project management.
Information Systems Control Journal, formerly the IS Audit & Control Journal, is published by the ISACA. Membership in the association, a voluntary organization of persons interested in information systems (IS) auditing, control and security, entitles one to receive an annual subscription to the Information Systems Control Journal.
Opinions expressed in the Information Systems Control Journal represent the views of the authors and advertisers. They may differ from policies and official statements of the Information Systems Audit and Control Association and/or the IT Governance Institute® and their committees, and from opinions endorsed by authors' employers, or the editors of this Journal. Information Systems Control Journal does not attest to the originality of authors' content.
Instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. For other copying, reprint or republication, permission must be obtained in writing from the association. Where necessary, permission is granted by the copyright owners for those registered with the Copyright Clearance Center (CCC), 27 Congress St., Salem, Mass. 01970, to photocopy articles owned by the Information Systems Audit and Control Association Inc., for a flat fee of US $2.50 per article plus 25¢ per page. Send payment to the CCC stating the ISSN (1526-7407), date, volume, and first and last page number of each article. Copying for other than personal use or internal reference, or of articles or columns not owned by the association without express permission of the association or the copyright owner is expressly prohibited.