What Every IT Auditor Should Know About Transforming Data for CAATs 

 
Download Article Article in Digital Form

For several decades now, advances in computers and information technology have led to the ubiquitous employment of computers in the business community, whether the entity is small or large. That growth led to a concomitant increase in the amount of data entities have and keep, due to the significant decline in the cost of storage. Today, the outcome is referred to as “big data.” Considered a hot topic in IT, big data is also the result of collecting nonfinancial data along with financial data. Everything from network logs to industry and economic data is being collected. These facts drive the need for experts in using computer assisted audit tools/techniques (CAATs).

Another consideration is the number of paperless transactions that occur. These days, many transactions do not necessarily have any paper at the point of sale (e.g., e-commerce sales). How can the primary source document (i.e., data) for such a transaction be evaluated without a CAAT?

None of this is new to the average business person, but there is something many may not realize. Big data is going to get bigger—much bigger—and faster. According to the Berkeley School of Management, more data have been created in the three years 2009-2011 than in the whole of human history.1 , 2 According to one expert, in 2005, there were 130 exabytes of data in the digital universe. By 2010, it was 1,227 exabytes—a 10-fold increase. By 2015, it is predicted that the digital universe will contain 7,910 exabytes.3 In addition, from 2011 to 2020, data being managed by IT professionals are expected to increase 50 times, with the number of IT professionals only increasing 1.5 times.4

Combining these facts, there is an inescapable conclusion about auditing or analyzing data, whether financial or operational objectives: Expertise in CAAT-like techniques and tools is needed to examine big data. More than ever, entities are in need of IT auditors who can properly extract and analyze mounds of data and turn them into useful information. That requires an effective methodology and an effectual set of tools.

The data warehouse segment of IT has developed a sound methodology to do just that by using extract-transform-load (ETL) techniques and tools to get various source data sets into a single warehouse, where business analytics tools are used to analyze the huge amounts of data amassed in the warehouse. This methodology should be equally beneficial to IT auditors using CAATs and big data.

Using ETL Methodology for CAATS and Data Mining

The ETL approach can be applied to CAATs and data. Most experts agree that the most difficult part of using CAATs is the data extraction phase. In ISACA Journal, volume 6, 2010, this column addressed the issues of extract. In summary, the primary issue was getting data from operational computers, generally online transaction processing systems, into a form and format that are compatible with the CAAT. Often, data are exported into a format that is not immediately suitable for use in the CAAT. The ideal format was presented in the previous article: a flat file with column headings. Various potential extraction formats were discussed in that article, as were preferences based on suitability of the resulting format. The need to verify the data integrity of the extracted data, compared to the original data, was also discussed.

There are three major types of issues that cause extracted data to need to be corrected or cleaned after they are extracted:

  1. Formatting issues related to the way data are formatted in a particular extraction approach. For example, the export could be in a report format (e.g., a digital PDF report) and have a lot of extraneous lines of data (e.g., headings, subtotals). In addition, some reports take a single transaction and list the information on two lines.
  2. The various idiosyncrasies in the way that specific data values are presented and/or formatted. For example, a negative number could have a negative sign in front of or behind the figure, or negative numbers could be put in parentheses. Sometimes data have leading zeros or spaces or trailing zeros or spaces.
  3. When data are not optimal for CAAT commands and procedures. Almost all CAATs have procedures (commands) that are dependent on certain columns being defined as numeric, character or date data. An improperly defined column results in an inability to run certain procedures from certain CAATs. There are other constraints about data mining that require the data to be a certain way.

Figure 1 provides a list of the second type, “messy data” scenarios, and figure 2 illustrates situations of the third type.

Figure 1
Figure 2

These situations create a need to clean the data—similar to cleaning data going into a data warehouse. This article refers to this intermediate process as transforming, using the same terminology as the data warehouse ETL methodology.

Tools for Transforming

The key to the transform process is understanding what needs to be transformed and having a suitable tool to perform a specific transform procedure. Whether it be formatting, cleaning messy data or optimizing data for CAAT procedures, the IT auditor needs a tool that makes the transform process as easy as possible.

Some CAATs provide specific commands related to the problems described in figures 1 and 2. In fact, a tool that is good at transforming data could be cost-effective even if a different CAAT is being used to perform the procedures after the data are transformed and loaded into the CAAT. Microsoft Excel can do most of these transforming procedures, and using macros, it is likely that all of them can be done in Excel. However, CAATs exist for most, if not all, of these transform needs, and are much easier and more reliable than using macros in Excel.

Conclusion

Efficiency and effectiveness in data mining and data analytics are highly dependent on reliable, clean data provided to the CAAT. A useful approach is the data warehouse ETL process. An earlier article addressed the process of extracting data (volume 6, 2010). This article addresses the transform process. In a data warehouse, the transform process is usually considered the most time-consuming; the same is true in using CAATs. The good news is once the extract and transform steps are properly performed, the load part, to get the data into the CAAT, is usually a simple process.

The first major point regarding transform is that, for it to be effective, the IT auditor needs to understand the procedures and commands of the CAAT in order to determine what needs to be addressed or changed in the data for those commands to execute properly. The second point is to understand the need for a sound methodology or approach to the transform process. Finally, the IT auditor needs an effectual tool that can perform most, if not all, of the three types of transform issues discussed.

Endnotes

1 LightBound, www.iquest.net/data-center/scale-computing-sannas.aspx
2 Industry Perspectives, “Three V’s of Big Data: Volume, Velocity, Variety,” 8 March 2012, www.datacenterknowledge.com/archives/2012/03/08/three-vs-of-big-data-volume-velocity-variety/
3 Gantz, John; David Reinsel; “The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East,” IDC, December 2012, http://idcdocserv.com/1414
4 EMC2, “Extracting Value From Chaos,” www.emc.com/leadership/programs/digital-universe.htm

Tommie Singleton, CISA, CGEIT, CPA, is the director of consulting for Carr Riggs & Ingram, a large regional public accounting firm. His duties involve forensic accounting, business valuation, IT assurance and service organization control engagements. Singleton is responsible for recruiting, training, research, support and quality control for those services and the staff that perform them. He is also a former academic, having taught at several universities from 1991 to 2012. Singleton has published numerous articles, coauthored books and made many presentations on IT auditing and fraud.


Enjoying this article? To read the most current ISACA Journal articles, become a member or subscribe to the Journal.

The ISACA Journal is published by ISACA. Membership in the association, a voluntary organization serving IT governance professionals, entitles one to receive an annual subscription to the ISACA Journal.

Opinions expressed in the ISACA Journal represent the views of the authors and advertisers. They may differ from policies and official statements of ISACA and/or the IT Governance Institute and their committees, and from opinions endorsed by authors’ employers, or the editors of this Journal. ISACA Journal does not attest to the originality of authors’ content.

© 2013 ISACA. All rights reserved.

Instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. For other copying, reprint or republication, permission must be obtained in writing from the association. Where necessary, permission is granted by the copyright owners for those registered with the Copyright Clearance Center (CCC), 27 Congress St., Salem, MA 01970, to photocopy articles owned by ISACA, for a flat fee of US $2.50 per article plus 25¢ per page. Send payment to the CCC stating the ISSN (1526-7407), date, volume, and first and last page number of each article. Copying for other than personal use or internal reference, or of articles or columns not owned by the association without express permission of the association or the copyright owner is expressly prohibited.