AI Risk and Mitigation: Tips and Tricks for Auditing in the AI Era

Author: Spiros Alexiou, Ph.D., CISA, CSX-F, CIA
Date Published: 26 June 2024
Read Time: 22 minutes

There is no doubt that artificial intelligence (AI) is in fashion right now. AI is advancing at a rapid pace, and as it matures it is quickly sweeping the business realm. Enterprises are looking to leverage AI’s potential to cut costs and improve performance. AI adoption comes with high expectations, however, some of them concrete and others more vague. “What about the risk of AI?” is a disturbing question that continues to be raised, and auditing is at the forefront of the effort to answer it. National and supranational authorities worry about possible biases and ethical issues and are responding with regulation, such as the directives being prepared in the European Union (EU).1 Another concern is that AI can and does make mistakes. Inadequately controlled AI can make mistakes that are high impact, yet subtle in appearance. This can result in significant damage, especially where human review is considered difficult or impossible. As such, it is worth exploring these risk factors and some applicable guidance.

Ultimate responsibility for AI’s decisions—and its mistakes—must rest with humans, especially in the absence of regulation and relevant professional standards.

Tips for Effective AI Auditing

  1. Do not expect regulation to provide all the answers.2 Based on what has been observed so far, regulation is likely to add significant overhead (although less costly proposals exist3), but it is unlikely to make a large contribution in terms of addressing risk. Considerable attention is paid to ethical AI, which is essentially the concept that the use of AI should not lead to discrimination, although data analysis, with or without AI, has long resulted in discrimination. For example, car insurance costs are often differentiated according to the driver’s age, gender, and marital status.4 Ultimate responsibility for AI’s decisions—and its mistakes—must rest with humans, especially in the absence of regulation and relevant professional standards. A general rule of thumb is the so-called Hand rule: Learned Hand, a US judge, is often associated with the legal school of thought that the burden on a manufacturer should be no smaller than the risk of an adverse event (which, of course, is an estimate).5 This is reminiscent of the legal thinking applied to other regulations, such as the EU General Data Protection Regulation (GDPR).6 Although it does not address the problem in full, it is a useful guideline for limiting exposure. External input, such as the US Federal Trade Commission (FTC) guidelines, offers some guidance, even though the language may be vague (e.g., concerning fairness, it may not be easy to achieve consensus, even among the developers themselves). For example, the guidelines warn that enterprises are responsible for their AI: “Hold yourself accountable—or be ready for the FTC to do it for you.”7
  2. Know the business. This is a must for all enterprises. AI is a tool, and anyone using a tool must understand why and for what it is being used. Tools are meant to serve organizational objectives, not the other way around. Using AI just for the sake of using AI is the wrong way to go about it. A much sounder approach is to start by identifying a problem and then consider whether AI might present a solution. It is critical to understand what one hopes to gain from AI and how that will be accomplished.

    A closely connected topic is ethical and legal compliance. Long lists of awful uses of AI have been compiled,8 and in almost all cases, ethical and legal issues arise because these uses do not serve any legitimate business purpose. For example, AI has been used to predict criminal behavior based on facial characteristics, whereas criminal behavior is defined as the breaking of a law and is unrelated in any conceivable causal way to facial features. Furthermore, the cost of false positives is likely to be prohibitively high. Hence, statistical inferences based purely on the prediction of some model, whether AI or not, without a causal connection, should be viewed with skepticism. For example, in the case of car insurance, the factor determining the risk (whether determined by AI or non-AI means) of any particular driver is, of course, risky or safe driving, but because insurance companies typically do not have this information, they try to assess risk based on information they do have, such as age, gender, and marital status. These factors may have an indirect, causal, and statistical correlation with the deciding factor: driving behavior. Such statistical correlations may clash with individual beliefs, such as belief in the concept of equality. It is contrary to antidiscrimination principles to claim that, based on gender, age, or some other factor, a certain group is statistically more reckless, yet this may be supported by the data. Can these results be used, and if so, how? Again, this is a question that laws and regulations must answer.

    Similarly, if AI predicts that someone’s facial profile is a 90% match to that of a criminal, and that person is jailed wrongly, it is not an AI shortcoming; the problem lies with its use. Conversely, if such a match were used in an exploratory manner to shortlist possible matches, and confirmatory robust tests were run on this smaller dataset, it would represent a far more judicious application of AI. Thus, the all-important question is: What will AI be used for, and what is the cost-benefit relationship?
  3. Understand that AI is not perfect. AI makes mistakes, and with more complexity, the chance of error increases. These mistakes have consequences, and these consequences have a cost. However, not every use of AI has the same consequences in terms of cost. A 99% accuracy rating in one application may be inadequate, while a 60% rating in another may be perfectly satisfactory. Managing these errors and associated costs is crucial. Does a human check the AI for errors, glaring or not? Is there another system in place, AI or not? Can the risk of such errors be accepted? These are crucial cost-benefit questions that must be answered. For example, a mistake in an AI application that suggests potentially interesting movies for a viewer does not have the same consequences as an AI text analysis that misunderstands context and is used to make business decisions or service customers.
  4. Not all AI techniques are equally robust. A simple example is generative (gen) AI, such as ChatGPT, wherein a slight change in the question that does not change the context—according to any human—can result in a radically different answer. To illustrate this, the following scenario is presented to a gen AI engine:

    After executives were given a sizable bonus, a company experienced financial difficulties and asked auditors to work longer and take a pay cut. One auditor replied, “Sure, it would be an honor to work more hours with less pay.” When the gen AI system was asked if the auditor is happy or unhappy about the request, impressively enough, the gen AI understood the sarcasm and correctly inferred that the auditor is not happy. But if a slight change is made that would fool no human—if the auditor simply said, “Sure, I can work”—the gen AI would infer that the auditor is genuinely happy. This minor change in wording has a drastic effect on AI’s understanding of the question, even though none of the essential parameters have changed.

    The scenario can be varied slightly, so that instead of the company having financial difficulties due to management bonuses, the financial difficulties are attributed to creative accounting. In that case, “Sure, I can work” is again (mis)interpreted as the auditor expressing genuine happiness, while “It would be an honor” is considered ambiguous. Yet in both cases, the crux of the matter is that the company’s financial difficulties are not the auditors’ fault.

    Although there is no fundamental reason why gen AI cannot be trained to answer the question correctly in both scenarios, the fact remains that such errors cannot be ruled out.

    Another interesting example is when gen AI behaves like a student who cannot solve a problem and just spits out all seemingly relevant material in the hope of getting something right. Consider the following example:

    A gen AI system is asked, “During the COVID-19 pandemic, a chief audit executive—with the approval of the audit committee—assigned all auditors operational tasks, as the operations team was short on personnel and auditors had little work to do. Was this a violation of audit standards?”

    The answer is simple: Since auditing was not one of the operational tasks the auditors were assigned to do, the audit standards did not apply to their work with the operations team. The only consideration of note was that when the auditors in question resumed their audit work, they would be subject to a one-year period during which they would not be allowed to audit the specific function for which they had worked (in this case, the operations team).

    However, AI gave a long and muddy answer to the question: “Assigning auditors to operational tasks during a crisis like the COVID-19 pandemic, with the approval of the audit committee, may not necessarily be a violation of audit standards, but it can raise concerns and should be done with caution. Whether it constitutes a violation depends on several factors.” The system proceeded to list relevance, oversight, documentation and transparency, risk assessment, duration and impact, and communication with stakeholders as some of those factors.
  5. Just because an AI system claims to explain its results does not make it trustworthy. Humans value explanations over “because I said so” justifications, but although explainability is an enabler of trust, it is neither necessary nor sufficient to gain trust. Many people have traveled by plane without understanding how it works or the exact actions of the pilots during takeoff, cruising, and landing. Things that are not explained or understood can still be considered trustworthy based on their successful track records. Similarly, it is not enough to explain something to gain trust. Anyone committing an indefensible act can come up with an explanation for, or justification of, it. The question is, can the explanation pass a fact and logic check? AI has effectively adopted this human line of questioning and implemented adversarial networks that have been highly successful. For example, pitting two AI machines against each other—one producing deep fakes and the other trying to detect them—has led to much higher-quality deep fakes.

    Furthermore, humans need relatively simple explanations. So, the practical outcome of either a highly complex model (e.g., a deep neural network that defies simple human understanding, even if it produces significant output explaining every computation it does) or a proprietary model is the same: AI is a black box. That said, if one is determined to use AI to make decisions that affect lives (e.g., safety issues,9 criminal convictions10) or rely on secret, proprietary algorithms (which the vendor has no incentive to make explainable), then explainability, though mathematically harder to achieve,11 is a must—though it may still not be enough. Putting aside the previously mentioned concerns, one cannot detect bias or unfairness in a black box model, so if these are concerns due to ethical or legal/regulatory requirements, then explainability is a must for compliance reasons. Alternatively, the AI system may be used to discover a pattern such as a vendor’s name or artist’s signature on an image (raising legality issues), which needs no explanation, as the objective—to discover something interesting—has demonstrably been achieved and it does not matter how.

    When explainability is required, there are two possible routes: (1) The steps can be explained by the same AI that does the actual computation, for instance, by trees and random forests, case-based reasoning, or neural additive models (which return a sum of outputs of neural networks that treat one variable each).12 (2) Post hoc analysis, such as principal component analysis (PCA)13 for dimensionality reduction or local interpretable model-agnostic explanations (LIME),14 is used to provide an explanation—which may, in principle, have little to do with what the AI computed in the first place. (A minimal surrogate-model sketch of this post hoc route appears after this list.) The best enabler of explainability is, of course, domain knowledge—for instance, knowledge that the outcome of the question posed to the AI system, such as “How does the probability for fraud depend on the degree of monitoring of transactions?,” should be monotonic with respect to some variable, or that some variables will have only a small effect, or that two variables have additive or nearly additive effects. If the AI system incorporates this knowledge, explainability will be much easier.

    It should be noted, however, that if one uses AI to make life-altering decisions, the demands on explainability are quite high: If the explaining model is 10% wrong, this is nearly tantamount to the original AI model being 10% wrong with regard to the life-altering decision. The next question is, what constitutes an explanation that is short enough? For example, if AI believes there is a tumor in an image based on a given region, that explains what it took into account but not how it arrived at that conclusion. Could it be some statistical notion, such as a trend? For example, “The prediction for our case is high because it belongs to a class for which predictions are high.” This is understandable enough, but whether it is satisfactory depends on the application.
  6. With AI, the security of training data becomes extremely important—as important as the code itself. Traditionally, IT auditors are well aware of the need to protect source code and executables, enforce segregation of duties (SoD), and maintain other required controls. But in the AI era, it is critical to safeguard the security of training data as well. AI learns much like a child: Instead of being given the definition of an object, such as a bicycle, the child is shown objects that are and are not bicycles and abstracts the key features from there. That knowledge is reinforced, as needed, by a correcting parent, school, or other source. This greatly increases the security requirements for the training data. A child born into a community adopts the community’s vocabulary. Similar effects have been observed with AI systems that were perhaps unwisely trained on the open Internet, such as Microsoft’s Thinking About You (TAY),15 which soon became a hate propaganda agent, or ScatterLab’s Science of Love/Lee-Luda.16 Essentially, with AI, training data determines how a problem will be solved—just as a code modification could. In addition, data on the Internet may be subject to copyright, and although it is publicly available, its use for training AI models could raise legal issues.17 Even if the data is fully owned by the enterprise from which it originated, it is important to ensure that the data is accurate. For example, if a support vector machine (SVM) algorithm is used, it is crucial to ensure that data is correctly labeled, especially the data closest to the demarcation hyperplane between the two possible binary outcomes (e.g., normal and abnormal behavior). Mislabeling such points near the demarcation line can have dramatic consequences, whereas mislabeling points far from the demarcation line is generally more tolerable. (A sketch of prioritizing label checks near the boundary appears after this list.) Other methods, such as random forests, may be more tolerant of mislabeled training data. Thus, ensuring the correctness, protection, and control of training data is highly important, and auditors reviewing AI systems should pay extra attention to the training data and its security and correctness. In addition, it is often far from certain that the training and validation data are anything like the data the model will encounter in real life (i.e., production runs).
  7. Be specific when identifying the goal of the audit. Is the mandate to audit a specific system or process that uses AI, or a generic control framework for AI? The former requires a concrete IT/operational audit, which, in addition to the standard IT/operational audit issues, calls for the consideration of other factors such as error handling, protection of training data, and whether the audit can use AI to enhance the value of its work—and if so, how.18 Apart from the actual techniques, AI can be helpful in clerical work such as taking notes, summarizing minutes, and even preparing reports. However, these are typically not the auditor’s main tasks, although they can take some time.
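
To make the post hoc route in tip 5 concrete, the following is a minimal sketch, not taken from the article, of a global surrogate explanation: a shallow decision tree is fitted to the predictions of a more complex model, and its fidelity score indicates how much of the complex model’s behavior the simple explanation actually captures. The data, model choices, and variable names are illustrative assumptions.

# Sketch (illustrative): a post hoc, global surrogate explanation.
# A shallow decision tree is fitted to the predictions of a "black box" model;
# the tree is human-readable, and its fidelity score shows how much of the
# black box's behavior the simple explanation captures.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# The surrogate is trained on the black box's outputs, not on the ground truth.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

fidelity = surrogate.score(X_test, black_box.predict(X_test))
print(f"Surrogate fidelity to the black box: {fidelity:.2%}")
print(export_text(surrogate, feature_names=[f"var_{i}" for i in range(10)]))

If the fidelity is only 90%, the explanation is roughly 10% wrong about the model it claims to explain, which is exactly the caution raised above for life-altering decisions.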
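
For tip 6, the following minimal sketch (synthetic data and illustrative names, not from the article) shows one way to focus label verification on the training points that sit closest to an SVM’s demarcation hyperplane, where mislabeling does the most damage.

# Sketch (illustrative): rank SVM training points by distance to the decision
# hyperplane so that label checks can prioritize the points nearest the boundary.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical labeled data standing in for "normal" vs. "abnormal" behavior.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

model = SVC(kernel="linear").fit(X, y)

# decision_function returns a signed score whose magnitude grows with distance
# from the hyperplane; a small magnitude means the point sits near the boundary.
closeness = np.abs(model.decision_function(X))
priority = np.argsort(closeness)        # indices ordered nearest to farthest

print("Training rows to re-verify first (closest to the hyperplane):")
print(priority[:20])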

Expectations From the Audit

As the adoption of AI increases, audit teams are trying to figure out how to audit AI risk and opportunity. There are two components of relevance to auditors. The first is generic auditing of the risk and opportunity associated with the enterprise’s adoption of AI, such as investigating whether requirements and guidelines on ethical issues, and the associated controls, have been established and are functioning. The second is auditing specific systems and functions that use AI in one way or another; in this activity, besides standard IT audit issues, training data security and accuracy are critical.

Generic AI Auditing

Here, the basic issue is accountability for AI. Accountability is key in any audited area, as illustrated by the expression “If it’s not somebody’s problem, it’s nobody’s problem.”

Once accountability has been established, a number of things can be done to mitigate the risk associated with not only AI but any system in general.

Model risk management (MRM) is a sensible approach,19 building on the lessons learned during the Great Recession and the resulting regulation of predictive modeling of financial institutions. Needless to say, this is not a one-size-fits-all approach, but certain key ideas can be borrowed, modified if need be, and used.

Such ideas include, apart from individual accountability:

  • Posing a challenge that involves criticism of the design of the AI system by people not involved in its development and typically (long) before going live. Criticism must be constructive, in the sense that alternative designs are proposed. If this tactic is adopted, no one should be exempt, including extremely competent developers and high-ranking employees. Especially important is questioning assumptions and identifying conditions that are taken for granted (the environment in which an AI system operates is typically undergoing constant change), as well as controls and safeguards for when things go wrong. For example, AI models may make—either explicitly or via their training—assumptions that have to be revised, such as data reliability, a certain fixed organizational structure, or long-term business trends that are implanted into the AI. If training data shows that the price of land is constantly increasing, this condition may be implicitly built into the AI. This is no different from a child associating wizards with bad people based on fairy tales. Understanding how the AI system uses these assumptions, and the cost of errors, typically via some parameters, is crucial. For example, a fraud detection system typically has an internally adjusted parameter to display only cases with a high probability of being fraud, e.g., 90%. This 90% figure is a parameter that arises from a compromise between acceptably low false positives and acceptably low false negatives. It may be determined based on the fraud propensity at the time of the training, which is not guaranteed to stay constant (a sketch of revisiting such a threshold appears after this list). Will that knowledge be available—and proactively available—when things change? Note that change does not refer to external factors alone; it encompasses change in the use, reliance, or materiality of the AI systems, as such changes raise the stakes in terms of AI errors. As noted, a constructive challenge is a helpful way for both sides to strengthen their arguments and counterarguments and ultimately deliver a better product.
  • Focus (perhaps with incentives) on quality and testing, rather than speed of launch. Appropriate incentives should be tied to problem-free operation rather than a fast launch.
  • If an enterprise is selling AI products, it is best to use them internally first. Besides showing confidence in the product, this can act as an early and vocal problem identifier.
  • AI development is not purely a matter for data scientists and IT experts. Domain experts and their input are essential. It is generally a bad idea to believe that an algorithm, no matter how sophisticated, can completely replace domain expertise. For example, among an infinite number of possible characteristics (fields), domain experts are in the best position to understand which ones are relevant. Although they may not know the exact (e.g., linear, quadratic) dependence on a field or variable, they can understand which ones are important. Including irrelevant fields may lead to models that learn special cases and do not generalize well. If ethical concerns are present, professionals from other disciplines should participate to provide feedback and ask relevant questions.20
  • Stress testing, or tests designed to evaluate AI’s response under extreme conditions, is similar to a penetration test for IT security, wherein data scientists analyze the system and try to identify cases in which AI will perform in an undesired way. It is a good idea to conduct such a test prior to launch because any discovered weaknesses can be addressed either by enhancing the training set or by introducing appropriate routines for when the system realizes the data it is fed lies outside the range of its training (a sketch of such an out-of-range guard appears after this list). Needless to say, it is highly undesirable to allow an AI system to extrapolate and make decisions in a regime with which it has no experience, and since there is typically no guarantee of AI behavior in regions far from its training data, this risk is very real. Similarly, audits conducted prior to launch to assess the design, implementation, and safeguards of the product can be useful.
  • Common IT controls such as permissions, principle of least privilege, SoD, and change and incident management also apply here. Keep in mind that more complex systems can fail more dramatically. Both adequate controls and a reliable plan should be established to effectively address AI incidents. As noted, AI “can make decisions quickly and at huge scales,”21 thereby greatly increasing potential damage. Hence, a control to review the risk and potential impact when the use of AI is changing can save a lot of pain later on.
  • Planning for features that allow humans, or perhaps other machines, to override AI decisions is another very important control.
  • Not all AI systems are equal in every respect. If explainability of results is desired or required, an appropriate AI technique must be selected, implemented, and reviewed.
  • On a more technical level, it is often important to introduce constraints to penalize complexity, especially when a solid reason exists to expect relatively low complexity (e.g., an expected monotonic behavior). It may also be a good idea to limit the number of parameters (fields) and their interactions (e.g., combining too many different fields). There are a number of techniques to limit or reduce complexity. For developers, these ideas need to be made more concrete. For example, it would be helpful if the enterprise could specify a per-application percentage coefficient, such that adding a variable would be allowed only if it improved the match to the training data by more than this percentage (a sketch of this approach appears after this list).
  • Acceptance tests and pilot programs are always important. During acceptance tests, it is critical to understand the results. For example, if a clustering algorithm is used, what would be the significance of a hypothetical data point with the centroid parameters of a cluster? Or, in case-based reasoning (k-NN), what are these neighbors, and what information do they provide about the problem at hand? (A sketch of such an inspection appears after this list.) Furthermore, accurate notes on how to read and understand the results can be useful. Visualization capabilities can also help in understanding the results.
  • An audit should be especially attentive to the reason for implementing AI. “Because the business case is favorable” is a valid reason. “Because we were looking to use AI” or “we were eager to jump onto the AI bandwagon” is not, and it often leads to problems. Similarly, prioritization is important, as it is in all business applications, AI or not.
  • Needless to say, some or all AI systems may be outsourced (e.g., via the cloud). But this is not a magical solution, and if controls are not specified by the customer, it is a big gamble to assume that the necessary controls will automatically be provided by the AI service provider.
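
The 90% display threshold mentioned in the first bullet above is exactly the kind of parameter that should be re-derived when conditions change. The following minimal sketch (synthetic data and an assumed model, not from the article) sweeps candidate thresholds on recent labeled cases and reports the false-positive/false-negative trade-off behind each one.

# Sketch (illustrative): re-examining an alert threshold against recent labeled
# data, so the false-positive / false-negative compromise is revisited rather
# than frozen at whatever it was when the model was trained.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data standing in for mostly legitimate transactions.
X, y = make_classification(n_samples=4000, weights=[0.95], random_state=0)
X_train, X_recent, y_train, y_recent = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
scores = model.predict_proba(X_recent)[:, 1]    # probability of "fraud"

for threshold in (0.5, 0.7, 0.9):
    flagged = scores >= threshold
    false_pos = int(np.sum(flagged & (y_recent == 0)))
    false_neg = int(np.sum(~flagged & (y_recent == 1)))
    print(f"threshold {threshold:.2f}: {false_pos} false positives, {false_neg} missed cases")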
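
As a companion to the stress-testing bullet above, here is a minimal sketch (all names and tolerances are illustrative assumptions) of a guard routine that flags incoming records falling outside the range of the training data, so the model is not silently asked to extrapolate.

# Sketch (illustrative): flag inputs that fall outside the per-feature range
# observed during training, instead of letting the model extrapolate silently.
import numpy as np

class RangeGuard:
    def __init__(self, X_train, tolerance=0.0):
        span = X_train.max(axis=0) - X_train.min(axis=0)
        self.low = X_train.min(axis=0) - tolerance * span
        self.high = X_train.max(axis=0) + tolerance * span

    def in_range(self, X):
        """Boolean mask: True where every feature of a row lies inside the training range."""
        return np.all((X >= self.low) & (X <= self.high), axis=1)

# Usage: score only in-range rows; route the rest to a human or a fallback rule.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 5))
X_new = rng.normal(scale=3.0, size=(10, 5))

guard = RangeGuard(X_train, tolerance=0.05)
mask = guard.in_range(X_new)
print(f"{int((~mask).sum())} of {len(X_new)} incoming rows fall outside the training range")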
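
The per-application percentage coefficient suggested in the complexity bullet above could be implemented roughly as follows; the threshold value, model, and data are assumptions for illustration only.

# Sketch (illustrative): greedy forward selection in which a candidate variable
# is accepted only if it improves cross-validated accuracy by more than a
# per-application threshold (here 1 percentage point, an assumed value).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=15, n_informative=4, random_state=0)
THRESHOLD = 0.01    # minimum improvement required to accept another variable

selected = []
best_score = 0.0
for candidate in range(X.shape[1]):
    trial = selected + [candidate]
    score = cross_val_score(LogisticRegression(max_iter=1000), X[:, trial], y, cv=5).mean()
    if score - best_score > THRESHOLD:
        selected.append(candidate)
        best_score = score

print(f"Kept {len(selected)} of {X.shape[1]} variables: {selected} (CV accuracy {best_score:.3f})")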
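
Finally, the acceptance-test questions in the bullet above (what a cluster centroid “looks like” and which neighbors a case-based decision rests on) can be answered with a few lines of inspection code; the field names and parameter choices below are hypothetical.

# Sketch (illustrative): inspecting cluster centroids and nearest neighbors as
# part of an acceptance test, so reviewers can judge whether results make sense.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=600, centers=4, n_features=3, random_state=0)
feature_names = ["amount", "frequency", "tenure"]    # hypothetical fields

# What would a hypothetical data point sitting at each cluster centroid look like?
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
for i, centroid in enumerate(kmeans.cluster_centers_):
    profile = ", ".join(f"{name}={value:.2f}" for name, value in zip(feature_names, centroid))
    print(f"cluster {i} centroid: {profile}")

# Which training cases does a k-NN (case-based) decision actually rest on?
# Note: the queried record is itself in X, so it appears as its own nearest
# neighbor at distance 0; the remaining neighbors are the cases of interest.
knn = NearestNeighbors(n_neighbors=5).fit(X)
distances, indices = knn.kneighbors(X[:1])
print("nearest training cases for the first record:", indices[0], distances[0].round(2))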

AI-Specific Auditing

When auditing IT systems that use AI, all standard sources of IT risk, such as access management, change management, and interfaces, are still present. In addition to these traditional IT audit issues, training data security and accuracy are critical. It is essential that the error rates (e.g., false positives and negatives) conform to specifications and operational requirements, and that controls are in place to deal with high-impact errors. Other related and important issues are explainability requirements and the extent to which they are covered, how the AI output is used, and the controls for when things go wrong. In addition, generic AI risk factors, such as assumptions built in either directly or via the training data, also apply.

Conclusion

As the use of AI in enterprises takes off, auditors should be aware of its potential and risk—and the controls that can be implemented to avoid adverse events. The tips and suggestions described herein are an attempt to mitigate such risk and provide such controls. Auditors in particular must be aware that AI can and does make errors, and that these errors must be controlled. Specifications (including explainability requirements or the lack thereof), the use of the AI output, and the design of the system require special attention, in addition to standard, non-AI IT issues. Security is even more important: Beyond the usual need to protect the source code against manipulation, the training data must be protected as well, because in the case of AI the system’s behavior is far more malleable and is shaped by that data much as it is by code. A number of good practices are crucial to help control AI risk while exploiting its potential.

Endnotes

1 European Parliament, The Impact of the General Data Protection Regulation (GDPR) on Artificial Intelligence, European Union, June 2020, https://www.europarl.europa.eu/RegData/etudes/STUD/2020/641530/EPRS_STU(2020)641530_EN.pdf
2 Ibid; The White House, “Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence,” USA, 30 October 2023
3 Mitchell, M.; Wu, S.; et al.; Model Cards for Model Reporting, Cornell University, New York, USA, 14 January 2019, https://arxiv.org/abs/1810.03993; Gebru, T.; Morgenstern, J.; et al.; Datasheets for Datasets, Cornell University, New York, USA, December 2021, https://arxiv.org/abs/1803.09010
4 Timmons, M.; “Which Gender Pays More for Car Insurance?,” ValuePenguin, 10 January 2024, https://www.valuepenguin.com/how-gender-impacts-car-insurance-rates
5 LSD.Law, “Hand Formula,” https://www.lsd.law/define/hand-formula
6 Gdpr-info.eu, General Data Protection Regulation, European Union, 2016, https://gdpr-info.eu/
7 Jillson, E.; “Aiming for Truth, Fairness, and Equity in Your Company’s Use of AI,” Federal Trade Commission, USA, 19 April 2021, https://www.ftc.gov/business-guidance/blog/2021/04/aiming-truth-fairness-equity-your-companys-use-ai
8 GitHub, “Awful AI,” https://github.com/daviddao/awful-ai; GitHub, “Learning From the Past to Create Responsible AI,” https://romanlutz.github.io/ResponsibleAI/
9 McGough, M.; “How Bad Is Sacramento’s Air, Exactly? Google Results Appear at Odds With Reality, Some Say,” Sacramento Bee, 7 August 2018, https://www.sacbee.com/news/california/fires/article216227775.html
10 Wexler, R.; “When a Computer Program Keeps You in Jail: How Computers Are Harming Criminal Justice,” New York Times, 13 June 2017, https://rogerford.org/privacy21f/Wexler.pdf
11 Rudin, C.; “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead,” 22 September 2019, https://arxiv.org/pdf/1811.10154.pdf
12 Agarwal, R.; Melnick, L.; et al.; Neural Additive Models: Interpretable Machine Learning with Neural Nets, 35th Conference on Neural Information Processing Systems, 24 October 2021, https://arxiv.org/pdf/2004.13912.pdf
13 KDnuggets, “Principal Component Analysis (PCA),” https://www.kdnuggets.com/2020/05/dimensionality-reduction-principal-component-analysis.html#:~:text=Principal%20%20Component%20Analysis(PCA)%20is,of%20%20orthogonal(perpendicular)%20axes
14 C3.ai, “What is Local Interpretable Model-Agnostic Explanations (LIME)?,” https://c3.ai/glossary/data-science/lime-local-interpretable-model-agnostic-explanations/
15 Schwartz, O.; “In 2016, Microsoft’s Racist Chatbot Revealed the Dangers of Online Conversation,” IEEE Spectrum, 4 January 2024, https://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-revealed-the-dangers-of-online-conversation
16 Jang, H.; “A South Korean Chatbot Shows Just How Sloppy Tech Companies Can Be With User Data,” Slate, 2 April 2021, https://slate.com/technology/2021/04/scatterlab-lee-luda-chatbot-kakaotalk-ai-privacy.html; Quach, K.; “Science of Love App Turns to Hate for Machine-Learning Startup in Korea After Careless Whispers of Data,” The Register, 15 February 2021
17 Brittain, B.; “Google Says Data-Scraping Lawsuit Would Take ‘Sledgehammer’ to Generative AI,” Reuters, 17 October 2023, https://www.reuters.com/legal/litigation/google-says-data-scraping-lawsuit-would-take-sledgehammer-generative-ai-2023-10-17/
18 Alexiou, S.; “Advanced Data Analytics for IT Auditors,” ISACA® Journal, vol. 6, 2016, https://www.isaca.org/archives
19 Federal Deposit Insurance Corporation, Supervisory Guidance on Model Risk Management, USA, https://www.fdic.gov/news/financial-institution-letters/2017/fil17022.html
20 Irani, L.; Chowdhury, R.; “To Really ‘Disrupt,’ Tech Needs to Listen to Actual Researchers,” Wired, 26 June 2019, https://www.wired.com/story/tech-needs-to-listen-to-actual-researchers/
21 Hall, P.; Curtis, J.; et al.; Machine Learning for High-Risk Applications: Approaches to Responsible AI, O’Reilly Media, USA, 2023

SPIROS ALEXIOU | CISA, CSX-F, CIA

Has been an IT auditor at a large company for 16 years. He has more than 27 years of experience in IT systems and has written a number of sophisticated computer programs. He can be reached at spiralexiou@gmail.com.
