Are AI BOMs the Future?
Author: Ed Moyle, CISSP
Date Published: 17 July 2024
Read Time: 10 minutes

If you’ve been employed as a trust professional (i.e., working in assurance, cybersecurity, privacy, etc.) within the past few years, you’ve likely been involved—either directly or indirectly—in discussions around software bills of materials (SBOMs). If you’re part of an organization that manufactures software, for instance, chances are good that at least a subset of customers have inquired about obtaining SBOMs. If you work in an organization that consumes software made by others (and who among us doesn’t?), you’ve likewise probably encountered the idea of SBOMs in the trade press, at professional conferences, or in peer discussions.

If you aren’t familiar with SBOMs, the concept is straightforward. An SBOM is a structured, machine-readable inventory of the components that make up a piece of software, designed to provide transparency into the software our organizations purchase and use—the software supply chain. To understand why this would be useful, consider the impact felt by organizations associated with vulnerabilities like Log4Shell (CVE-2021-44228), Heartbleed (CVE-2014-0160), or any vulnerability occurring in ubiquitously deployed open source software.

As anybody who has lived through one of these issues will tell you, it is an onerous and time-consuming task to identify the source of the exposure. Why? Because not only do you need to determine your own usage of the vulnerable software, but you also need to locate the dependencies in the software you use—places where the software you acquire from others may be creating exposure by virtue of using the impacted component(s). Moreover, the software you use from others may itself have dependencies on the vulnerable components, creating second-, third-, and even fourth-order dependencies as well.

If this sounds confusing, consider the example of Log4j. For many of us, understanding our exposure to Log4j means doing a few things at the same time. First, we need to understand whether our organizations make use of the component. If they do, we need to remediate the vulnerability and notify customers downstream. This is the most obvious impact. In addition to this, though, we also need to understand whether the software we use (both open source and commercial) employs the component. This means we need to make inquiries about vulnerability status to our software suppliers to determine the extent of impact to them. But we’re still not done. Since those we acquire software from also have their own dependencies in turn, they need to do the same thing we do: understand their own exposure, remediate, and investigate the exposure of the components they use. This happens recursively down the chain until the most atomic components are reached.

What you wind up with is a very complicated chain of interdependencies in which each and every component must be interrogated and examined to determine whether or not it uses the impacted component.
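
To make the recursive nature of that interrogation concrete, here is a minimal Python sketch. Everything in it is hypothetical: the package names and the dependency map stand in for a real inventory, which would come from package manifests, SBOMs, or software composition analysis tooling.

```python
# Hypothetical inventory mapping each package to its direct dependencies.
DEPENDENCIES = {
    "our-app": ["vendor-lib-a", "vendor-lib-b"],
    "vendor-lib-a": ["logging-util"],   # first-order exposure
    "vendor-lib-b": ["vendor-lib-c"],
    "vendor-lib-c": ["logging-util"],   # second-order exposure
    "logging-util": [],                 # the vulnerable component
}

def uses_component(package: str, target: str, seen=None) -> bool:
    """Recursively check whether a package depends on the target component."""
    if seen is None:
        seen = set()
    if package in seen:  # guard against cyclic dependency graphs
        return False
    seen.add(package)
    if package == target:
        return True
    return any(uses_component(dep, target, seen)
               for dep in DEPENDENCIES.get(package, []))

print(uses_component("our-app", "logging-util"))  # True, via two separate paths
```

In practice, of course, no single party holds the full map; each supplier sees only its own direct dependencies, which is exactly why the interrogation must happen recursively down the chain.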

SBOM promises to help alleviate this. In an SBOM, the components making up the package are listed clearly and directly. The SBOM also includes all supporting dependencies that may be packaged with the software, such as libraries, statically linked code, or shared objects. As such, it offers benefits analogous to the ingredient label on a packaged food item. Someone with a need to ensure that a particular ingredient is or is not present in a food item—for example, someone with a severe food allergy—can easily tell whether a harmful ingredient is present in the food that they purchase. The ingredient label lets them know if what’s in the box will be harmful to them or can be safely consumed. An SBOM functions in much the same way for organizations.
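
As a sketch of what that “ingredient label” check looks like in code, consider the following Python fragment. It uses a hypothetical, heavily trimmed CycloneDX-style component list; real SBOM documents carry far more metadata (identifiers, hashes, licenses, and so on), and production checks would use established SBOM tooling rather than ad hoc parsing.

```python
import json

# Hypothetical, heavily trimmed CycloneDX-style SBOM for one application.
sbom = json.loads("""
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "log4j-core", "version": "2.14.1"},
    {"type": "library", "name": "jackson-databind", "version": "2.13.0"}
  ]
}
""")

def contains_component(sbom: dict, name: str) -> bool:
    """The allergy-label question: is this ingredient in the box?"""
    return any(c.get("name") == name for c in sbom.get("components", []))

print(contains_component(sbom, "log4j-core"))  # True: flag for remediation
```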

While this idea has been advocated by many over the years (including myself in previous Journal columns1), the complexity associated with creating, consuming, and managing SBOMs has made industry-wide adoption and acceptance slow to occur. So much so, in fact, that unless you’re in an organization with a regulatory need to create SBOMs (e.g., medical device software manufacturers who are subject to government oversight), chances are good that SBOMs aren’t part of your software security or assurance programs. While the idea of an SBOM is still a valuable and compelling one, practical challenges have in many respects kept it from delivering on its full promise.

Enter AI/ML BOMs

The reason I’m bringing this up now is the recent emergence of a new type of bill of materials: artificial intelligence (AI) and/or machine learning (ML) BOMs. What is an AI/ML BOM, you ask? If you followed the description of SBOM above, the premise is simple: it’s a bill of materials that describes what is included in the AI/ML components you employ.

There’s a simple reason why this is valuable. There’s a whole new world of engineering going on within organizations around AI/ML that many of us in the trust world are nearly blind to. Consider recent analysis from security firm JFrog, which found hundreds of actively malicious code execution issues in ML models uploaded to the popular AI collaboration platform Hugging Face.2

If you’ve never heard of Hugging Face, you’re not alone, and that’s exactly the point I’m highlighting. Hugging Face is an extremely popular platform used by those in the ML and AI communities for sharing models. It’s been called the “GitHub of Machine Learning,” and data shared recently by Hugging Face Chief Evangelist Julien Simon indicates more than one million model downloads per day from the Hugging Face Model Hub.3 This means organizations across the globe are downloading semi-trusted models and running them directly, often without any validation of provenance, scanning for malicious capabilities, or other technical vetting.
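
To illustrate how frictionless that pattern is, here is a minimal sketch using the huggingface_hub client and PyTorch; the repository and file names are hypothetical. Nothing in this flow, by itself, validates the publisher or inspects what the artifact will do when it is deserialized.

```python
from huggingface_hub import hf_hub_download
import torch

# Download a model file from a (hypothetical) repository on the Hub...
path = hf_hub_download(repo_id="some-org/some-model",
                       filename="pytorch_model.bin")

# ...and load it. With weights_only=False this deserializes a full pickle
# stream, meaning arbitrary code in the file can execute at load time.
model_state = torch.load(path, weights_only=False)
```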

The JFrog researchers discovered hundreds of remote backdoors hidden within these models. Those researchers lay out the threat this way in their analysis: “The model’s payload grants the attacker a shell on the compromised machine, enabling them to gain full control over victims’ machines through what is commonly referred to as a ‘backdoor’... This silent infiltration could potentially grant access to critical internal systems and pave the way for large-scale data breaches or even corporate espionage, impacting not just individual users but potentially entire organizations across the globe, all while leaving victims utterly unaware of their compromised state.”4 Scary? You bet it is.
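
The payloads JFrog describes ride along in serialized model files, many of which embed Python pickle streams that can execute code at load time. As one crude, illustrative defense (not a production-grade scanner, and applicable only to bare pickle files), you could inspect a file’s pickle opcodes for references to suspicious modules before ever loading it:

```python
import pickletools

# Illustrative blocklist; a real scanner would be far more thorough.
SUSPICIOUS_MODULES = {"os", "posix", "subprocess", "socket", "builtins"}

def suspicious_imports(path: str) -> list:
    """Return suspicious module references found in a pickle stream."""
    hits = []
    with open(path, "rb") as f:
        data = f.read()
    for opcode, arg, _pos in pickletools.genops(data):
        # GLOBAL opcodes pull in callables that the stream can later
        # invoke via REDUCE -- the classic pickle code-execution vector.
        # (STACK_GLOBAL and other variants would need handling too.)
        if opcode.name == "GLOBAL" and str(arg).split()[0] in SUSPICIOUS_MODULES:
            hits.append(str(arg))
    return hits
```

Better still, prefer model formats that cannot carry executable payloads (e.g., safetensors) and rely on established scanning tooling rather than ad hoc checks like this one.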

The core issue with this is that, frankly, we in the trust space are still catching up when it comes to this attack surface. Most of us are not yet fully versed in AI and ML risk, risk mitigation strategies, prevention techniques, and so on. We may not even be aware of the ML and AI engineering work being done within our organizations. And while structured process frameworks and approaches like MLSecOps5 are emerging and gaining prominence, the fact of the matter is that these issues represent an area where most of us don’t have our eye directly on the ball. It’s an area where potential for significant risk abounds, where large-scale nefarious activity from untrustworthy actors is already in play, and where visibility for most trust professionals is practically nil.

Practical Viability

What I’m trying to highlight here is that we’re at an inflection point. There is a clear shift in risk dynamics on the horizon, including an entirely new attack surface. One possible solution exists in the establishment of standards designed to promote transparency into how these models operate and what dependencies they have. But frankly, the signs here are mixed.

The US Army, for example, has taken a strong position on the utility of AI BOMs. Young J. Bang, the US Army’s Principal Deputy Assistant Secretary for Acquisition, Logistics & Technology, has gone on record highlighting the value of AI BOMs: “We've been driving and pushing software BOMs and data BOMs. We're toying with the notion of an AI BOM.… Just like we're securing our supply chain, with semiconductors, components, sub components, we're also thinking about that from a digital perspective.”6 But while the US Army has been bullish on AI BOMs, it has also recently backed off somewhat from the standard it intends to hold supply chain partners to. Instead of requiring a full, detailed BOM, it is now asking contractors to provide only a data-sheet-style informational piece.7

There has been standardization work happening in the industry. OWASP has integrated ML BOM capability into its widely recognized SBOM standard CycloneDX as of version 1.5.8 This is a positive sign and a signal of potential momentum: CycloneDX is one of the most commonly employed standards for SBOM creation and distribution, and the addition reflects recognition of the need for AI/ML BOMs among the practitioner community. Additionally, attention on the topic in broader circles, such as the AI BOM workshop at RSA Conference 2024,9 points to a growing understanding of both the problem and the need for a solution.
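
To give a feel for what this enables, here is a hypothetical, heavily trimmed sketch of a CycloneDX 1.5 document describing an ML model. CycloneDX 1.5 added a machine-learning-model component type along with model card metadata; the specific fields shown are illustrative, so consult the specification at cyclonedx.org for the authoritative schema.

```python
import json

# Hypothetical, minimal CycloneDX 1.5 ML-BOM entry for a single model.
ml_bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {
            "type": "machine-learning-model",  # component type added in v1.5
            "name": "sentiment-classifier",    # hypothetical model name
            "version": "1.0.0",
            "modelCard": {
                "modelParameters": {
                    "task": "text-classification",
                    "architectureFamily": "transformer",
                },
            },
        }
    ],
}

print(json.dumps(ml_bom, indent=2))
```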

At the same time, we have to admit that even SBOMs haven’t yet reached universal acceptance or adoption, despite years of pressure being applied to software vendors by regulators, large enterprise buyers, practitioners, and governments. If the track record associated with SBOM adoption is any indication of the path ahead for AI/ML BOM, we might be some years away from a full, industry-wide embrace of the concept.

So, what is the point of raising the need for AI BOMs now? Well, there are solid, practical reasons to do so. First, talking about these trends and movements in the space can be beneficial to trust professionals by bringing awareness of the broader problem to those who might still be unfamiliar with SBOMs generally or AI/ML BOMs specifically. Second, building an understanding of the complexities that necessitate AI BOMs helps inform understanding of our internal risk posture. Specifically, it can cause us to reach out internally within our organizations to ask questions about our AI/ML supply chain, to better understand the provenance of components used within it, and to gain a deeper knowledge of our own attack surface. After all, even without a BOM to help standardize how we assess our exposure, just knowing that exposure exists can help us proactively mitigate it. And lastly, it helps us in our own way to drive acceptance of both SBOM generally and AI/ML BOM specifically. We can ask our vendors for them (if we plan to make productive use of them in our program), we can look internally to evaluate how we might prepare our own SBOMs, and we can build a more advanced skill set to enhance our marketability in the IT landscape of the future.

Endnotes

1 Moyle, E.; “Making the Software Supply Chain Practical,” ISACA Journal, vol. 4, 2023, https://www.isaca.org/resources/isaca-journal/issues/2023/volume-4/making-the-software-supply-chain-practical
2 Montalbano, E.; “Hugging Face AI Platform Riddled With 100 Malicious Code-Execution Models,” Dark Reading, 25 February 2024, https://www.darkreading.com/application-security/hugging-face-ai-platform-100-malicious-code-execution-models
3 MacManus, R.; “How Hugging Face Positions Itself in the Open LLM Stack,” The New Stack, 20 June 2023, https://thenewstack.io/how-hugging-face-positions-itself-in-the-open-llm-stack/
4 Cohen, D.; “Data Scientists Targeted by Malicious Hugging Face ML Models with Silent Backdoor,” JFrog, 27 February 2024, https://jfrog.com/blog/data-scientists-targeted-by-malicious-hugging-face-ml-models-with-silent-backdoor/
5 MLSecOps, https://mlsecops.com
6 “U.S. Army Is Considering AI Bill of Materials,” AFCEA International, 25 May 2023, https://www.afcea.org/signal-media/cyber-edge/us-army-considering-ai-bill-materials
7 Freedberg Jr., S. J.; “‘AI-BOM’ Bombs: Army Backs Off, Will Demand Less Detailed Data From AI Vendors,” Breaking Defense, 23 April 2024, https://breakingdefense.com/2024/04/ai-bom-bombs-army-backs-off-will-demand-less-detailed-data-from-ai-vendors/
8 “Introducing OWASP CycloneDX v1.5: Advanced Bill of Materials Standard Empowering Transparency, Security, and Compliance,” CycloneDX, 26 June 2023, https://cyclonedx.org/news/cyclonedx-v1.5-released/
9 “AIBOM Workshop,” GitHub, https://github.com/aibom-workshop/rsa-2024

ED MOYLE | CISSP

Is currently chief information security officer for Drake Software. In his years in information security, Moyle has held numerous positions including director of thought leadership and research for ISACA®, application security principal for Adaptive Biotechnologies, senior security strategist with Savvis, senior manager with CTG, and vice president and information security officer for Merrill Lynch Investment Managers. Moyle is co-author of Cryptographic Libraries for Developers and Practical Cybersecurity Architecture and a frequent contributor to the information security industry as an author, public speaker, and analyst.