Editor’s note: The following is a sponsored blog post from QA.
AI has long since shifted from emerging capability to recognized industrial competitive advantage. But while capability is accelerating, deployments in organizations at scale are not. One of the key constraints isn’t access to AI models; it’s still the trust gap in mission-critical environments.
AI assurance (not to be confused with AI governance, a distinction I’ll return to later) could be a major factor in unlocking technical ambition at scale. At its heart, AI assurance is the set of processes that determine whether a system behaves as intended, stays within defined bounds, aligns with organizational values and withstands legal and regulatory scrutiny in its operational context.
Across industry, AI assurance capability remains uneven. Some organizations have developed credible in-house practices, supported by structured AI governance and AI testing. Others remain fragmented or dependent on external providers. What is consistent, however, is the shortage of the skills, infrastructure and repeatable processes required to do AI assurance at scale.
Businesses find themselves constrained by bottlenecks that hold back innovation and value. AI “use cases” can be deployed faster than they can be trusted, and where trust breaks down, AI deployments don’t scale. Commercially, this gap is widely misunderstood, and that matters. The ability to operationalize AI, not just experiment with it, beyond Copilot or vibe coding adoptions, will increasingly define an organization’s commercial competitive advantage.
Not simply a technical problem
Poorly assured AI systems can introduce failure modes that are difficult to detect, explain or predict: performance degradation over time, bias under specific conditions, weakness outside training distributions and “black box” decision-making. In a mission-critical environment, those risks could translate directly into operational exposure, legal challenges and, at the very least, a loss of public confidence. In adversarial terms, with the threat landscape what it is, they will inevitably introduce exploitable weaknesses.
The most visible failures tend to be generative AI issues, such as rogue chatbots and hallucinated outputs, because they’re easy to track and explain. The more significant risks are quieter and harder to communicate. Consider machine learning or agentic systems behaving unpredictably inside operational environments such as traffic systems, energy networks, global logistics supply chains and decision-support tools. In my view, AI assurance cannot be periodic or front-loaded: it must be continuous, embedded and operational if it is to protect people, infrastructure and the environment.
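What “continuous, embedded and operational” can mean in practice is often simpler than it sounds: small, automated checks running alongside the system in production and escalating when behavior moves outside validated bounds. As a minimal, illustrative sketch (the metric, thresholds and sample sizes here are my assumptions, not a prescribed standard), here is a Python drift check that compares live input data against the distribution a model was originally validated on:

```python
# Minimal sketch of one continuous-assurance check: monitoring a single
# model input feature for distribution drift against a reference window.
# Thresholds and window sizes are illustrative assumptions, not a standard.
import numpy as np

def population_stability_index(reference, live, bins=10):
    """Compare a live feature sample against a reference sample using PSI."""
    # Bin edges come from the reference distribution's quantiles.
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the reference range

    ref_counts, _ = np.histogram(reference, bins=edges)
    live_counts, _ = np.histogram(live, bins=edges)

    # Convert counts to proportions; a small epsilon avoids division by zero.
    eps = 1e-6
    ref_prop = ref_counts / max(ref_counts.sum(), 1) + eps
    live_prop = live_counts / max(live_counts.sum(), 1) + eps

    return float(np.sum((live_prop - ref_prop) * np.log(live_prop / ref_prop)))

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # data the model was validated on
    live = rng.normal(loc=0.6, scale=1.2, size=1_000)       # what production now looks like

    psi = population_stability_index(reference, live)
    # Commonly quoted (but illustrative) thresholds: <0.1 stable, 0.1-0.25 watch, >0.25 act.
    status = "stable" if psi < 0.1 else "investigate" if psi < 0.25 else "escalate"
    print(f"PSI={psi:.3f} -> {status}")
```

The point is not this particular statistic. It is that the check runs continuously, produces evidence, and triggers a response when the system drifts away from the conditions under which it was assured.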
At the same time, the demand for AI is moving fast. The pressure is to move quickly, but speed without assurance (or safety) increases risk rather than capability. I’ve argued this before: check out my Balancing AI Speed & Safety blog post.
This constraint is emerging across defense, financial services, healthcare, critical infrastructure and enterprise systems, where organizations discover that agentic AI can be built and deployed experimentally far faster than it can be trusted at operational scale. The consequences differ by industry, whether a regulatory breach, clinical risk, systemic exposure or a failed integration, but they share the same underlying issue: weak AI assurance limits scale.
The AI communications gap
Technology providers, governance, risk and compliance specialists, and end-users are often talking past each other, using different language and operating with different assumptions about both risk and value. That misalignment unintentionally slows adoption as much as any technical constraint.
Frameworks, regulations and standards such as the NIST AI Risk Management Framework, the EU AI Act, ETSI EN 304 223 and ISO/IEC 42001 are often presented as solutions to the AI assurance problem, but in practice they don’t address how assurance is actually executed.
They do offer real value: their structure provides a shared language for AI risk, governance and accountability, allowing us to anchor decision-making, demonstrate compliance and regulatory alignment, and signal credibility.
In regulated environments, these frameworks, regulations and standards are necessary baselines, and increasingly non-negotiable for market access. The issue is that they do not address the operational challenge of how AI systems are tested, validated and monitored in real-world conditions. Applied well, they introduce discipline; applied poorly, they become compliance artifacts, generating documentation without materially improving understanding of AI system behavior. There’s also an inherent lag between the speed of AI development and the evolution of standards, particularly in areas such as adaptive systems, autonomy and real-time decision-making.
These frameworks matter, but they are insufficient on their own. They allow for proportionate guardrails but lack the industrial depth and evidential context to build trust in capability. That is because we are missing a consistent definition of what constitutes “good” AI assurance, which must come from regulators and their industry bodies. Inconsistent approaches, and a tendency to default either to over-engineered in-house processes (particularly in the “frontier” organizations) or to ad hoc use of external “by-the-hour” consultancies, are counter-productive.
AI governance vs. AI assurance
When explaining AI governance vs. AI assurance, a useful analogy is the aviation industry. Aviation governance is the regulatory regime: Civil Aviation Authority (CAA) and Federal Aviation Administration (FAA) rules, international treaties and airline policies. Assurance is what happens before a plane flies: the inspections, testing, certification and ongoing maintenance checks that demonstrate airworthiness.
They are interrelated in this context, with governance defining what needs to be assured and who is accountable for it, while assurance provides the evidence and mechanisms that governance frameworks rely on to function.
You can have governance without robust assurance (rules without an enforcement mechanism) and assurance activity without governance (technical testing with no policy context), but they work best together. In practice, organizations scaling AI need both: governance to set direction and accountability, and assurance to generate the technical evidence loop that shows governance decisions are actually being upheld.
The path forward is not a binary choice between governance and assurance but a more deliberate integration of both. Governance defines thresholds, risk tiers and accountability frameworks, so industry can develop, test and iterate against those requirements. Assurance then becomes a continuous process embedded across the AI lifecycle, rather than a check-box, point-in-time exercise, which is, and has always been, flawed.
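To make that evidence loop concrete, here is a deliberately simplified Python sketch of what such integration might look like: governance supplies the risk tiers and thresholds, and an automated assurance gate evaluates a system against them and emits an auditable evidence record. The tier names, metrics and thresholds are hypothetical, chosen only to illustrate the pattern.

```python
# Hypothetical sketch of an assurance gate that turns governance-defined
# risk tiers into an automated, evidence-producing check in the AI lifecycle.
# Tier names, metrics and thresholds are illustrative assumptions only.
import json
from datetime import datetime, timezone

RISK_TIER_THRESHOLDS = {
    # Governance defines the bar; assurance tests against it.
    "high":   {"min_accuracy": 0.95, "max_bias_gap": 0.02},
    "medium": {"min_accuracy": 0.90, "max_bias_gap": 0.05},
    "low":    {"min_accuracy": 0.80, "max_bias_gap": 0.10},
}

def assurance_gate(system_id: str, risk_tier: str, measured: dict) -> dict:
    """Check measured evaluation metrics against the tier's thresholds and
    return an evidence record suitable for audit and governance review."""
    thresholds = RISK_TIER_THRESHOLDS[risk_tier]
    checks = {
        "accuracy": measured["accuracy"] >= thresholds["min_accuracy"],
        "bias_gap": measured["bias_gap"] <= thresholds["max_bias_gap"],
    }
    return {
        "system_id": system_id,
        "risk_tier": risk_tier,
        "measured": measured,
        "thresholds": thresholds,
        "checks": checks,
        "passed": all(checks.values()),
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    record = assurance_gate(
        system_id="triage-assistant",                    # hypothetical system
        risk_tier="high",
        measured={"accuracy": 0.93, "bias_gap": 0.03},   # hypothetical evaluation results
    )
    print(json.dumps(record, indent=2))
    # A failed record blocks promotion and joins the governance evidence trail.
```

Run on every evaluation cycle rather than once at sign-off, a gate like this is what turns governance from a policy document into something a deployment can actually be held against.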
Organizations that resolve this quickly will gain the confidence to move beyond experimentation, deploying AI and integrating it into operational workflows for commercial and operational advantage. Those that do not will remain constrained by uncertainty and slow to scale beyond pilots.
Elusive AI growth and value indicators
The AI assurance gap across our regulated and critical industries could impact growth and economic value in terms of Gross Domestic Product (GDP). If you haven’t looked at OpenAI’s GDPval, a framework designed to measure AI performance on “economically valuable tasks,” I’d recommend you take a look. They started with the concept of GDP as an economic indicator and drew tasks from the key occupations in the industries that contribute most to GDP. It’s an interesting way to cut through the problematic question of ROI in AI. Regulators should be looking to incentivize AI adoption and industrialize the value of innovation.
I would expect this to have wider commercial implications as we look beyond AI governance to AI assurance as an economic growth lever. Regulators and industry bodies will influence which systems should be bought, which vendors are credible and, ultimately, which AI deployments really scale.