The Digital Trust Imperative: To Build Digital Trust, Start With Resiliency

Author: K. Brian Kelley, CISA, CDPSE, CSPO, MCSE, SECURITY+
Date Published: 1 May 2024

Once upon a time, the focus in IT was to be able to recover quickly. Then larger organizations started to develop site reliability engineering (SRE) practices to ensure not just recoverability but also reliability. To quote architecture strategist Tanu McCabe, “A reliable system is one that is performant, secure, and meets service level objectives (SLOs), thereby instilling trust.”1 If we’re thinking about service level objectives, we need to think about resiliency.

Architecting for Resiliency

According to The Open Group Architecture Framework (TOGAF), an organization should have a set of guiding architecture principles, and each organization should define its own. The Open Group provides a starting set of principles for an organization that has begun adopting TOGAF, and the examples given are principles that most organizations would embrace. Among them is the business continuity principle, which states, “Enterprise operations are maintained in spite of system interruptions.”2 This is what most people mean by resiliency: the ability to operate in spite of specific interruptions.

Organizations that are able to maintain their operations and have a good service record tend to be more trusted than those with spottier performance. Resiliency means availability. Therefore, we should build solutions whose technical architecture ensures availability. However, availability is not just limited to enterprise architecture. Consider the oft-cited confidentiality, integrity, and availability (CIA) triad. Availability is included because a lot of what is done in cybersecurity involves protecting assets so that they are available to the end user. After all, if a system or service can’t be accessed, it doesn’t do anyone any good.

Digital Trust and Resiliency

The ties between resiliency and both technical architecture and security make sense. So how does digital trust apply to resiliency? In short, solid digital trust can help with resiliency, though it might do so indirectly. Of course, poor resiliency will impact digital trust negatively. Poor digital trust might also lead to lackluster options for resiliency. If my organization has an excellent solution to enhance resiliency for other companies, but your organization’s digital trust reputation is poor, it might not be worth my organization’s time and resources to do business with yours. As a result, your organization can’t use my organization’s solution. Digital trust and resiliency are tightly coupled. To see this, let’s look at how digital trust enhances resiliency, and how resiliency failures can harm digital trust.

Digital Trust Enhances Resiliency

Malware has been a big problem for years. Once upon a time, there was no clearinghouse through which antivirus (AV) companies could easily trade submitted samples. Enter VirusTotal. Originally a website developed by a Spanish security company, it was subsequently bought by Google and now belongs to a Google subsidiary, Chronicle. VirusTotal has a number of contributors, among them all the major AV/malware companies in the world.3 It became a trusted independent entity that gave the various AV companies the ability to both submit and download malware samples, thereby increasing the speed with which every participating company could isolate the malware, generate a signature for it, and push out the update to every connected client.

I can remember when the various antivirus vendors struggled to release definitions quickly enough for new, aggressively spreading worms in the wild. For instance, fighting the W32/Blaster worm4 and its variants was a painful endeavor. We looked at what it was doing and tried to circumvent its behavior until we could deploy AV definitions (especially on systems that couldn’t be patched to the latest version at the time). In that era prior to collaboration using VirusTotal, AV companies had a much more difficult time getting samples quickly to generate definitions. One vendor could have definitions updated in hours while its competitors took days. Any delay, especially with fast-spreading worms, was (and is) detrimental to the digital landscape as a whole. Consider that a few hours can make all the difference between a blocked infestation and a massive compromise. Therefore, the trust in and use of VirusTotal by the various AV/malware protection vendors have increased resilience for all organizations.

Similarly, trust in an organization’s bug bounty program can result in the discovery of, and a patch for, a vulnerability before bad actors have a chance to discover and exploit it. Companies that have acted in good faith have given out significant rewards to security researchers who turn in previously unknown vulnerabilities. This encourages other security researchers to turn in their finds as well, meaning a reduced chance of zero-days in the wild for such organizations’ products. Contrast this behavior with the old fights between Oracle and security researchers such as David Litchfield, who pressed the vendor to patch its flagship software.5

Resiliency Failures Harm Digital Trust

In a survey conducted by Parametrix Insurance with respect to cloud outage risk, 23% of respondents estimated each hour of downtime cost their organizations US$500,000 or more.6 Enough outages and the trust these larger organizations would have in a particular cloud provider would quickly plummet, leading them to find a better solution. Also, news of outages tends to stick around and a few searches on one’s favorite search engine will lead to plenty of information on what the outage was and its impact on customers. And because bad news sells, and sells well, there are plenty of end-of-year articles recapping the worst of the worst with bait-worthy titles like “Top 7 Outages of 2023.”7 I say bait-worthy, but if your organization is a potential customer, understanding where particular cloud providers went wrong helps your organization better assess the risk and consider whether or not to do business with said provider. Therefore, outages lead to lowering of trust and can not only cause a loss of current customers, but can deter future customers, too.
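
To make the survey figure concrete, the arithmetic behind an outage estimate can be sketched in a few lines of Python. This is a minimal illustration, not a costing model: the `Outage` record, the function names, and the sample outage durations are hypothetical, and the hourly figure is simply the US$500,000 lower bound reported by the survey respondents cited above.

```python
from dataclasses import dataclass

# Assumption: the US$500,000-per-hour lower bound from the survey cited above.
ASSUMED_COST_PER_HOUR_USD = 500_000


@dataclass(frozen=True)
class Outage:
    """A single outage; the label and duration are illustrative only."""
    label: str
    duration_minutes: float


def outage_cost(outage: Outage, cost_per_hour: float) -> float:
    """Cost of one outage, scaling the hourly cost by its duration."""
    return outage.duration_minutes / 60.0 * cost_per_hour


def total_cost(outages: list[Outage], cost_per_hour: float) -> float:
    """Sum of the estimated cost of every outage in the list."""
    return sum(outage_cost(o, cost_per_hour) for o in outages)


if __name__ == "__main__":
    # Hypothetical outage history for one year.
    history = [
        Outage("regional network failure", 90),
        Outage("storage service degradation", 30),
    ]
    total = total_cost(history, ASSUMED_COST_PER_HOUR_USD)
    print(f"Estimated annual outage cost: US${total:,.0f}")
```

With these made-up durations, a 90-minute outage and a 30-minute outage add up to two hours of downtime, or US$1,000,000 at the assumed rate. A potential customer could run the same calculation against a provider's published incident history when assessing risk.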

This Tells Us We Need More Resiliency

Criminals don’t care how noble an organization’s purpose is when conducting or “enforcing” a ransomware attack. Just recently, the director of the US Federal Bureau of Investigation (FBI) testified to how large China’s cybersecurity arm has grown and how active it has been against US infrastructure.8 Hospitals, whose purpose is to save lives, have come under increasing attack, with one survey indicating six in ten health companies had suffered a ransomware attack within the last year.9 The threats to critical operations are only increasing.

Undoubtedly, there’s an increased need to focus on resiliency from the cybersecurity side of things. However, let’s not forget how changing environmental conditions can have a significant impact on operations, too. If environmental conditions bring a company digitally offline or severely disrupt its services, others in the ecosystem will develop a negative view of that company. Competitors will quickly jump in to get customers to switch over. The loss of business (and difficulty regaining momentum for new business) will have a real effect on the organization’s bottom line.

Therefore, there’s an increasing need for resiliency planning and implementation against all kinds of threats, both digital and physical. This is where cloud computing can give organizations options they wouldn’t otherwise have. Cloud computing allows an organization to deploy solutions across two or more geographic regions, with the ability to secure and isolate each deployment. The larger cloud providers can guard against localized failures within a region, but having deployments across regions increases the likelihood of a system staying online and usable. An organization can therefore deploy to multiple regions without having to invest in the physical plant for multiple data centers.
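
The benefit of spreading a deployment across regions can be illustrated with simple probability. The sketch below assumes that each region fails independently and that any surviving region can serve all traffic. Real outages are often correlated (a shared control plane, DNS, or a bad deployment pushed everywhere), so the result should be read as an optimistic upper bound. The 99.9% per-region figure and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def composite_availability(region_availabilities: list[float]) -> float:
    """Probability that at least one region is up.

    Assumes region failures are statistically independent, which is an
    idealization; correlated failures make the real figure lower.
    """
    all_down = 1.0
    for availability in region_availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        # Multiply in the chance that this region is down as well.
        all_down *= 1.0 - availability
    return 1.0 - all_down


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical per-region availability of 99.9%.
    for regions in ([0.999], [0.999, 0.999]):
        combined = composite_availability(regions)
        print(
            f"{len(regions)} region(s): availability {combined:.6f}, "
            f"about {expected_downtime_hours(combined):.4f} hours down per year"
        )
```

Under these assumptions, one region at 99.9% implies roughly 8.76 hours of downtime per year, while two such regions bring the expected figure down to well under a minute. The gap between that ideal and what an organization actually achieves depends on how isolated the deployments really are.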

An organization’s resiliency has a significant impact on its digital trust. When an outage can mean a business loss of half a million dollars or more, the longer or more frequent the outages are, the less customers will trust the organization and the more they will consider making a change. Likewise, potential customers will pay attention to the details of the outages, especially their impact, when making decisions about whom to trust. Finally, while much of the focus is on how resiliency impacts digital trust, digital trust can also impact resiliency, as we’ve seen with VirusTotal. In conclusion, it’s safe to say we need more of both digital trust and resiliency in the future.

Endnotes

1 McCabe, T.; “The Three R’s of SREs: Resiliency, Recovery, & Reliability,” Capital One, 8 September 2020, https://www.capitalone.com/tech/software-engineering/sres-architecting-with-resiliency-recovery-reliability/
2 The Open Group, The TOGAF Standard, Version 9.2, https://pubs.opengroup.org/architecture/togaf9-doc/arch/chap20.html#tag_20_06_04
3 VirusTotal, “Contributors,” https://docs.virustotal.com/docs/contributors
4 Bailey, M.; Cooke, E.; et al.; “The Blaster Worm: Then and Now,” IEEE Security & Privacy Magazine, 2005, https://faculty.cc.gatech.edu/~mbailey/publications/IEEE_Security_Privacy_Blaster_Final.pdf
5 Greenberg, A.; “Oracle Hacker Gets the Last Word,” Forbes, 2 February 2010, https://www.forbes.com/2010/02/02/hacker-litchfield-ellison-technology-security-oracle.html?sh=ceb9e4a43738
6 Wells, K.; “Leading Cloud Service Providers Faced 1000+ Disruptions in 2022: Parametrix,” Reinsurance News, 22 March 2023, https://www.reinsurancene.ws/leading-cloud-service-providers-faced-1000-disruptions-in-2022-parametrix/
7 Dubie, D.; “Top 7 Outages of 2023,” Network World, 2 February 2024, https://www.networkworld.com/article/1303348/top-7-outages-of-2023.html
8 Wray, C.A.; “Director Wray’s Opening Statement to the House Select Committee on the Strategic Competition Between the United States and the Chinese Communist Party,” Federal Bureau of Investigation, USA, 31 January 2024, https://www.fbi.gov/news/speeches/director-wrays-opening-statement-to-the-house-select-committee-on-the-chinese-communist-party
9 Levi, R.; “Ransomware Attacks Against Hospitals Put Patients’ Lives at Risk, Researchers Say,” NPR Morning Edition, 20 October 2023, https://www.npr.org/2023/10/20/1207367397/ransomware-attacks-against-hospitals-put-patients-lives-at-risk-researchers-say

K. BRIAN KELLEY | CISA, CDPSE, CSPO, MCSE, SECURITY+

Kelley is an author and columnist focusing primarily on Microsoft SQL Server and Windows security. He currently serves as a data architect and an independent infrastructure/security architect concentrating on Active Directory, SQL Server, and Windows Server. He has served in a myriad of other positions, including senior database administrator, data warehouse architect, web developer, incident response team lead, and project manager. Kelley has spoken at 24 Hours of PASS, IT/Dev Connections, SQLConnections, the TechnoSecurity and Forensics Investigation Conference, the IT GRC Forum, SyntaxCon, and at various SQL Saturdays, Code Camps, and user groups.