In 2006, English mathematician Clive Humby coined the phrase “Data is the new oil.”1 Since then, analysts have worked to locate and refine the data necessary to feed the big data analysis engines that promise to deliver intelligence and enable more informed business decisions. However, the growing expanse of data makes it harder for analysts to find the right data, convert it, and archive it in a useful form using traditional database designs.
Figure 1 shows the traditional data warehousing and usage flow used by many organizations. Data sources are identified by information specialists and extracted, transformed, and loaded into new data repositories. Data applications are then built to allow other organizational users to download reports, conduct business analytics, or review developed dashboards. This traditional approach is labor-intensive due to the time required to identify, transform, store, and develop new data repositories and analytics applications.
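To show where the labor in this flow goes, the following is a minimal Python sketch of one hand-built extract-transform-load (ETL) job. The source data, table, and column names are hypothetical. An in-memory SQLite database stands in for the warehouse. In the traditional approach, a specialist must write and maintain a job like this for every new source.

```python
import csv
import io
import sqlite3

# Hypothetical extract from a source system, represented as CSV text.
SOURCE_CSV = """customer_id,name,country,revenue
1,  Alice ,de,1200.50
2,Bob,US,980
3,Chandra,in,
"""

def extract(text):
    """Extract: read raw rows from the source system."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean and normalize each row to fit the warehouse schema."""
    cleaned = []
    for row in rows:
        cleaned.append((
            int(row["customer_id"]),
            row["name"].strip(),                 # remove stray whitespace
            row["country"].strip().upper(),      # normalize country codes
            float(row["revenue"]) if row["revenue"] else 0.0,  # default missing values
        ))
    return cleaned

def load(rows, conn):
    """Load: write the transformed rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), warehouse)
total = warehouse.execute("SELECT SUM(revenue) FROM customers").fetchone()[0]
```

Every rule in `transform` is hard-coded for this one source. That is why adding sources in this model scales with specialist time.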
As a result, enterprises in all sectors are implementing data fabrics,2 and the United States data fabric market is projected to reach US$3.7 billion by 2026.3 This is a nearly ninefold increase from US$424.9 million in 2021.
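The growth figures cited above can be checked with simple arithmetic (values in US$ millions). The compound annual growth rate (CAGR) is derived here from the two cited values over the five years from 2021 to 2026. The source does not report it.

```latex
\frac{3{,}700}{424.9} \approx 8.7
\qquad
\text{CAGR} = \left(\frac{3{,}700}{424.9}\right)^{1/5} - 1 \approx 0.54
```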
Rieyan et al. define a data fabric as “An automated and AI-driven fusion approach to accomplish data management unification without moving data to a centralized location for solving complex data problems.”4 The first step to understanding data fabrics is analyzing traditional data warehousing techniques. Many organizations are structured as independent units to enable agility. Each unit often has its own data architecture, because agreeing on a shared data architecture can be difficult and time consuming. This leads to organizational data silos,5 which make one of the greatest data-related challenges even harder, that is, “…organizing and modeling the data to facilitate the process of linking, transforming, processing, and analyzing the data collected in order to make the best decisions promptly.”6
More than 280,000 organizations have implemented enterprise resource planning (ERP) tools to overcome this issue.7 ERPs are data-driven architectures that provide up-to-the-minute, detailed information across all organizational units, so they have been effective in generating competitive advantages through data integration.8 However, because an ERP keeps all data in one location, the burden of introducing new data can become overwhelming.
The Data Fabric Difference
Data fabrics can overcome these challenges by using artificial intelligence (AI). Figure 2 shows how the use of AI sets data fabrics apart from the traditional data warehousing and usage approach.
Data Sources
In traditional warehousing, information specialists identify data sources. In a data fabric, sources can be identified by humans or by AI. Allowing AI systems to continuously identify potentially useful data in organizational transactional databases, third-party application programming interfaces (APIs), and streaming data significantly increases the amount of data available for analysis. Automating data identification “provides an ability to scale ridiculously.”9
Data Ingestion
The data is then analyzed and ingested into the data repository using AI and machine learning (ML). Automated semantic enrichment techniques improve data quality and value.10 This automation alone significantly reduces the human labor required to extract, transform, and load the data. It also allows organizations to leave data in its native environment, because the data fabric can identify and connect data from disparate applications where it resides.11 Finally, it increases both the speed at which new data sources can be identified and the accuracy of the data incorporated into the data fabric.12
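Semantic enrichment means attaching machine-generated meaning to raw columns. A data fabric would typically use trained ML models for this step. The minimal sketch below substitutes simple regular-expression rules so the idea stays visible. The rule names, the 0.75 match threshold, and the sample data are all hypothetical.

```python
import re

# Hypothetical rule set standing in for the ML classifiers a data fabric
# would use; each rule maps a semantic tag to a regular expression.
# Order matters: the first rule that fits wins.
SEMANTIC_RULES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE),
    "iso_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "phone": re.compile(r"^\+?[\d\s\-()]{7,}$"),
}

def infer_semantic_type(values, threshold=0.75):
    """Return the first tag whose rule matches at least `threshold` of the
    non-empty values, or None if no rule fits well enough."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return None
    for tag, pattern in SEMANTIC_RULES.items():
        hits = sum(1 for v in sample if pattern.match(v))
        if hits / len(sample) >= threshold:
            return tag
    return None

def enrich(table):
    """Attach an inferred semantic tag to each column as metadata."""
    return {column: infer_semantic_type(values) for column, values in table.items()}

incoming = {
    "contact": ["ana@example.com", "li@example.org", "raj@example.net", "n/a"],
    "signup": ["2023-01-05", "2023-02-11", "2023-03-09"],
    "notes": ["called twice", "prefers email", ""],
}
tags = enrich(incoming)
print(tags)  # {'contact': 'email', 'signup': 'iso_date', 'notes': None}
```

The threshold tolerates some dirty values, such as "n/a", without a human having to clean the column first.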
Data Repositories
Onboarding data begins with connecting data sources to a data repository with cloud and software connectors.13 Connectors are programs that allow databases, applications, and services to export data to the repository. These connectors can be prefabricated or custom developed. Data fabric users therefore draw on a “rich library of ready-to-run components to prepare and blend incoming data.”14
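The value of connectors comes from a shared contract. Every source, whether prefabricated or custom, hands records to the repository in the same way. The following is a minimal sketch of that idea. The `Connector` interface and class names are hypothetical and do not represent any vendor's actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List

Record = Dict[str, object]

class Connector(ABC):
    """Hypothetical connector contract: every source exposes records through
    the same method, so the repository needs no source-specific code."""

    name: str

    @abstractmethod
    def fetch(self) -> Iterable[Record]:
        ...

class InMemoryDatabaseConnector(Connector):
    """Prefabricated-style connector for a (simulated) transactional database."""

    def __init__(self, name: str, rows: List[Record]):
        self.name = name
        self._rows = rows

    def fetch(self) -> Iterable[Record]:
        return iter(self._rows)

class StreamConnector(Connector):
    """Custom connector that drains a (simulated) event stream."""

    def __init__(self, name: str, events: List[Record]):
        self.name = name
        self._events = list(events)

    def fetch(self) -> Iterable[Record]:
        while self._events:
            yield self._events.pop(0)

class Repository:
    """Collects records from every registered connector and tags each with its source."""

    def __init__(self):
        self.connectors: List[Connector] = []
        self.records: List[Record] = []

    def register(self, connector: Connector) -> None:
        self.connectors.append(connector)

    def ingest(self) -> int:
        count = 0
        for connector in self.connectors:
            for record in connector.fetch():
                self.records.append({**record, "_source": connector.name})
                count += 1
        return count

repo = Repository()
repo.register(InMemoryDatabaseConnector("orders_db", [{"order_id": 1}, {"order_id": 2}]))
repo.register(StreamConnector("clickstream", [{"page": "/home"}]))
ingested = repo.ingest()
```

Adding a new source means writing or reusing one connector class. The repository itself does not change.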
As data is added to the data repository, AI and ML are used to search for relationships with other data sources based on organizational business rules. Additional data is identified and connected to the repository. AI and ML are used once again to determine how best to make the data available while minimizing storage costs.15 Finally, continuous analytics over the existing metadata generate further metadata assets. These assets are used to produce data catalogs, knowledge graphs, and recommendation engines that help guide users to make the best use of the data repository.16
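One simple way to search for relationships between sources is to look for columns that share many values, which makes them likely join keys. This minimal sketch uses Jaccard similarity (shared distinct values divided by all distinct values) in place of the ML models and business rules a particular product would use. The 0.5 threshold and the sample datasets are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Share of distinct values two columns have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def suggest_relationships(datasets, min_score=0.5):
    """Compare every column pair across different datasets and return
    candidate join keys whose value overlap meets `min_score`."""
    columns = [
        (ds_name, col, values)
        for ds_name, table in datasets.items()
        for col, values in table.items()
    ]
    suggestions = []
    for (ds1, c1, v1), (ds2, c2, v2) in combinations(columns, 2):
        if ds1 == ds2:
            continue  # only look for links between different sources
        score = jaccard(v1, v2)
        if score >= min_score:
            suggestions.append((f"{ds1}.{c1}", f"{ds2}.{c2}", round(score, 2)))
    return sorted(suggestions, key=lambda s: -s[2])  # strongest links first

datasets = {
    "crm": {"cust_id": [101, 102, 103, 104], "region": ["N", "S", "S", "E"]},
    "billing": {"customer": [102, 103, 104, 105], "amount": [20, 35, 10, 50]},
}
links = suggest_relationships(datasets)
```

Here `crm.cust_id` and `billing.customer` share three of five distinct values, so they are proposed as a link even though their column names differ.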
Data Users
Data fabrics expose far more data to users through data catalogs, APIs, self-service analytics, and data virtualization.17 This broader access, in turn, increases the potential to use AI to obtain insights from the data.
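Data virtualization differs from a warehouse because the join is defined once and evaluated against the live sources on every query, so nothing is copied. The minimal sketch below illustrates that principle. The `VirtualView` class is hypothetical, and the in-memory lists stand in for remote systems.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

class VirtualView:
    """Hypothetical virtual view: stores only *how* to fetch and join data,
    and pulls rows from the underlying sources each time it is queried."""

    def __init__(self, left: Callable[[], Iterable[Row]],
                 right: Callable[[], Iterable[Row]], key: str):
        self.left, self.right, self.key = left, right, key

    def query(self, where: Callable[[Row], bool] = lambda r: True) -> List[Row]:
        # Index the right-hand source by the join key at query time
        # (assumes the key is unique in that source).
        index = {row[self.key]: row for row in self.right()}
        results = []
        for row in self.left():
            match = index.get(row[self.key])
            if match is not None:
                merged = {**row, **match}
                if where(merged):
                    results.append(merged)
        return results

# Two "sources" that stay where they are; the lambdas stand in for live queries.
crm = [{"cust_id": 1, "name": "Ana"}, {"cust_id": 2, "name": "Li"}]
billing = [{"cust_id": 1, "balance": 120}, {"cust_id": 2, "balance": 0}]

view = VirtualView(lambda: crm, lambda: billing, key="cust_id")
owing = view.query(lambda r: r["balance"] > 0)
```

Because the view holds no copy of the data, a change in `billing` appears in the next query without a reload step.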
Data Fabrics in Action
One example of a data fabric in action is the one behind the You Only Need One (YONO) application created by the State Bank of India (SBI), which provides customers with digital banking and financial superstore services. SBI’s 491 million customers are supported by 22,500 worldwide branches in 36 countries. Data for these customers is located in 17 local head offices and 208 foreign offices. Creating a single data repository using traditional data warehousing techniques was considered an insurmountable task. SBI turned to IBM to implement a data fabric that would connect information from 76 business units in only three months. That data fabric is the backbone of the YONO application, which has been downloaded more than 64 million times. The data fabric allows SBI employees to use AI and ML to provide better, more targeted customer experiences. YONO’s value is estimated at US$40-50 billion.18
Personal Information Security Concerns
Data fabrics promise a holistic approach to integrating data and can give organizations vast amounts of data that offer many advantages.19 However, data fabrics also raise concerns about personal information security, especially because most of the available information about data fabrics comes from the organizations implementing these solutions.20 Three major areas of data privacy concern must be addressed: who has access to the data, who owns the data, and how the data will be used.
Data Access
According to the EU General Data Protection Regulation (GDPR), “‘Personal data’ means any information relating to an identified or identifiable natural person,” such as a name, an identification number, location data, or an online identifier, or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural, or social identity of that natural person.21 Information security is defined as “The protection of information and its critical elements, including the systems and hardware that use, store, and transmit that information.”22
A more detailed look at how data fabrics work reveals potential issues associated with personal information security. A data fabric typically integrates data from human- and AI-identified sources, which can be databases, data lakes, data warehouses, data streams, and many other forms. Data and metadata collected from the various sources are integrated into the data fabric. The metadata (“Higher level information or instructions that describe the content, context, quality, structure, and accessibility of a specific data set”23) is automatically enhanced with the help of AI. The metadata from the different data sets is then gathered in a data catalog, providing a common metadata location for datasets of the whole organization. With the help of the metadata in the data catalog, a knowledge graph is generated, which captures semantic meanings of datasets and their relationships with other datasets. The resulting knowledge graph can then be used for further analysis and to better understand the meanings of datasets and relationships between them.24 The data fabric then delivers the data to users in different styles. The data and the processes of the data fabric are automatically orchestrated by a shared set of rules.25
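The catalog-to-knowledge-graph step can be sketched in a few lines. This minimal version assumes each column already carries a semantic tag, for example from an enrichment step. It treats a shared tag as a relationship between datasets and stores the graph as a set of edges. The dataset names, owners, and tags are hypothetical.

```python
from collections import defaultdict

# Hypothetical metadata records gathered from three sources.
metadata = [
    {"dataset": "crm.customers", "owner": "sales",
     "columns": {"cust_id": "customer_id", "email": "email"}},
    {"dataset": "billing.invoices", "owner": "finance",
     "columns": {"customer": "customer_id", "amount": "currency"}},
    {"dataset": "web.sessions", "owner": "marketing",
     "columns": {"visitor_email": "email", "page": "url"}},
]

def build_catalog(records):
    """Data catalog: one place to look up every dataset's metadata."""
    return {r["dataset"]: r for r in records}

def build_knowledge_graph(catalog):
    """Knowledge graph as an edge set: two datasets are linked when they
    contain columns carrying the same semantic tag (the shared concept)."""
    by_concept = defaultdict(list)
    for name, record in catalog.items():
        for concept in set(record["columns"].values()):
            by_concept[concept].append(name)
    edges = set()
    for concept, names in by_concept.items():
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                edges.add((a, b, concept))
    return edges

catalog = build_catalog(metadata)
graph = build_knowledge_graph(catalog)
```

The `email` edge links a marketing dataset to customer records owned by another unit. Automatically discovered links like this are one way personal information can travel further than its original owners intended.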
Personal information is likely to be included in the data repository. Some data may be confidential or sensitive, while other data may not be. To deal with data appropriately and determine security controls and access restrictions, organizations must classify personal information. SBI’s YONO application has more than 100 e-commerce partners that must have access to some, but not all, of a customer’s personal information. Classifying the information and setting business rules for sharing it is therefore paramount. However, this may be the easy part. The more challenging aspect of personal information security may be that the information is constantly in motion and stored in multiple locations. Suggested approaches have been to either encrypt all data or ensure that private data is not available via the Internet.26 Industry best practices suggest that personal data should be anonymized, or at least pseudonymized, where needed to comply with laws, regulations, and policies.27
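Classification, business rules, and pseudonymization can work together, as this minimal sketch shows. The source does not describe SBI's actual scheme, so the labels, partners, and key are hypothetical. The sketch uses keyed hashing (HMAC-SHA256), one common pseudonymization technique. Pseudonymized data generally still counts as personal data under the GDPR, because whoever holds the key can re-link it.

```python
import hashlib
import hmac

# Hypothetical classification of customer fields by sensitivity.
CLASSIFICATION = {
    "customer_id": "identifier",
    "name": "personal",
    "phone": "personal",
    "city": "internal",
    "segment": "internal",
}

# Hypothetical business rules: which classes each partner may receive.
PARTNER_RULES = {
    "travel_partner": {"internal", "identifier"},
    "insurer": {"internal", "identifier", "personal"},
}

SECRET_KEY = b"replace-with-a-managed-secret"  # assumption: held in a key vault

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash: the same input always maps to the same token,
    but the token cannot be recomputed without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def share_with(partner: str, record: dict) -> dict:
    """Release only the fields the partner's rule allows; identifiers are
    always pseudonymized before they leave the organization."""
    allowed = PARTNER_RULES[partner]
    shared = {}
    for field, value in record.items():
        label = CLASSIFICATION.get(field, "restricted")  # unclassified fields are never shared
        if label not in allowed:
            continue
        shared[field] = pseudonymize(str(value)) if label == "identifier" else value
    return shared

customer = {"customer_id": "C-1001", "name": "A. Sharma", "phone": "555-0100",
            "city": "Pune", "segment": "retail", "notes": "call after 6pm"}
travel_view = share_with("travel_partner", customer)
insurer_view = share_with("insurer", customer)
```

With a single key, both partners receive the same token for a customer and could join their datasets on it. A per-partner key would prevent this. Which option is appropriate is a policy decision rather than a technical one.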
One suggestion, based on the least disclosure principle, is to share only necessary attributes or aggregated personal information instead of the entire dataset.28 Regardless of the approach taken, data fabrics require continuous analysis to ensure that organizations comply with personal information security laws, regulations, and policies. This may be even more important because data fabrics, like the SBI YONO application, share information across multiple countries, divisions, and e-commerce partners. Implementation of data fabrics therefore must consider multinational cybersecurity regulations. A data fabric attribute that may assist in this area is the set of monitoring and auditing mechanisms that can track and log user access to identify possible security breaches or suspicious activities.
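The least disclosure principle and access auditing can be combined in one release function. In this minimal sketch, a partner receives counts per group instead of individual rows, and every request is written to an audit log. The small-group suppression rule and its minimum of three are illustrative assumptions that the source does not prescribe. They keep a count of one from singling out a person.

```python
import logging
from collections import Counter
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_log = logging.getLogger("fabric.audit")

def aggregate_for_partner(requester, records, group_by, min_group_size=3):
    """Least disclosure: return counts per group instead of individual rows,
    suppress groups small enough to single out a person, and write an
    audit entry for every request."""
    counts = Counter(r[group_by] for r in records)
    released = {k: v for k, v in counts.items() if v >= min_group_size}
    audit_log.info(
        "%s requester=%s field=%s groups_released=%d groups_suppressed=%d",
        datetime.now(timezone.utc).isoformat(), requester, group_by,
        len(released), len(counts) - len(released),
    )
    return released

purchases = [{"city": c} for c in ["Pune"] * 4 + ["Delhi"] * 3 + ["Goa"]]
summary = aggregate_for_partner("ecommerce_partner_17", purchases, "city")
```

The single Goa purchase is withheld. The log line records who asked for what and when, which gives monitoring tools something to review for suspicious patterns.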
Data Ownership
Data fabrics use AI to classify data and may use information from predictive analysis to generate the data catalog and knowledge graph. Knowledge graph analysis produces insights about datasets that can also reveal information about individuals and their behavior. “Information generated by predictive analytics in big data sets is new information,”29 and it is unclear “within the current informational privacy frameworks who owns this information or has the right to it.”30 The notion that data is the new oil also implies that data has little value until it is refined, and data fabrics give organizations the tools to refine data quickly. “Humans are desirable data subjects,”31 and the AI and ML in data fabrics may lead organizations to generate new data that raises personal information security concerns. Because this data is valuable to the organization and to other entities, determining who owns it is paramount.
Data Usage
Data fabrics rely heavily on AI, and the limited explainability of AI-generated models raises questions about what is happening inside the black box.32 This concern stems from the datafication process, which reinterprets and statistically analyzes personal information.33 The use of personal information has also raised questions about fundamental human rights, because the items people buy, the services they seek, and the websites they visit can feed highly individualized marketing campaigns. The sheer scale of personal data potentially available through data fabrics therefore creates room for misuse. For example, a consumer may purchase items that are personal in nature. Data fabric AI could use this information to develop sales campaigns that go well beyond what the consumer would expect from a seller who did not know about those purchases. Organizations implementing data fabrics must therefore decide how far AI will be allowed to go in using personal data to make recommendations.
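One way to set such limits is to enforce them in code before personal data reaches a recommendation model. This minimal sketch filters purchase signals against a list of sensitive categories and the customer's consent. The category list, the consent model, and the toy frequency-based recommender are all hypothetical. They illustrate a guardrail, not how any particular data fabric makes recommendations.

```python
# Hypothetical policy: purchase categories the recommender must never use,
# because customers would not reasonably expect them to drive marketing.
SENSITIVE_CATEGORIES = {"health", "pregnancy", "religion", "adult"}

def allowed_signals(purchase_history, consented_categories):
    """Keep only purchase signals that are not sensitive AND fall inside the
    categories the customer has consented to for personalization."""
    return [
        p for p in purchase_history
        if p["category"] not in SENSITIVE_CATEGORIES
        and p["category"] in consented_categories
    ]

def recommend(purchase_history, consented_categories, offers):
    """Toy recommender: suggest offers from the categories that appear in the
    permitted signals, most frequent category first."""
    signals = allowed_signals(purchase_history, consented_categories)
    freq = {}
    for p in signals:
        freq[p["category"]] = freq.get(p["category"], 0) + 1
    ranked = sorted(freq, key=lambda c: (-freq[c], c))
    return [item for c in ranked for item in offers.get(c, [])]

history = [
    {"item": "running shoes", "category": "sports"},
    {"item": "pregnancy test", "category": "pregnancy"},
    {"item": "yoga mat", "category": "sports"},
    {"item": "novel", "category": "books"},
]
offers = {"sports": ["water bottle"], "pregnancy": ["prenatal vitamins"], "books": ["bookmark"]}
recs = recommend(history, consented_categories={"sports", "pregnancy", "books"}, offers=offers)
```

The sensitive-category check overrides consent. This reflects the design choice that some inferences stay off-limits even when a broad consent box has been ticked.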
Conclusion
Data fabrics may represent the future of data warehousing and usage for many organizations because of their ability to identify and refine large quantities of data. However, the personal information security and privacy concerns associated with data fabrics are considerable. Organizations choosing to implement data fabrics should examine who has access to the data, who owns the data, and how the data is being used to make decisions. Hopefully, research in each of these areas will continue to grow, giving organizations a deeper understanding of how to protect personal information while still pushing the envelope in big data usage.
Endnotes
1 Arthur, C.; “Tech Giants May Be Huge, but Nothing Matches Big Data,” The Guardian, 23 August 2013, https://www.theguardian.com/technology/2013/aug/23/tech-giants-data
2 Yuhanna, N.; Leganza, G.; et al.; “Big Data Fabric Drives Innovation and Growth,” Forrester, 8 March 2016, https://www.forrester.com/report/big-data-fabric-drives-innovation-and-growth/RES129473
3 Castelluccio, M.; “Data Fabric Architecture,” Strategic Finance, 1 October 2021, https://www.sfmagazine.com/articles/2021/october/data-fabric-architecture/
4 Rieyan, S.; News, M.; et al.; “An Advanced Data Fabric Architecture Leveraging Homomorphic Encryption and Federated Learning,” Information Fusion, vol. 102, February 2024, https://doi.org/10.1016/j.inffus.2023.102004
5 Stonebraker, M.; Ilyas, I.F.; “Data Integration: The Current Status and the Way Forward,” IEEE Data Engineering Bulletin, 2018, http://sites.computer.org/debull/A18june/p3.pdf
6 Alpoim, Â.; Lopes, J.; et al.; “A Framework to Evaluate Big Data Fabric Tools,” In Integration Challenges for Analytics, Business Intelligence, and Data Mining, IGI Global, USA, 2020
7 6sense, “Enterprise Resource Planning (ERP),” https://6sense.com/tech/erp
8 Alvord, M.; Lu, F.; et al.; “Big Data Fabric Architecture: How Big Data and Data Management Frameworks Converge to Bring a New Generation of Competitive Advantage for Enterprises,” EAPJ, 14 November 2020, https://eapj.org/big-data-fabric-architecture/
9 Tittel, E.; Data Fabrics for Dummies, Hitachi Vantara, John Wiley & Sons, USA, 2022
10 Ibid.
11 Gupta, A.; “Data Fabric Architecture is Key to Modernizing Data Management and Integration,” Gartner, 11 May 2021, https://www.gartner.com/smarterwithgartner/data-fabric-architecture-is-key-to-modernizing-data-management-and-integration
12 Op cit Tittel
13 Ibid.
14 Ibid.
15 Ibid.
16 Op cit Gupta
17 Op cit Rieyan
18 IBM, “The Rise of a Financial Tiger,” https://www.ibm.com/case-studies/state-bank-of-india
19 IBM Hybrid Data Management, “Data Fabric Architecture Delivers Instant Benefits,” https://www.ibm.com/downloads/cas/V4QYOAPR
20 Op cit Rieyan
21 “General Data Protection Regulation,” Official Journal of the European Union, 27 April 2016, https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679
22 Whitman, M. E.; Mattord, H.; Principles of Information Security, Cengage Publishing, 2022
23 Michener, W. K.; Brunt, J.W.; et al.; “Nongeospatial Metadata for the Ecological Sciences,” Ecological Applications, February 1997
24 Op cit Gupta
25 IBM, “What Is a Data Fabric?,” https://www.ibm.com/topics/data-fabric
26 Wlosinski, L.; “Understanding and Managing the Artificial Intelligence Threat,” ISACA Journal, vol. 1, 2020, https://www.isaca.org/archives
27 Hur, J.; “Improving Security and Efficiency in Attribute-Based Data Sharing,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, iss. 10, October 2013
28 Fisk, G.; Ardi, C.; et al.; “Privacy Principles for Sharing Cyber Security Data,” 2015 IEEE Security and Privacy Workshops, 2015
29 Mai, J.-E.; “Big Data Privacy: The Datafication of Personal Information,” The Information Society, vol. 32, iss. 3, 13 April 2016, https://www.tandfonline.com/doi/full/10.1080/01972243.2016.1153010
30 Ibid.
31 Pearce, G.; Ketchen, S.; “Whose Data Is It Anyway?,” ISACA Journal, vol. 2, 2020, https://www.isaca.org/archives
32 Aich, S.; Burch, G.; “Looking Inside the Magical Black Box: A Systems Theory Guide to Managing AI,” ISACA Journal, vol. 1, 2023, https://www.isaca.org/archives
33 Op cit Mai
NIKOS ORTH
Is an undergraduate student at Ludwigshafen University of Business and Society (Ludwigshafen, Germany) participating in a cooperative study program with SAP SE, a leading enterprise software company. He has gained practical experience through internships across multiple security departments including security operation centers. His research interests include enterprise security strategies and operations.
GERALD F. BURCH | PH.D.
Is an assistant professor at the University of West Florida (Pensacola, Florida, USA). He teaches courses in information systems and business analytics at both the graduate and undergraduate levels. His research has been published in the ISACA® Journal and several other leading peer-reviewed journals. He has helped more than 100 enterprises with his strategic management consulting and can be reached at gburch@uwf.edu.