November 24, 2024

Data 2: Dataspaces 101 – Compendium

“Data is information’s ore”  —  Peter F. Drucker*

 

Insights from our research and refereed publications on dataspace issues of importance to management.

 

Introduction: From ‘Big Data’ to better data – without it, any AI is worthless, even dangerous
Dataspaces are fundamentally about delivering not just ‘Big Data’ but better data: the right data, with relevant information content and quality, in the right quantity. Peter Drucker’s metaphor, “data is information’s ore,” illustrates the progression from raw data to refined data to information and, ultimately, insightful knowledge. In this analogy, raw data is the ore, which must be processed to extract information. Analytics and artificial intelligence (AI) are the tools for information extraction and insight generation. However, if the data does not contain the right information, any tool, no matter how powerful, is worthless. Generative AI, such as ChatGPT, exemplifies this principle: the quality of its output depends on the quality of the data it has been trained on. Any GenAI is only as good as its data: without cats in the training data, it cannot infer cats; if only black cats are included, only black cats will be output. Dataspaces are therefore strategically important, serving as a key lever at the very start of any analytics or AI food chain.

 

What is it? Definition
“A dataspace is a peer-to-peer data communication system (think: phone system for data or data dial-tone network) and not a storage solution, which sits on top of cloud platforms, with the advantage of cross-organizational data sharing with built-in data sovereignty protection: two parties who may not trust each other fully can trust a data transaction, because a data provider retains power to control rights to its data at all times through …
(1) Verified authentication of users (who is involved?)
(2) Access control (who can see data offer?)
(3) Usage policies, which are specified by the provider and need to be accepted and signed by the consumer (what is allowed?)” (see Schlueter Langdon & Schweichhart 2022).
In short, a dataspace resembles a “container shipping system” for data. A container protects what is inside and works everywhere: at sea, on land, and in ports around the world. What you place in it, to whom you send it, and for what purpose is between you and the receiver.

  • Dataspace Top 10 characteristics, 2-pager: Link
  • Discussion of definitions of “dataspace” and “data sovereignty”: Schlueter Langdon, C. 2020. Dataspace, sovereignty, supermarket: IT Director interview. Blog Post (2020-08-31), Deutsche Telekom IoT, Bonn, link; based on: Hoffmann, D. 2020. Freie Bahn für den Datenaustausch (A clear path for data exchange). IT Director (2020-08-31), link
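
The three controls named in the definition above (verified authentication, access control, and signed usage policies) can be illustrated with a minimal Python sketch. It is not the API of any real dataspace connector: `DataOffer`, `sign_policy`, `request_data`, and all participant and field names are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataOffer:
    """Hypothetical data offer published by a provider (illustrative only)."""
    name: str
    payload: dict
    visible_to: set[str]        # (2) access control: who can see the offer
    allowed_purposes: set[str]  # (3) usage policy: what is allowed
    signed_by: set[str] = field(default_factory=set)  # consumers who accepted the policy


def sign_policy(offer: DataOffer, consumer: str) -> None:
    """Record that a consumer has accepted and signed the usage policy."""
    offer.signed_by.add(consumer)


def request_data(offer: DataOffer, consumer: str, purpose: str,
                 verified: set[str]) -> dict:
    """Release the payload only if all three provider-side checks pass."""
    if consumer not in verified:             # (1) verified authentication
        raise PermissionError("participant is not verified")
    if consumer not in offer.visible_to:     # (2) access control
        raise PermissionError("offer is not visible to this participant")
    if consumer not in offer.signed_by:      # (3) policy must be signed ...
        raise PermissionError("usage policy has not been signed")
    if purpose not in offer.allowed_purposes:  # ... and must cover the purpose
        raise PermissionError("purpose is not covered by the usage policy")
    return offer.payload


# Assumed example: an OEM requests CO2 data for a carbon footprint app.
verified = {"oem", "tier1_supplier"}
offer = DataOffer(
    name="part_co2",
    payload={"co2_kg": 12.5},
    visible_to={"oem"},
    allowed_purposes={"carbon_footprint"},
)
sign_policy(offer, "oem")
data = request_data(offer, "oem", "carbon_footprint", verified)
```

In this sketch the provider keeps control at every step: an unverified party, a verified party that cannot see the offer, and a request for a purpose outside the policy are all refused with `PermissionError`.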

 

Why important?
Dataspaces serve as critical infrastructure for digital transformation success, enabling:
(a) ecosystem formation for faster, more agile value creation, and resilient supply chains,
(b) data productization and industrialization, including digital twins, and
(c) advanced automation through generative AI and super-apps.

  • Business ecosystems: A dynamic network of interconnected organizations, individuals, and other stakeholders that jointly contribute to the creation and delivery of value in a particular industry or market. Central to this concept is the recognition that value cannot be created in isolation and that outcomes can be synergistic, surpassing the sum of their parts (1 + 1 = 3; see “Biz Ecosystems 2.0: Built on Data”)
  • Data products: A data product is refined, ready-to-use data accessible to various software applications. Examples include digital twins (see “Data products – Compendium”)
  • Gen AI and super-apps: A super-app is a software application “that is much better than all other of its type” (Cambridge Dictionary) because it delivers a leap in user value and a seamless experience due to better data. It presents a “blue ocean” opportunity, a yet unexploited or uncontested marketspace (Kim & Mauborgne 2004). China’s WeChat has been widely recognized as a first super-app (The race to create the world’s next super-app, BBC News, 2021-02-05; see “Super-apps – Compendium”)

 

Case studies – with published results
“The proof is in the pudding”: nothing demonstrates value better than the impact shown – and measured – through our own, real-world case studies.

  • Industry & mobility (including Umati – Universal machine technology interface, link): Schlueter Langdon, C., and K. Schweichhart. 2022. Dataspaces: First Applications in Mobility and Industry. In: Otto, B. et al. (eds.). Dataspaces – Part IV Solutions & Applications. Springer Nature, Switzerland: 493–511, link
  • RealLab Hamburg mobility super-app, see “Auto 5: Mobility super-app disruptions” (RealLab Hamburg was winner of the 2022 “Innovation Lab Award” by the German Federal Ministry of Economic Affairs and Climate Action, link)
  • Catena-X**: First collaborative, open data ecosystem for the automotive industry, see “Data Move People”

 

How to participate?
Dataspace users can take on at least three roles.

  1. Use case owner: Automating a use case requires a software application (app) and relevant input data, which must be refined from raw data. A dataspace is a valuable resource here, supplying both novel raw data and refined data or data products tailored to a specific application. For example, a data consumer (e.g., a supply chain software app for product carbon footprint or CO2 tracking) could pull a specific data product, such as a digital twin, from a data provider on demand and nearly just in time.
  2. Data provider: Participants may provide their own data product or be asked to provide values for the variables of a data product template, such as a digital twin (a tier 1 supplier providing values for an OEM customer’s vehicle digital twin, for example). Merriam-Webster defines “product” as something that “is marketed or sold as a commodity” and which is “subject to ready exchange or exploitation within a market”. Therefore, a data product is like a food product defined by its information content (food ingredients), quality (nutritional value) and quantity, and ready to use for many software applications (recipes) (see Schlueter Langdon & Sikora 2020).
  3. App developer – “there’s an app for that”: New data, some of it available for the first time, will spark a rich application ecosystem, something witnessed before, for example, with the emergence of smartphones and social media. In addition, existing apps, such as a company’s capacity management system, may need to be updated in order to benefit from the better data provided in a dataspace. Our ‘Case study: RealLab Hamburg’ (in ‘Case studies’ above) exemplifies the necessity of adapting current applications to harness the advantages of better data available from connectivity with a dataspace.
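
The data provider role described above, where a participant fills in values for the variables of a data product template, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `DataProductTemplate` and the variable names `battery_capacity_kwh` and `co2_footprint_kg` are hypothetical and do not come from Catena-X or any other real specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataProductTemplate:
    """Hypothetical template: a named set of variables a provider must fill in."""
    name: str
    required_variables: tuple[str, ...]
    values: dict[str, float] = field(default_factory=dict)

    def provide(self, variable: str, value: float) -> None:
        """Accept a value only for a variable the template actually defines."""
        if variable not in self.required_variables:
            raise KeyError(f"{variable!r} is not part of template {self.name!r}")
        self.values[variable] = value

    def missing(self) -> list[str]:
        """List the variables that still have no value."""
        return [v for v in self.required_variables if v not in self.values]

    def is_ready(self) -> bool:
        """A data product is ready to use once every variable has a value."""
        return not self.missing()


# Assumed example: a tier 1 supplier fills in an OEM's vehicle digital twin.
twin = DataProductTemplate(
    name="vehicle_digital_twin",
    required_variables=("battery_capacity_kwh", "co2_footprint_kg"),
)
twin.provide("battery_capacity_kwh", 77.0)
still_missing = twin.missing()   # one variable is still open at this point
twin.provide("co2_footprint_kg", 12.5)
ready = twin.is_ready()
```

The template fixes the information content up front, so a consuming app can check completeness before it uses the data product.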

 

How to get ready?

  • C-level 1-pager: Link
  • 3 steps for strategy: (1) Treat data as a product, applying product management best practice, (2) industrialize data products using data factory automation, (3) orchestrate a data supply chain for these factories using dataspaces. Schlueter Langdon, C., and C. Hort. 2022. How data sovereignty enables the next future of automotive – part 1. White Paper (2022-04-15), T-Systems International, Frankfurt, link (for data products, see “Data products: Digital twins”)
  • 3 steps for a pilot and operations: (1) Use dataspace to source data/ create data chain, (2) refine data products (for a digital twin, for example), (3) create super-app. Schlueter Langdon, C., and C. Hort. 2023. Winning with dataspaces like Catena-X: From Big Data to Better Data – part 2. White Paper (2023-04-18), T-Systems International, Frankfurt, link

 

Dataspace tech and interoperability – our own published R&D
We are involved in scientific research with experts from leading institutions including Fraunhofer Institutes (FhG) and the German Aerospace Center (DLR).

  • Staebler, M., T. Mueller, F. Koester, and C. Schlueter Langdon. 2024. Why an Automated, Scalable and Resilient Service for Semantic Interoperability is Needed. In: Proceedings of the 16th International Conference on Agents and Artificial Intelligence (ICAART, Volume 3), Rome, ISBN 978-989-758-680-4, ISSN 2184-433X, SciTePress, pages 299-307, DOI: 10.5220/0012345000003636, link
  • Staebler, M., F. Koester, and C. Schlueter Langdon. 2023. Towards solving ontological dissonance using network graphs. Proceedings of 29th Americas Conference on Information Systems (AMCIS), Panama, link
  • Drees, H., S. Pretzsch, B. Heinke, D. Wang, and C. Schlueter Langdon. 2022. Dataspace Mesh: Interoperability of Mobility Dataspaces. Technical Paper ID 280, 14th ITS Europe Congress, Toulouse, link
  • Lauf, F., S. Scheider, J. Bartsch, P. Herrmann, M. Radic, M. Rebbert, A. T. Nemat, C. Schlueter Langdon, R. Konrad, A. Sunyaev, and S. Meister. 2022. Linking Data Sovereignty and Data Economy: Arising Areas of Tension. Best Paper Award at the 17th International Conference on Wirtschaftsinformatik (WI22), link
  • Drees, H., D. O. Kubitza, J. Lipp, S. Pretzsch, and C. Schlueter Langdon. 2021. Mobility Dataspace – First Implementation and Business Opportunities. Technical Paper ID 909, 27th ITS World Congress, Hamburg, link

 

* Peter Drucker. 1992. Be Data Literate – Know What to Know. The Wall Street Journal (1992-12-03), link
** Professor Chris Schlueter Langdon is SAFe PM-PO certified (link) and one of three Agile Product Managers responsible for the Catena-X software release made available as free and open-source software (FOSS) under the Eclipse Foundation in the Tractus-X project, link
