Hello Dataspace: From Big Data to Better Data
The problem: Bad data = bad economics
Data analytics and artificial intelligence are suffering from a productivity crisis, and data is the core problem. Even the best software won’t work without the right data. According to meta-research, more than 80% of the time budget of a data analytics project is spent on data wrangling instead of results, which often feels like looking for a needle in a haystack (Schlueter Langdon 2019). This turns the 80/20 Pareto principle, a cornerstone of business efficiency (e.g., Newman 2005), upside down.
The solution: Dataspace = better data
“Are you still storing, or are you already sharing?” One solution is data sharing, which ensures that the right data with the required information content is available. Furthermore, don’t store everything, which is expensive, but let a software app pull data from the source when it is needed. Treat data like modern logistics: allow for on-demand and just-in-time delivery. So far, such data sharing has been impeded by a lack of data sovereignty: how do you protect your rights to data when you share it? Dataspace tech has been explicitly designed to solve this problem. It facilitates data sharing with data sovereignty protection. With Catena-X, the technology has left the lab and arrived in industry, driven by leading automakers and tier 1 suppliers (link; first use cases in Schlueter Langdon & Schweichhart 2022).
What is a dataspace?
A dataspace is a data communication system or data dial-tone network (think: phone system for data). It is not for storage, and it sits on top of cloud platforms. It enables peer-to-peer data transactions: a data consumer (e.g., a supply chain software app for product carbon footprint or CO2 tracking) pulls data on demand from a data provider for a specific data product, such as a digital twin (see Schlueter Langdon & Sikora 2020). Data sovereignty protection is built in, so two parties that do not trust each other can still trust a data transaction. The provider of a data asset retains control over the rights to its data through (a) verified authentication of users (who is involved?), (b) access control (who can see data offers?), and (c) usage policies that the provider specifies and the consumer must agree to and sign (what is allowed?). For a C-level description, please read:
Schlueter Langdon, C. 2023. From Big Data to Better Data – Dataspace Top 10. In: Mertens, C., et al. (eds.). Data Move People, Anthology (version 2.0, January), International Data Spaces Association, Berlin (forthcoming)
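To make the three provider-side controls more concrete, here is a minimal Python sketch of a single pull transaction. It is an illustration only, not the Catena-X or International Data Spaces implementation: the class names (`Provider`, `DataOffer`, `UsagePolicy`), the participant IDs, the asset ID, and the payload value are all hypothetical, and real dataspaces rely on certified identities and signed contracts rather than in-memory sets and a boolean flag.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UsagePolicy:
    """(c) Usage terms set by the provider (hypothetical, heavily simplified)."""
    allowed_purposes: frozenset


@dataclass
class DataOffer:
    asset_id: str
    payload: dict
    visible_to: set        # (b) access control: who can see this offer
    policy: UsagePolicy    # (c) what the consumer is allowed to do


@dataclass
class Provider:
    verified_ids: set                           # (a) participants with verified identity
    offers: dict = field(default_factory=dict)  # data stays with the provider

    def _authenticate(self, consumer_id):
        if consumer_id not in self.verified_ids:
            raise PermissionError("unverified participant")

    def catalog(self, consumer_id):
        """List only the offers this consumer is allowed to see."""
        self._authenticate(consumer_id)
        return sorted(
            asset_id
            for asset_id, offer in self.offers.items()
            if consumer_id in offer.visible_to
        )

    def pull(self, consumer_id, asset_id, purpose, policy_signed):
        """Release data on demand, only if all three checks pass."""
        self._authenticate(consumer_id)                           # (a)
        offer = self.offers.get(asset_id)
        if offer is None or consumer_id not in offer.visible_to:  # (b)
            raise PermissionError("offer not visible to this consumer")
        if not policy_signed or purpose not in offer.policy.allowed_purposes:  # (c)
            raise PermissionError("usage policy not signed or purpose not allowed")
        return offer.payload


# Illustrative setup: all identifiers and values below are made up.
provider = Provider(verified_ids={"oem-a", "supplier-b"})
provider.offers["battery-twin-001"] = DataOffer(
    asset_id="battery-twin-001",
    payload={"co2_kg": 12.5},
    visible_to={"oem-a"},
    policy=UsagePolicy(allowed_purposes=frozenset({"carbon-footprint"})),
)

visible = provider.catalog("oem-a")
data = provider.pull("oem-a", "battery-twin-001", "carbon-footprint", policy_signed=True)
```

In this sketch the consumer stores nothing in advance. It asks for one specific data product at the moment it needs it, and the provider decides at each step whether to answer.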
Our “Data Analytics Innovation” miniseries
This article continues our series on data analytics innovation. Previous episodes include:
- Shift to behavioural variables: Behavioural Analytics: Auto Interior & UX (link, 2016)
- Shift to personalized recommendations: Technology Personified – AMA (link, 2014)
References
Newman, M. E. 2005. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics 46(5): 323–351
Schlueter Langdon, C. 2019. Data is broken: The data productivity crisis. Telekom Data Intelligence Hub Blog Story, T-Systems International, Frankfurt, link
Schlueter Langdon, C., and K. Schweichhart. 2022. Data Spaces: First Applications in Mobility and Industry. In: Otto, B., et al. (eds.). Data Spaces – Part IV Solutions & Applications. Springer Nature, Switzerland: 493–511, link
Schlueter Langdon, C., and R. Sikora. 2020. Creating a Data Factory for Data Products. In: Lang, K. R., J. J. Xu, et al. (eds.). Smart Business: Technology and Data Enabled Innovative Business Models and Practices. Springer Nature, Switzerland: 43–55, link