While data at rest is given all the attention, data in motion holds rich intelligence worth tapping, according to this writer.
The information captured in databases, data warehouses, and even data lakes represents only a small fraction of an enterprise’s total data.
Organizations have significantly sharpened their ability to build a relatively complete view of their so-called “data at rest” through data warehousing, data integration, and analytics platforms. “Data in motion,” by contrast, is the series of activities, actions, and events happening in real time across the enterprise that ultimately culminate in and shape that data at rest.
This article examines the value that can be derived from data in motion, the challenges that arise during the harnessing process, and some considerations for organizations that aim to benefit from their data in motion.
Usefulness of data in motion
Examples of the usefulness of data in motion exist across industries. Financial institutions have found that data in motion has been key to radical improvements in fraud detection. Retailers have found data in motion has been key to better product recommendations that have drastically increased the average size of an online order. These may be familiar examples, but we have only scratched the surface of the potential this data has to offer.
In the technology landscape of a typical enterprise, the majority of data in motion is never captured, or it is captured in legacy message-oriented middleware such as Java Message Service (JMS) or MQ-based platforms. These platforms were built on the fundamental premise that the data they processed was transient and should be discarded as soon as each message had been delivered.
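To make the difference concrete, here is a minimal sketch that contrasts a deliver-and-discard queue with an append-only log. Both classes (`TransientQueue`, `EventLog`) and the payment records are hypothetical illustrations written for this article, not the API of any JMS, MQ, or streaming product.

```python
from collections import deque


class TransientQueue:
    """Toy model of deliver-and-discard messaging (illustrative only)."""

    def __init__(self):
        self._pending = deque()

    def send(self, message):
        self._pending.append(message)

    def receive(self):
        # Delivery removes the message, so no later reader can see it.
        return self._pending.popleft() if self._pending else None


class EventLog:
    """Toy append-only log: reading never removes an event."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset=0):
        # Any reader can replay history starting at any offset.
        return list(self._events[offset:])


queue = TransientQueue()
log = EventLog()
for payment in ({"id": 1, "amount": 40}, {"id": 2, "amount": 9000}):
    queue.send(payment)
    log.append(payment)

# The first consumer drains the queue; a consumer that arrives later gets nothing.
first_consumer = [queue.receive(), queue.receive()]
late_consumer = queue.receive()

# The log still holds both events for anyone who needs them later.
replayed = log.read_from(0)
```

With the queue, a fraud model added after the fact has nothing to learn from; with the log, it can replay every payment from offset 0.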
Data in motion is usually siloed within these platforms, causing a lack of awareness throughout the organization about what data is available, let alone how to access it. This problem is compounded by the adoption of products tied to specific cloud providers that have the same limitations as traditional messaging products along with the disadvantage of being anchored to a particular cloud.
Through this lens, the challenges enterprises have with data in motion today become clearer; they mirror the problems that we have faced with data at rest over the last two decades: fragmented data, trapped in silos, represented inconsistently with unclear sources of truth. Very few enterprises give data in motion the same care and consideration that they give data at rest.
Harnessing data in motion
Most enterprises that recognize the shortcomings of their data-in-motion capabilities have made small tactical improvements, such as augmenting their legacy messaging platforms with an event streaming platform like Apache Kafka that does a better job of persisting message data.
But these organizations generally find that they have simply made temporary improvements to one small area of their business, in exchange for ultimately compounding the overall complexity of their data in motion strategy.
For a specific use case, they have achieved a minor, incremental improvement in the completeness of their data understanding, but they have not solved the fundamental problem.
To make matters worse, they have paid a high cost for this limited improvement through a mix of higher CAPEX and OPEX associated with the addition of yet another new messaging platform.
A unified fabric for data in motion
Increasingly, organizations are in search of a more foundational solution, one that offers a unified fabric for data in motion that stretches across the entire enterprise, both on premises and across clouds. They are making streaming data available everywhere to any application in their organization with a replication strategy to achieve consistency, low latency, and high performance.
These organizations are often looking to more comprehensive, next-generation platforms that provide a range of capabilities for data in motion including queuing, pub/sub, streaming, and stream processing. Ultimately, organizations that choose a broader, strategic approach over a more tactical approach will be better positioned to extract value from their data in motion.
Regardless of the technology platform an organization chooses, the steps required to manage data in motion look very similar to what is needed to solve the same kind of problems faced with data at rest. Specifically, the following aspects of data management must be addressed:
- Persistence: There was a time when simply storing data could not be taken for granted the way it is today. Because so much data in motion is never captured, and what is captured is not persisted, having a persistence strategy is a critical first step.
- A source of truth: Simply storing data is generally not sufficient because we want to ensure that we understand where the source of truth resides for individual sets of events within the enterprise.
- Centralized master data: While we need to know which system provides the source of truth, we also need to have a centralized view that ties related data in motion together into a complete view across the enterprise.
- Operational data views: As with data at rest, applications often need up-to-date, use case-specific views of data in motion that are located close to the end user or downstream system to optimize for performance. Sometimes this is as simple as offering pub/sub message semantics. Other times it requires low-latency replication to anywhere in your organization: on-prem, cloud, and even the edge.
- Data governance: We have seen the widespread problems that stem from unreliable and inconsistent data representations for data at rest; these same problems exist for data in motion and require an analogous governance strategy.
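As a rough illustration of how several of these aspects fit together, the sketch below combines persistence, a declared source of truth, a simple governance check, and an operational view fed by pub/sub. Everything in it is an assumption made for the example: the `EventStream` and `OrderStatusView` classes, the `REQUIRED_FIELDS` schema, and the `order-service` name are hypothetical, and a real platform would handle durability, replication, and schema management very differently.

```python
REQUIRED_FIELDS = {"order_id", "status", "source"}  # assumed event schema


class EventStream:
    """Toy persisted stream with one declared source of truth."""

    def __init__(self, source_of_truth):
        self.source_of_truth = source_of_truth
        self._log = []          # persistence: events are kept, not discarded
        self._subscribers = []  # pub/sub: callbacks that receive each event

    def subscribe(self, callback):
        # Replay history first so a late subscriber still gets a full view.
        for event in self._log:
            callback(event)
        self._subscribers.append(callback)

    def publish(self, event):
        # Governance: reject events that do not match the agreed shape.
        missing = REQUIRED_FIELDS - set(event)
        if missing:
            raise ValueError(f"event missing fields: {sorted(missing)}")
        # Source of truth: only the owning system may publish here.
        if event["source"] != self.source_of_truth:
            raise ValueError(f"{event['source']} does not own this stream")
        self._log.append(event)
        for callback in self._subscribers:
            callback(event)


class OrderStatusView:
    """Operational view: the latest status of each order."""

    def __init__(self):
        self.latest = {}

    def apply(self, event):
        self.latest[event["order_id"]] = event["status"]


stream = EventStream(source_of_truth="order-service")
stream.publish({"order_id": "A1", "status": "placed", "source": "order-service"})

view = OrderStatusView()
stream.subscribe(view.apply)  # subscribes late, catches up through replay
stream.publish({"order_id": "A1", "status": "shipped", "source": "order-service"})
stream.publish({"order_id": "B2", "status": "placed", "source": "order-service"})

try:
    stream.publish({"order_id": "A1", "status": "lost", "source": "crm"})
    rejected = False
except ValueError:
    rejected = True  # the non-owning system is turned away
```

The view subscribes after the first event was published yet still ends up complete, which is only possible because the stream persisted its history. A centralized master view would tie several such streams together and is left out here for brevity.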
Getting started does not require a massive, multi-year effort. A growing number of products are available to simplify the management of data in motion, often with features that provide drop-in replacements for existing, aging message-oriented middleware. For more proactive organizations, many of these capabilities can be obtained by assembling a set of open source technologies and building bespoke solutions to address the gaps and specific needs of your organization.
The specific technologies that make up your strategy are less important than the fact that you have a strategy for your data in motion. Without one, you run the risk of watching your competitors build data-driven innovations while you scramble to play catch-up.