When finding relationships between disparate sources of big data is key, graph databases can deliver answers much faster.

When a major Japanese automobile manufacturer wanted to cut the time-to-market for new vehicles, it ran into a problem: engineers from diverse domains had been conducting tests differently and storing the test data in a variety of formats using different tools. The result was inconsistent, siloed data that was useless to other teams.

By turning to graph data science, the firm connected all product validation life cycle data and integrated the complex, siloed domains and functions enterprise-wide, defining key metadata such as test types, measurement characteristics and measurement conditions. The resulting well-defined semantics for tests, subtests and measurements enabled the firm’s engineers to improve communication across domains and platforms and to overcome problems and delays.

Nik Vora, Vice President, APAC, Neo4j

Today, this problem is becoming more common as the exponential growth of big data has surpassed the point where traditional databases can manage it. Businesses are building vast repositories of data on their operations and their customers, with each entity having multiple points and layers of information. The problem is how to store, process and analyze relationships in the data in a meaningful and timely way.

The problem with legacy databases
Representing today’s customer databases in a two-dimensional table or spreadsheet is a very limited approach.

Data can be stored and queried, but finding patterns among thousands of rows and cells is neither easy nor immediate. It is extremely difficult to connect different areas of data: for example, not just who your customers are, but what they bought, how they bought, where they bought and why they bought.

In the past few years, one of the starkest illustrations of legacy databases’ limits has been supply chain disruption. Unravelling the extremely complex web of routes and participants to re-route the tens of thousands of container ships crossing the oceans every day has been an immensely challenging task, and traditional databases have lacked the real-time, accurate processing capabilities that the immense volume and detail of the data demand.

So, when legacy data processing is hampering efficiency in sussing out relationships in data, how can data science be applied to reduce data bottlenecks?

Connecting big data with graphs
When it comes to processing big data, there are many data science approaches, each with its own strengths.

One approach, graph data science, excels at sniffing out connections and relationships between billions or even trillions of data points, in order to help predict future relationships and problems.

For example, knowledge graphs are good at mapping complex, interconnected supply chains while maintaining high performance with vast volumes of data.
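To make that concrete, here is a minimal sketch of a supply chain graph and a re-route query, written in Python with the networkx library; the ports, routes and transit-time weights are hypothetical assumptions for illustration, not real shipping data.

```python
# A toy supply chain graph; edge weights stand in for transit days.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("shanghai", "singapore", 4), ("singapore", "rotterdam", 12),
    ("shanghai", "busan", 2), ("busan", "la", 9), ("la", "rotterdam", 14),
])

# If the singapore-rotterdam leg is disrupted, drop it and re-route.
G.remove_edge("singapore", "rotterdam")
print(nx.shortest_path(G, "shanghai", "rotterdam", weight="weight"))
# -> ['shanghai', 'busan', 'la', 'rotterdam']
```

Because routes are stored as first-class relationships, the re-route is a single traversal query rather than a cascade of table joins.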

This natively relationship-centric approach makes graph databases suitable for complex data analytics use cases where relational and other legacy databases fall short. A graph database typically answers queries up to 100 times faster than a traditional SQL database. It lets the connected data ‘speak for itself’, for instance by running unsupervised graph algorithms to find the signal in the noise.
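As one illustration of an unsupervised graph algorithm ‘finding the signal in the noise’, the sketch below uses networkx’s Louvain community detection on a hypothetical toy graph; the node names and the choice of algorithm are assumptions for demonstration, not any particular vendor’s method.

```python
# Louvain community detection groups densely connected nodes
# without any labels or training data.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("bob", "carol"), ("carol", "alice"),  # one tight cluster
    ("dave", "erin"), ("erin", "frank"), ("frank", "dave"),  # another cluster
    ("carol", "dave"),                                       # a single bridge
])

communities = nx.community.louvain_communities(G, seed=42)
for i, members in enumerate(communities):
    print(f"community {i}: {sorted(members)}")
```

The algorithm recovers the two clusters on its own, which is the sense in which the connected data ‘speaks for itself’.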

A knowledge graph can identify chains and rings of linked individuals, scoring the quality, quantity and distance of one party’s relationships with suspicious entities. With a customer database, graphing could show how the community of customers interacts, which could be useful information for segmentation. Suspected connections between individuals and entities can become more visible and allow for much earlier intervention.
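To sketch that scoring idea, the toy example below ranks parties by the quantity and distance of their links to flagged entities; the graph, the entity names and the scoring formula are hypothetical assumptions, not a production fraud model.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("acct_1", "acct_2"), ("acct_2", "shell_co"),
    ("acct_1", "acct_3"), ("acct_4", "acct_5"),
])
flagged = {"shell_co"}  # entities already known to be suspicious

def risk_score(node):
    """Score a party by the quantity and distance of its links to
    flagged entities: more links, and nearer ones, raise the score."""
    lengths = nx.single_source_shortest_path_length(G, node, cutoff=3)
    return sum(1.0 / lengths[e] for e in flagged if lengths.get(e, 0) > 0)

for node in sorted(G.nodes - flagged):
    print(node, round(risk_score(node), 2))
```

Here acct_2, one hop from the shell company, scores highest, while the unconnected acct_4 and acct_5 score zero.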

A predictive, not reactive approach
Once the kinds of patterns discerned by graphing are brought to light, they can be used to predict outcomes and guide intervention.

For example, one national finance ministry is using graph data science to map around 150,000 people, firms and documents, as well as approximately 750,000 relationships between these entities—for fraud prevention and intervention.

If suspicious transactions are detected, they are analyzed together with all relevant information and documents in the graph. This is invaluable to legal experts because, instead of taking a superficial look at relationships, they can now uncover relationships only apparent at the second or third level, thanks to the graphing approach.
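A minimal sketch of that second- and third-level traversal, again with networkx and a hypothetical toy graph: expanding a fixed number of hops around a suspicious transaction surfaces entities that a flat, first-level lookup would never show.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("txn_42", "firm_a"),    # level 1: the transacting firm
    ("firm_a", "person_x"),  # level 2: a director behind the firm
    ("person_x", "firm_b"),  # level 3: a second firm behind the director
    ("person_x", "doc_7"),   # level 3: a shared document
])

# ego_graph gathers everything within `radius` hops of the start node,
# so second- and third-level relationships appear in one query.
neighbourhood = nx.ego_graph(G, "txn_42", radius=3)
print(sorted(neighbourhood.nodes))  # firm_b and doc_7 surface via person_x
```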