Take a second to think about all the ways data has changed in the last 20 years. In the hardware space, our mobile phones started out as large handhelds with pull-out antennas and limited processing power. Now they are advanced pieces of technology with roughly 32,600 times the processing power of the computers that guided us to the moon. The transformation in our phones is analogous to the evolution of modern enterprise data architecture. As front-end consumer applications have evolved, so have the resources needed to collect, store and analyze the information flowing from consumers. The average company runs 110 SaaS applications, connecting to an average of 400 data sources.

To scale with this expansion, companies like IBM have proposed a new architectural approach known as "data fabric," which provides a unified platform to connect this growing number of applications. A data fabric can be thought of as exactly what the name implies: a "fabric" that connects data of multiple types, from multiple locations and sources, to facilitate data growth and management. IBM delivers this flexible architecture through IBM Cloud Pak® for Data, an enterprise insights platform that gives companies the flexibility to scale across any infrastructure using Red Hat® OpenShift®, the leading enterprise open-source container orchestration platform.

I will outline the data fabric architectural approach through the lens of a basic stock trading platform. Consumer-oriented trading platforms have gained traction over the past couple of years, enabling users to drive their own financial destiny. But to give individual investors this power, there must be a strong data architecture in place to connect live price feeds and analytics to advanced backend systems. Data virtualization facilitates this flow of data by working behind the scenes to unify multiple disparate systems.

Data virtualization

Data virtualization integrates data sources across multiple locations (on-premises, cloud or hybrid) and returns a logical view without the need for data movement or replication. The real value of data virtualization is that it creates a centralized data platform without the cost of large-scale data movement. In the case of our stock trading platform, we have customer data, financial trading data and account data in separate storage locations.

Figure 1

As shown in Figure 1, financial data resides in a PostgreSQL cloud environment, while personal customer data sits on premises in MongoDB and Informix environments. Using the advanced data virtualization engine, you can query all of these sources together, at roughly half the cost of traditional data extraction methods.
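To make this concrete, here is a minimal sketch of what a federated query against the virtualization layer might look like from Python, using the ibm_db driver. The connection string, schema names (TRADING.TRADES, CUSTOMERS.PROFILES) and column names are hypothetical placeholders, not the demo's actual objects; the point is that a single connection and a single SQL statement span sources that physically live in PostgreSQL, MongoDB and Informix.

```python
# Minimal sketch: one connection to the virtualization engine, one query
# spanning multiple physical sources. All names and credentials are
# hypothetical placeholders.
import ibm_db

conn = ibm_db.connect(
    "DATABASE=BLUDB;HOSTNAME=<dv-host>;PORT=50001;SECURITY=SSL;"
    "UID=<user>;PWD=<password>;",
    "", ""
)

# Join the (PostgreSQL-backed) trade feed with the (MongoDB-backed) customer
# profiles as if they were tables in one database; no data is replicated.
sql = """
SELECT c.customer_id, c.name, t.symbol, t.quantity, t.executed_at
FROM   TRADING.TRADES t
JOIN   CUSTOMERS.PROFILES c ON c.customer_id = t.customer_id
WHERE  t.executed_at > CURRENT TIMESTAMP - 1 DAY
"""

stmt = ibm_db.exec_immediate(conn, sql)
row = ibm_db.fetch_assoc(stmt)
while row:
    print(row)
    row = ibm_db.fetch_assoc(stmt)

ibm_db.close(conn)
```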

Data cataloging

Once this data is ingested, it needs a mechanism to curate, categorize and facilitate its sharing throughout an organization. For example, our stock trading platform may have multiple teams of data scientists focused on core customer initiatives, such as UI optimization algorithms or understanding order flow. A data catalog, such as IBM Watson® Knowledge Catalog, can facilitate the relationship between these roles and reduce the preparation needed to complete these tasks. Data catalogs bridge the gap between raw and usable data, allowing you to apply business context, data policies and data protection rules to your virtual data. For example, if the lead data steward at our trading platform wishes to mask credit card numbers as they flow into different data projects, they can apply a data protection rule to credit card numbers, as shown in Figure 2:

Figure 2

Now credit card numbers are masked throughout your environment, improving trust in your company while also making it easier to meet government regulations.
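The rule itself is configured in the catalog rather than in code, but conceptually it performs a transformation along the lines of the sketch below. The column name and masking format here are illustrative only; this is not the catalog's implementation, just the kind of redaction the rule applies before data reaches a consumer.

```python
# Minimal sketch of redaction-style masking on a credit card column.
# Column names and masking format are hypothetical.
import re

def mask_credit_card(value: str) -> str:
    """Replace every digit except the last four with 'X'."""
    digits = re.sub(r"\D", "", value)
    if len(digits) <= 4:
        return "X" * len(digits)
    return "X" * (len(digits) - 4) + digits[-4:]

rows = [
    {"customer_id": 1001, "credit_card": "4111-1111-1111-1234"},
    {"customer_id": 1002, "credit_card": "5500 0000 0000 5678"},
]

for row in rows:
    row["credit_card"] = mask_credit_card(row["credit_card"])
    print(row)
# {'customer_id': 1001, 'credit_card': 'XXXXXXXXXXXX1234'}
# {'customer_id': 1002, 'credit_card': 'XXXXXXXXXXXX5678'}
```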

With this rule applied, data scientists who view customer information see redacted credit card numbers as shown in Figure 3:

Figure 3

Now if this table is needed in a Python project, data scientists can export that same core data for analysis without seeing any confidential information, as shown in Figure 4:

Figure 4
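One way a data scientist working in a Python project might pull this governed view into a DataFrame is through the same virtualized connection, for example with the ibm_db_dbi driver and pandas. The connection string and table name below are hypothetical, and the credit card column arrives already redacted by the rule above, so the analysis code never handles raw card numbers.

```python
# Minimal sketch: loading the governed, masked customer table into pandas.
# Connection details and table names are hypothetical placeholders.
import ibm_db_dbi
import pandas as pd

conn = ibm_db_dbi.connect(
    "DATABASE=BLUDB;HOSTNAME=<dv-host>;PORT=50001;SECURITY=SSL;"
    "UID=<user>;PWD=<password>;",
    "", ""
)

# The data protection rule has already masked the credit_card column,
# so downstream analysis (e.g., order-flow segmentation) sees only
# redacted values.
df = pd.read_sql("SELECT * FROM CUSTOMERS.PROFILES", conn)
print(df.head())
```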

This is how a data fabric architecture enables our trading platform to virtualize sources and access data across multiple environments, then organize that data and let key data personnel collaborate on it safely. If you're curious about how this demo was made and would like to see how our final trading platform analyzes data, sign up for my 15 Minute Friday Session on July 8th in the form below.
