Take a second to think about all the ways data has changed in the last 20 years. In the hardware space, our mobile phones started out as large handhelds with pull-out antennas and limited processing power. Today they are advanced devices with computational power roughly 32,600 times that of the computers used to reach the moon. The transformation of our phones is analogous to the evolution of the modern enterprise data architecture. As front-end consumer applications have evolved, the resources needed to collect, store and analyze the information flowing from consumers have grown with them. The average company runs 110 SaaS applications, connecting to an average of 400 data sources.

To scale with this expansion, companies like IBM have proposed a new architectural approach known as a "data fabric," which provides a unified platform to connect this growing number of applications. A data fabric can be thought of as exactly what the name implies: a "fabric" that connects data across multiple locations, types and sources to facilitate data growth and management. IBM delivers this flexible architecture through IBM Cloud Pak® for Data, an enterprise insights platform that gives companies the flexibility to scale across any infrastructure using Red Hat® OpenShift®, the leading open-source container orchestration platform.

I will outline the data fabric architectural approach through the lens of a basic stock trading platform. Consumer-oriented trading platforms have gained traction over the past couple of years, enabling users to drive their own financial destiny. But to bring the individual investor this power, there must be a strong data architecture in place connecting live price feeds and analytics to advanced backend systems. Data virtualization facilitates this movement by working behind the scenes to unify multiple disparate systems.

Data virtualization

Data virtualization integrates data sources across multiple locations (on-premises, cloud or hybrid) and returns a logical view without the need for data movement or replication. The real value of data virtualization is that it creates a centralized data platform without large data movement costs. In our stock trading platform, we have customer data, financial trading data and account data in separate storage locations.

Figure 1

As shown in Figure 1, financial data is located in a PostgreSQL cloud environment, while personal customer data resides on premises in the respective MongoDB and Informix environments. Using the advanced virtualization engine, we can query all of these sources together, avoiding much of the cost of traditional extract-and-replicate methods.
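To make the idea concrete, here is a minimal sketch in plain Python of what a virtualization engine does conceptually: it presents one logical view joined across separate sources without copying the data up front. The source lists, field names and sample rows below are all hypothetical stand-ins for the PostgreSQL, MongoDB and Informix systems; the real engine in Cloud Pak for Data exposes virtualized tables through standard SQL.

```python
# Hypothetical stand-ins for three disparate sources:
financial_trades = [  # cloud PostgreSQL
    {"customer_id": 1, "symbol": "IBM", "qty": 10},
    {"customer_id": 2, "symbol": "AAPL", "qty": 5},
]
customer_profiles = [  # on-premises MongoDB
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
account_balances = [  # on-premises Informix
    {"customer_id": 1, "balance": 2500.0},
    {"customer_id": 2, "balance": 1200.0},
]

def virtual_view():
    """Join the sources on customer_id and yield one logical row per
    trade, the way a virtualization engine presents a unified view
    without physically moving the underlying data."""
    profiles = {c["customer_id"]: c for c in customer_profiles}
    balances = {b["customer_id"]: b for b in account_balances}
    for trade in financial_trades:
        cid = trade["customer_id"]
        yield {
            "name": profiles[cid]["name"],
            "symbol": trade["symbol"],
            "qty": trade["qty"],
            "balance": balances[cid]["balance"],
        }

for row in virtual_view():
    print(row)
```

The key point of the sketch is that the "view" is computed on demand from the sources in place; nothing is extracted or replicated ahead of time.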

Data cataloging

Once this data is ingested, it needs a mechanism to curate, categorize and facilitate its sharing throughout an organization. For example, our stock trading platform may have multiple teams of data scientists focused on core customer initiatives, such as UI optimization algorithms or understanding order flow. A data catalog, such as the IBM Watson® Knowledge Catalog, can facilitate the relationship between these roles and reduce the preparation necessary to complete these tasks. Data catalogs bridge the gap between raw and usable data, allowing for the application of business context, data policies and data protection rules to your virtual data. For example, if the lead data steward at our trading platform wishes to mask credit card numbers as they flow into different data projects, they can apply a data protection rule to credit card numbers, as shown in Figure 2:

Figure 2

Now credit card numbers are masked throughout the environment, improving customer trust while also making it easier to meet government regulations.
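The behavior of such a rule can be sketched in a few lines. The function names and masking format below are hypothetical (Watson Knowledge Catalog defines its masking behavior through its own rule builder); the sketch only illustrates the principle that protected columns are masked on the way to the consumer while the raw data stays untouched at the source.

```python
import re

# Masks every digit except the last four, a common redaction style.
# The exact format a real catalog applies may differ.
CARD_PATTERN = re.compile(r"\d(?=\d{4})")

def redact_card(value: str) -> str:
    """Replace all but the last four digits with 'X'."""
    return CARD_PATTERN.sub("X", value)

def apply_protection(rows, protected_columns):
    """Return a copy of rows with protected columns masked.
    The original rows (the source data) are left unchanged."""
    return [
        {k: (redact_card(v) if k in protected_columns else v)
         for k, v in row.items()}
        for row in rows
    ]

customers = [{"name": "Ada", "card": "4111111111111111"}]
print(apply_protection(customers, {"card"}))
```

Because the rule is applied at access time rather than baked into the stored data, the same table can serve both privileged and restricted consumers.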

With this rule applied, data scientists who view customer information see redacted credit card numbers as shown in Figure 3:

Figure 3

Now if this table is needed in a Python project, data scientists can export that same core data for analysis without seeing any confidential information, as shown in Figure 4:

Figure 4
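The export step can be sketched as well: the data scientist's project receives the already-masked table, and aggregate analysis proceeds normally because nothing analytical depended on the raw card numbers. The CSV contents and column names below are hypothetical, using only the Python standard library.

```python
import csv
import io

# A hypothetical redacted export, as a data science project would
# receive it: the card column arrives already masked by the catalog,
# yet the rows remain fully usable for aggregate analysis.
redacted_csv = """customer_id,card,order_value
1,XXXXXXXXXXXX1111,250.00
1,XXXXXXXXXXXX1111,120.50
2,XXXXXXXXXXXX2222,75.25
"""

# Total order value per customer; the masked column is simply ignored.
totals = {}
for row in csv.DictReader(io.StringIO(redacted_csv)):
    cid = row["customer_id"]
    totals[cid] = totals.get(cid, 0.0) + float(row["order_value"])

print(totals)
```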

This is how a data fabric architecture enables our trading platform to virtualize sources and access data across multiple environments, then organize that data and share it safely with key data personnel. If you're curious how this demo was built and would like to see how our final trading platform analyzes data, sign up for my 15 Minute Friday Session on July 8th in the form below.
