Creating Data Value With a Decentralized Data Strategy

BrandPost By Beth Stackpole
Apr 06, 2022

Garnering data-driven insights isn’t just about capturing and analyzing data from any single edge location.


For decades, organizations chased the Holy Grail of a centralized data warehouse/lake strategy to support business intelligence and advanced analytics. Now, with processing power built out at the edge and with mounting demand for real-time insights, organizations are using decentralized data strategies to drive value and realize business outcomes.

The proliferation of data at the edge is quickening, whether that data is collected from a retail store customer interaction, a mobile phone transaction, or industrial equipment on the plant floor. Improved connectivity, including increased availability of 5G capabilities, coupled with cost-effective edge processing power, is driving the deluge of data that exists outside centralized repositories and traditional data centers.

According to IDC estimates, there will be 55.7 billion connected Internet of Things (IoT) devices by 2025, generating almost 80 zettabytes of data at the edge. At the same time, IDC projects, worldwide spending on edge computing will reach $176 billion in 2022, an increase of 14.8% over 2021.

But garnering data-driven insights isn’t just about capturing and analyzing data from any single edge location. Imagine collecting data from thousands of retail stores or processing data from connected cars. Each involves challenges in collecting, storing, managing, and analyzing data in a way that is scalable and delivers real business value from specific, actionable insights.

“The intelligence being pushed to the edge is about driving a decision point — convincing someone to buy something or providing a customer experience in that moment,” explains Matt Maccaux, field chief technology officer for the HPE GreenLake Cloud Services Group. “Thinking about that intelligence as having millions of loosely connected decision points at the edge requires a different strategy, and you can’t micromanage it. You have to automate it. You have to use sophisticated algorithms and machine learning to make those decisions in those moments.”

That’s not to say that a decentralized data strategy wholly replaces the more traditional centralized data initiative — Maccaux emphasizes that there is a need for both. For example, a lot of data is centralized by default or needs to remain so because of compliance and regulatory concerns. In addition, for certain artificial intelligence (AI) and machine learning (ML) workloads, a centralized strategy makes sense; it can be a more efficient way of storing and processing the entire spectrum of data necessary to make the edge more intelligent to drive actionable insights.

“A centralized data strategy is really good at building those sophisticated models against massive data sets … and working to make the edge more intelligent or when latency isn’t an issue,” Maccaux says. “Modern enterprises have to adopt a dual strategy.”

Challenges of a distributed enterprise data estate

The biggest challenge with a decentralized data strategy is managing data across the sheer number of decentralized or edge-based endpoints. A single retail store, for example, can collect and consume its data through manual effort, but as that environment scales to dozens, hundreds, thousands, or even millions of connected points, the order-of-magnitude growth becomes daunting.

There is also the likelihood that all of those individual edge environments handle data differently to accommodate different use cases and different environmental and demographic factors. Achieving that scale and flexibility without hand-tuning unique configurations requires automation. “We need to be able to handle that massive scale — that’s the challenge when dealing with decentralized intelligence,” Maccaux says.

Although connectivity and processing power have grown significantly at the edge, the edge is still not as powerful or as fast as most data center environments. So IT organizations have to spend time thinking about applications, data movement, and algorithmic processing, based on the footprint and connectivity available at the edge. In addition, distributed queries and analytics are highly complex and often fragile, which can make it difficult to ensure that the right data is identified and available to drive insights and action.

When building out a decentralized data strategy, Maccaux recommends the following:

  • Architect from the beginning for the order-of-magnitude growth you expect, so the environment can scale properly without constant refactoring.
  • Know what’s practical and what’s possible in terms of connectivity and other factors when designing edge-based locations.
  • Leverage a data fabric to support a unified data strategy, which will make deployments and maintenance easier. “It’s going to drive compliance, ensure governance, and increase productivity regardless of the tools that these distributed analytics users are using.”

The HPE GreenLake advantage for distributed data strategy

With users relying on different data sources and tools, organizations struggle with how to keep data in sync between all the edge points while still adhering to data sovereignty, data governance, and regulatory requirements. The HPE Ezmeral Data Fabric, delivered through the HPE GreenLake edge-to-cloud platform, unifies and syncs the movement of data globally. It provides policy-driven access to analytics teams and data scientists, regardless of whether data is at the edge, in an enterprise data warehouse, on-premises, or in a cloud data lake.

HPE Ezmeral Unified Analytics and HPE Ezmeral ML Ops, also available as cloud services through HPE GreenLake, deliver unified hybrid analytics that handle diverse data types spanning the edge to the hybrid cloud, along with automation for building end-to-end AI/analytics pipelines. HPE GreenLake automates the provisioning of all these instances and provides visibility into cloud costs and controls, delivered as outcome-driven services enforceable through a service-level agreement (SLA). “Data fabric is the technology that enables it, but HPE GreenLake is the delivery mechanism for hitting the intended business outcomes,” Maccaux says. “We are automating all the way up the stack to make sure we are meeting business SLAs.”

Click here to learn more about HPE GreenLake.