In a previous blog post, we pointed out that data warehouses, known for high-performance data processing for business intelligence, can quickly become expensive for new data and evolving workloads. We also made the case that query and reporting engines such as Presto need to work alongside the Spark framework to support advanced analytics and complex enterprise decision-making. To do so, Presto and Spark need to work readily with both existing and modern data warehouse infrastructures. Now, let’s look at why data warehouse optimization is a key value of a data lakehouse strategy.

Read our blog on solving today’s challenges with a lakehouse architecture

Value of data warehouse optimization

Since its introduction over a century ago, the gasoline-powered engine has remained largely unchanged. It’s simply been adapted over time to accommodate modern demands such as pollution controls, air conditioning and power steering.

Similarly, the relational database has been the foundation for data warehousing for as long as data warehousing has been around. Relational databases have been adapted to accommodate the demands of new workloads, such as the data engineering tasks associated with structured and semi-structured data and the building of machine learning models.

Returning to the analogy, there have been significant changes to how we power cars. We now have gasoline-powered engines, battery electric vehicles (BEVs), and hybrid vehicles. An August 2021 Forbes article cited a 2021 publication from the Department of Energy's Argonne National Laboratory, which found that “Hybrid electric vehicles (think: Prius) had the lowest total 15-year per-mile cost of driving in the Small SUV category beating BEVs.”

Just as hybrid vehicles help their owners balance the initial purchase price and cost over time, enterprises are attempting to find a balance between high performance and cost-effectiveness for their data and analytics ecosystem. Essentially, they want to run the right workloads in the right environment without having to copy datasets excessively.

Optimizing your data lakehouse architecture

Fortunately, the IT landscape is changing thanks to a mix of cloud platforms, open source and traditional software vendors. The rise of cloud object storage has driven the cost of data storage down. Open-data file formats have evolved to support data sharing across multiple data engines, like Presto, Spark and others. Intelligent data caching is improving the performance of data lakehouse infrastructures.

All these innovations are being adopted by software vendors and accepted by their customers. So, what does this mean from a practical perspective? What can enterprises do differently from what they are already doing today? Some use case examples will help. Raw data often needs to be curated within a data warehouse before it can be used effectively. Semi-structured data needs to be reformatted and transformed before it can be loaded into tables. And ML processes consume an abundance of capacity to build models.

Organizations running these workloads in their data warehouse environment today are paying a high run rate for engineering tasks that, on their own, add no value or insight. Only the outputs of these data-driven models allow an organization to derive additional value. If organizations could execute these engineering tasks at a lower run rate in a data lakehouse, while making the transformed data available to both the lakehouse and the warehouse via open formats, they could deliver the same output value with lower-cost processing.
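To make the run-rate argument concrete, here is a minimal back-of-the-envelope sketch in Python. Everything in it is an assumption made for illustration: the workload names, the monthly compute hours, and the two hourly rates are hypothetical placeholders, not figures from any vendor or from this article. It only shows the shape of the calculation: engineering tasks are priced at the lakehouse rate when offloaded, while query and reporting stay at the warehouse rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    compute_hours: float  # hypothetical monthly compute hours
    engineering: bool     # True for curation, transformation, model building


def monthly_cost(workloads, warehouse_rate, lakehouse_rate, offload_engineering):
    """Sum the monthly compute cost, optionally pricing engineering
    workloads at the lakehouse rate instead of the warehouse rate."""
    total = 0.0
    for w in workloads:
        offloaded = offload_engineering and w.engineering
        rate = lakehouse_rate if offloaded else warehouse_rate
        total += w.compute_hours * rate
    return total


# Hypothetical workload mix, loosely mirroring the use cases described above.
WORKLOADS = [
    Workload("raw data curation", 400, True),
    Workload("semi-structured transformation", 300, True),
    Workload("ML model building", 500, True),
    Workload("BI queries and reporting", 200, False),
]

# Hypothetical hourly run rates in arbitrary currency units.
WAREHOUSE_RATE = 10.0
LAKEHOUSE_RATE = 4.0

all_in_warehouse = monthly_cost(WORKLOADS, WAREHOUSE_RATE, LAKEHOUSE_RATE, False)
optimized = monthly_cost(WORKLOADS, WAREHOUSE_RATE, LAKEHOUSE_RATE, True)
savings = all_in_warehouse - optimized

print(f"All workloads in the warehouse: {all_in_warehouse:,.0f}")
print(f"Engineering offloaded to lakehouse: {optimized:,.0f}")
print(f"Difference: {savings:,.0f}")
```

The sketch deliberately ignores storage, data movement, and licensing, so it should be read as a way to structure the comparison with your own numbers rather than as an estimate of actual savings.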

Benefits of optimizing across your data warehouse and data lakehouse

Optimizing workloads across a data warehouse and a data lakehouse by sharing data using open formats can reduce costs and complexity. This helps organizations drive a better return on their data strategy and analytics investments while also helping to deliver better data governance and security.

And just as a hybrid car allows car owners to get greater value from their car investment, optimizing workloads across a data warehouse and data lakehouse will allow organizations to get greater value from their data analytics ecosystem.

Discover how you can optimize your data warehouse to scale analytics and artificial intelligence (AI) workloads with a data lakehouse strategy.
