Data Lake, Interactive and Optimization

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world’s largest corporations. Such data volumes are not easy to move, migrate or modernize. The challenges of a monolithic data lake architecture: Data lakes are, at a high level, single repositories of data at scale.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. Moreover, the framework should consume compute resources as optimally as possible per the size of the operational tables.

Data Lake

Data Lake Data Processing Metadata Snapshot

Top 5 Tools for Building an Interactive Analytics App

Smart Data Collective

OCTOBER 27, 2021

An interactive analytics application gives users the ability to run complex queries across complex data landscapes in real-time: thus, the basis of its appeal. Interactive analytics applications present vast volumes of unstructured data at scale to provide instant insights. Why Use an Interactive Analytics Application?

Interactive

Interactive Unstructured Data Analytics Data Warehouse

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

AWS Big Data

JUNE 23, 2023

Events and many other security data types are stored in Imperva’s Threat Research Multi-Region data lake. Imperva harnesses data to improve their business outcomes. As part of their solution, they are using Amazon QuickSight to unlock insights from their data.

Data Lake

Data Lake Cost-Benefit Dashboards Data Warehouse

Speed up queries with the cost-based optimizer in Amazon Athena

AWS Big Data

NOVEMBER 17, 2023

Amazon Athena is a serverless, interactive analytics service built on open source frameworks, supporting open table file formats. Athena provides a simplified, flexible way to analyze petabytes of data where it lives. Let’s discuss some of the cost-based optimization techniques that contributed to improved query performance.

Optimization

Optimization Statistics Metadata Data Lake

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

AWS Big Data

FEBRUARY 22, 2023

In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue, Apache Hudi, and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.

Data Lake

Data Lake Dashboards Cost-Benefit Metadata

Data Lakes: What Are They and Who Needs Them?

Jet Global

JULY 2, 2019

To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a Data Lake? Data warehouses do a great job of standardizing data from disparate sources for analysis. Taking a Dip.

Data Lake

Data Lake Data Warehouse Big Data Machine Learning

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. This introduces the need for both polling and pushing the data to access and analyze in near-real time.

Optimization

Optimization Forecasting Data Lake Metadata

Accelerate data science feature engineering on transactional data lakes using Amazon Athena with Apache Iceberg

AWS Big Data

JUNE 20, 2023

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon Simple Storage Service (Amazon S3) and data sources residing in AWS, on-premises, or other cloud systems using SQL or Python. Solution overview: Data scientists are generally accustomed to working with large datasets.

Data Lake

Data Lake Data Science Recreation/Entertainment Experimentation

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

We have seen a strong customer demand to expand its scope to cloud-based data lakes because data lakes are increasingly the enterprise solution for large-scale data initiatives due to their power and capabilities. The team uses dbt-glue to build a transformed gold model optimized for business intelligence (BI).

Data Lake

Data Lake Management Metrics Data Warehouse

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

AWS Big Data

NOVEMBER 10, 2023

Your applications can seamlessly read from and write to your Amazon Redshift data warehouse while maintaining optimal performance and transactional consistency. Additionally, you’ll benefit from performance improvements through pushdown optimizations, further enhancing the efficiency of your operations. options(**read_config).option("query",

Data Processing

Data Processing Data Lake Data Warehouse Optimization

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To create and manage the data products, smava uses Amazon Redshift, a cloud data warehouse. In this post, we show how smava optimized their data platform by using Amazon Redshift Serverless and Amazon Redshift data sharing to overcome right-sizing challenges for unpredictable workloads and further improve price-performance.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

11:30 AM – 12:30 PM (PDT) Caesars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture. 4:30 PM – 5:30 PM (PDT) Wynn ANT207 | Understand your data with business context. 1:00 PM – 2:00 PM (PDT) Venetian ANT201 | Accelerate innovation with real-time data.

Analytics

Analytics Data Lake Data Warehouse Data-driven

DIY cloud cost management: The strategic case for building your own tools

CIO Business Intelligence

APRIL 25, 2024

With questions around ROI, increasing outlay, and corporate scrutiny on IT cost savings on the rise, CIOs must know not only what contributes to their organization’s overall cloud spend but also how to optimize it. Evolving enterprise needs often outpace the product roadmaps of SaaS cost optimization solutions providers.

Management

Management Optimization Strategy Enterprise

Analyze Elastic IP usage history using Amazon Athena and AWS CloudTrail

AWS Big Data

MAY 15, 2024

You can use this solution regularly as part of your cost-optimization efforts to safely remove unused EIPs to reduce your costs. Athena is an interactive query service that simplifies data analysis in Amazon Simple Storage Service (Amazon S3) using standard SQL. Refer to AWS CloudTrail Lake pricing page for pricing details.

Snapshot

Snapshot Optimization Data Lake Reporting

The rise of the data lakehouse: A new era of data value

CIO Business Intelligence

AUGUST 18, 2022

Previously, Walgreens was attempting to perform that task with its data lake but faced two significant obstacles: cost and time. Those challenges are well-known to many organizations as they have sought to obtain analytical knowledge from their vast amounts of data. Lakehouses redeem the failures of some data lakes.

Data Lake

Data Lake Data Warehouse Unstructured Data Business Intelligence

Thermo Fisher transforms its customer experience

CIO Business Intelligence

AUGUST 12, 2022

The rapid growth left the company highly dependent on fragmented, manual processes and disparate data sources and systems. For its order-entry automation module, Northstar leans on AI and RPA to optimize data recognition and verification, and to reduce errors and accelerate order cycle times. Catalyzing change.

IT

IT Data Lake Sales Interactive

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes. Iterations of the lakehouse.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

Building and Evaluating GenAI Knowledge Management Systems using Ollama, Trulens and Cloudera

Cloudera

MAY 23, 2024

In modern enterprises, the exponential growth of data means organizational knowledge is distributed across multiple formats, ranging from structured data stores such as data warehouses to multi-format data stores like data lakes. This makes gathering information for decision making a challenge.

Management

Management Metrics Data Processing Data Lake

4 ways generative AI addresses manufacturing challenges

IBM Big Data Hub

APRIL 15, 2024

The industry must continually optimize process, improve efficiency, and improve overall equipment effectiveness. Or we create a data lake, which quickly degenerates to a data swamp. Coupled with search and multi-modal interaction, gen AI makes a great assistant. Manufacturers use summarization in different ways.

Manufacturing

Manufacturing Contextual Data Knowledge Discovery Data Lake

Data Architecture and Strategy in the AI Era

Cloudera

MARCH 28, 2024

And of those organizations working on some stage of AI adoption, a few of the top benefits included increased productivity (35%), enhanced operational efficiency (33%), improved customer experience (33%), and optimized supply chain and logistics (33%). The benefits are clear, and there’s plenty of potential that comes with AI adoption.

Data Architecture

Data Architecture Strategy Data Lake Data-driven

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

IBM Big Data Hub

JUNE 15, 2023

It is comprised of commodity cloud object storage, open data and open table formats, and high-performance open-source query engines. To help organizations scale AI workloads, we recently announced IBM watsonx.data, a data store built on an open data lakehouse architecture and part of the watsonx AI and data platform.

Data Warehouse

Data Warehouse Data Lake Optimization Data-driven

What is a Data Pipeline?

Jet Global

MAY 9, 2024

The key components of a data pipeline are typically: Data Sources: The origin of the data, such as a relational database, data warehouse, data lake, file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.

Data Lake

Data Lake Data Warehouse Business Intelligence Machine Learning

The Future of the Data Lakehouse – Open

CIO Business Intelligence

JUNE 23, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes. Iterations of the lakehouse.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

Unleashing the power of Presto: The Uber case study

IBM Big Data Hub

SEPTEMBER 25, 2023

With a few taps on a mobile device, riders request a ride; then, Uber’s algorithms work to match them with the nearest available driver and calculate the optimal price. Uber’s prowess as a transportation, logistics and analytics company hinges on their ability to leverage data effectively. But the simplicity ends there.

OLAP

OLAP Data Lake Data-driven Snapshot

Achieve your AI goals with an open data lakehouse approach

IBM Big Data Hub

OCTOBER 4, 2023

A data lakehouse architecture combines the performance of data warehouses with the flexibility of data lakes, to address the challenges of today’s complex data landscape and scale AI. Later this year, watsonx.data will infuse watsonx.ai

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

Build an end-to-end serverless streaming pipeline with Apache Kafka on Amazon MSK using Python

AWS Big Data

MARCH 21, 2024

You want real-time access to this data so you can monitor performance in real time, and detect and mitigate issues quickly. You also need longer-term access to this data for machine learning (ML) models to run predictive maintenance assessments, find optimization opportunities, and forecast demand.

Data Lake

Data Lake Management Modeling Optimization

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports. Analyzing historical patterns allows you to optimize performance, identify issues proactively, and improve planning. Mohit Saxena is a Senior Software Development Manager on the AWS Glue team.

Metrics

Metrics Visualization Dashboards Interactive

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Data Lake Optimization

Build an Amazon Redshift data warehouse using an Amazon DynamoDB single-table design

AWS Big Data

JUNE 21, 2023

Built on highly curated structured data, it provides the flexibility and speed to run aggregations across an entire dataset to derive insights. To house our data, we need to define a data model. An optimal design choice is to use a dimensional model. This is achieved by partitioning the data.

Data Warehouse

Data Warehouse Data Lake OLAP Cost-Benefit

5 Best Practices for Extracting, Analyzing, and Visualizing Data

Smart Data Collective

DECEMBER 13, 2022

There are several choices to consider, each with its own set of advantages and disadvantages: Data warehouses are used to store data that has been processed for a specific function from one or more sources. Data lakes hold raw data that has not yet been altered to meet a specific purpose. Understand Your Audience.

Visualization

Visualization Key Performance Indicator Sales Data Lake

7 key Microsoft Azure analytics services (plus one extra)

CIO Business Intelligence

JUNE 29, 2022

If you can’t make sense of your business data, you’re effectively flying blind. Insights hidden in your data are essential for optimizing business operations, finetuning your customer experience, and developing new products — or new lines of business, like predictive maintenance. Azure Data Lake Analytics.

Data Lake

Data Lake Analytics Data Warehouse Machine Learning

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

AWS Big Data

NOVEMBER 8, 2023

Join us as we delve into the world of real-time streaming data at re:Invent 2023 and discover how you can use real-time streaming data to build new use cases, optimize existing projects and processes, and reimagine what’s possible. High-quality data is not just about accuracy; it’s also about timeliness.

Data-driven

Data-driven Data Lake Machine Learning Cost-Benefit

Chipotle’s recipe for digital transformation: Cloud plus AI

CIO Business Intelligence

OCTOBER 21, 2022

Chipotle IT’s secret sauce: Garner credits Chipotle’s wholly owned business model for enabling him to deploy advanced technologies such as the cloud, analytics, data lake, and AI uniformly to all restaurants because they are all based on the same digital backbone. Chipotle’s digital business in 2022 was $3.5

Digital Transformation

Digital Transformation Data Lake Forecasting Technology

Breaking barriers in geospatial: Amazon Redshift, CARTO, and H3

AWS Big Data

MAY 16, 2024

However, visualizing and analyzing large-scale geospatial data presents a formidable challenge due to the sheer volume and intricacy of information. The need to balance detail and context while maintaining real-time interactivity can lead to issues of scalability and rendering complexity.

Data Warehouse

Data Warehouse Visualization Cost-Benefit Optimization

Moving Enterprise Data From Anywhere to Any System Made Easy

Cloudera

JUNE 2, 2022

Over the last decade, we have often heard about the proliferation of data creating sources (mobile applications, laptops, sensors, enterprise apps) in heterogeneous environments (cloud, on-prem, edge) resulting in the exponential growth of data being created.

Enterprise

Enterprise Data Lake Data Collection Data-driven

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

AWS Big Data

NOVEMBER 20, 2023

Use case: A typical workload for AWS Glue for Apache Spark jobs is to load data from a relational database to a data lake with SQL-based transformations. The end benefit for you is more effective and optimized AWS Glue for Apache Spark workloads. The metrics are available in all AWS Glue supported Regions. Check it out!

Metrics

Metrics Data Lake Cost-Benefit Dashboards

Run Spark SQL on Amazon Athena Spark

AWS Big Data

OCTOBER 23, 2023

For interactive applications, Athena Spark allows you to spend less time waiting and be more productive, with application startup time in under a second. Running SQL on data lakes is fast, and Athena provides an optimized, Trino- and Presto-compatible API that includes a powerful optimizer.

Data Lake

Data Lake Visualization Optimization Interactive

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

A read-optimized platform that can integrate data from multiple applications emerged. In another decade, the internet and mobile started to generate data of unforeseen volume, variety and velocity. It required a different data platform solution. The value of data projects is difficult to realize.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

OCBC Bank Accelerates Its Data Strategy with Cloudera

Cloudera

DECEMBER 14, 2022

OCBC Bank optimizes customer experience & risk management with multi-phased data initiative. OCBC also won a Cloudera Data Impact Award 2022 in the Transformation category for the project. Real-time data analysis for better business and customer solutions.

Data Strategy

Data Strategy Strategy IT Contextual Data

How the Public Sector Can Maximize the Value of Dark Data

Cloudera

JANUARY 30, 2023

By 2025, it’s estimated that the amount of data created, consumed, and stored will reach 180 zettabytes, with up to 90% of that unstructured and nearly all of it unused for decision making. The purpose of this blog isn’t to emphasize the cyber risk of dark data but to spotlight its implications.

IoT

IoT Data Architecture Data Lake Machine Learning

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

Customer 360 (C360) provides a complete and unified view of a customer’s interactions and behavior across all touchpoints and channels. This view is used to identify patterns and trends in customer behavior, which can inform data-driven decisions to improve business outcomes. Then, you transform this data into a concise format.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics
