Data Lake and Interactive - Data Leaders Brief

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without first having to structure it, and then run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Top 5 Tools for Building an Interactive Analytics App

Smart Data Collective

OCTOBER 27, 2021

An interactive analytics application gives users the ability to run complex queries across complex data landscapes in real time, which is the basis of its appeal. Interactive analytics applications present vast volumes of unstructured data at scale to provide instant insights. Why Use an Interactive Analytics Application?

Interactive

Interactive Unstructured Data Analytics Data Warehouse

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

AWS Big Data

JUNE 23, 2023

Events and many other security data types are stored in Imperva’s Threat Research Multi-Region data lake. Imperva harnesses data to improve their business outcomes. As part of their solution, they are using Amazon QuickSight to unlock insights from their data.

Data Lake

Data Lake Cost-Benefit Dashboards Data Warehouse

Building Best-in-Class Enterprise Analytics

Speaker: Anthony Roach, Director of Product Management at Tableau Software, and Jeremiah Morrow, Partner Solution Marketing Director at Dremio

As a result, these two solutions come together to deliver: Lightning-fast BI and interactive analytics directly on data wherever it is stored. A self-service platform for data exploration and visualization that broadens access to analytic insights. A seamless and efficient customer experience.

Analytics

Build a real-time GDPR-aligned Apache Iceberg data lake

AWS Big Data

FEBRUARY 24, 2023

Data lakes are a popular choice for today’s organizations to store their data around their business activities. As a best practice of a data lake design, data should be immutable once stored. A data lake built on AWS uses Amazon Simple Storage Service (Amazon S3) as its primary storage environment.

Data Lake

Data Lake Metadata Testing Data Warehouse

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

Data analytics on operational data at near-real time is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to have better scalability and performance. For more information, see Changing the default settings for your data lake.

Data Lake

Data Lake Visualization Dashboards Insurance

Data Lakes: What Are They and Who Needs Them?

Jet Global

JULY 2, 2019

To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a Data Lake? Data warehouses do a great job of standardizing data from disparate sources for analysis. Taking a Dip.

Data Lake

Data Lake Data Warehouse Big Data Machine Learning

5 things on our data and AI radar for 2021

O'Reilly on Data

FEBRUARY 19, 2021

The Right Solution for Your Data: Cloud Data Lakes and Data Lakehouses. Data lakes have experienced a fairly robust resurgence over the last few years, specifically cloud data lakes. A Wave of Cloud-Native, Distributed Data Frameworks. What will that lead to in 2021?

Data Lake

Data Lake Data Warehouse Machine Learning Modeling

Data Modeling 301 for the cloud: data lake and NoSQL data modeling and design

erwin

AUGUST 15, 2022

For NoSQL, data lakes, and data lakehouses, data modeling of both structured and unstructured data is somewhat novel and thorny. This blog is an introduction to some advanced NoSQL and data lake database design techniques (while avoiding common pitfalls). Data Modeling.

Data Lake

Data Lake Modeling Unstructured Data Data Warehouse

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 1: Getting Started

AWS Big Data

JANUARY 26, 2023

AWS Glue provides an extensible architecture that enables users with different data processing use cases. A common use case is building data lakes on Amazon Simple Storage Service (Amazon S3) using AWS Glue extract, transform, and load (ETL) jobs. as follows: # Use Glue version 3.0

Data Lake

Data Lake Big Data Software Interactive

Joining the Dots: Enhancing Data Analytics Through Intelligent Join Suggestions

Dataiku

SEPTEMBER 1, 2023

Lately, the concept of data experience has been gaining attention in discussions around the enterprise data stack. As the name suggests, it refers to how people interact with data in enterprise settings. Due to fragmented data setups in these companies, their data lakes have the following characteristics:

Data Lake

Data Lake Data Analytics Analytics Interactive

Interact with Apache Iceberg tables using Amazon Athena and cross account fine-grained permissions using AWS Lake Formation

AWS Big Data

MARCH 23, 2023

Register the S3 path storing the table using Lake Formation. We register the S3 full path in Lake Formation: Navigate to the Lake Formation console. In the navigation pane, under Register and ingest, choose Data lake locations. Jack Ye is a software engineer of the Athena Data Lake and Storage team at AWS.

Interactive

Interactive Snapshot Data Lake Software

IDG Contributor Network: How to overcome the bottlenecks between data lakes and analytics for customer engagement

CIO Business Intelligence

MAY 24, 2018

Consumers have come to expect personalized service and a satisfying experience, and anything less from the brands they interact with might cause them to take their business elsewhere. To read this article in full, please click here

Data Lake

Data Lake Analytics Big Data Interactive

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

Introduction of Microsoft Fabric

Analytics Vidhya

OCTOBER 6, 2023

In today’s rapidly evolving digital landscape, seamless data, applications, and device integration are more pressing than ever. Enter Microsoft Fabric, a cutting-edge solution designed to revolutionize how we interact with technology.

Interactive

Interactive Technology Analytics Data Lake

The Future of the Data Lakehouse – Open

CIO Business Intelligence

JUNE 23, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes. Iterations of the lakehouse.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes. Iterations of the lakehouse.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

4 ways generative AI addresses manufacturing challenges

IBM Big Data Hub

APRIL 15, 2024

Or we create a data lake, which quickly degenerates to a data swamp. Coupled with search and multi-modal interaction, gen AI makes a great assistant. IBM built a workforce advisor that uses summarization and contextual data understanding with intent detection and multi-modal interaction.

Manufacturing

Manufacturing Contextual Data Knowledge Discovery Data Lake

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Unstructured data is typically stored across siloed systems in varying formats, and generally not managed or governed with the same level of rigor as structured data. On the backend, the batch data engineering processes refreshing the enterprise data lake need to expand to ingest, transform, and manage unstructured data.

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Navigating the Chaos of Unruly Data: Solutions for Data Teams

DataKitchen

NOVEMBER 10, 2023

The Perilous State of Today’s Data Environments Data teams often navigate a labyrinth of chaos within their databases. Extrinsic Control Deficit: Many of these changes stem from tools and processes beyond the immediate control of the data team.

Data Quality

Data Quality Testing Data Lake Data Integration

Introducing AWS Glue crawler and create table support for Apache Iceberg format

AWS Big Data

AUGUST 16, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. They store their product data in Iceberg format on Amazon S3 and host the metadata of their datasets in Hive Metastore on the EMR primary node. Choose Create.

Data Lake

Data Lake Metadata Snapshot Management

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt. Second-generation – gigantic, complex data lake maintained by a specialized team drowning in technical debt. Easy to report problems and receive updates on fixes.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

AWS Big Data

NOVEMBER 10, 2023

The data sourcing problem To ensure the reliability of PySpark data pipelines, it’s essential to have consistent record-level data from both dimensional and fact tables stored in the Enterprise Data Warehouse (EDW). These tables are then joined with tables from the Enterprise Data Lake (EDL) at runtime.

Data Processing

Data Processing Data Lake Data Warehouse Optimization

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Starting with Amazon EMR release 6.7.0,

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Access Amazon Athena in your applications using the WebSocket API

AWS Big Data

MARCH 2, 2023

Many organizations are building data lakes to store and analyze large volumes of structured, semi-structured, and unstructured data. In addition, many teams are moving towards a data mesh architecture, which requires them to expose their data sets as easily consumable data products.

Data Lake

Data Lake Testing Interactive Unstructured Data

Exploring the hyper-competitive future of customer experience

IBM Big Data Hub

JANUARY 19, 2024

The future of customer experience (CX) is more: more data, more technology, more surprising and delighting. It’s also more pressure to retain those customers, whether those interactions happen online or in-store. As such, future CX strategies will be more data-driven than ever before.

Data-driven

Data-driven Consulting Interactive Data Lake

Moving Enterprise Data From Anywhere to Any System Made Easy

Cloudera

JUNE 2, 2022

Over the last decade, we have often heard about the proliferation of data-creating sources (mobile applications, laptops, sensors, enterprise apps) in heterogeneous environments (cloud, on-prem, edge), resulting in the exponential growth of data being created.

Enterprise

Enterprise Data Lake Data Collection Data-driven

Build an end-to-end serverless streaming pipeline with Apache Kafka on Amazon MSK using Python

AWS Big Data

MARCH 21, 2024

For your application’s low-latency and real-time data access, you can use Lambda and DynamoDB. For longer-term data storage, you can use managed serverless connector service Amazon Data Firehose to send data to your data lake. You can use the same data to train ML models.

Data Lake

Data Lake Management Modeling Optimization

2020 Data Impact Award Winner Spotlight: Merck KGaA

Cloudera

DECEMBER 11, 2020

Without meeting GxP compliance, the Merck KGaA team could not run the enterprise data lake needed to store, curate, or process the data required to inform business decisions. It established a data governance framework within its enterprise data lake. Driving innovation with secure and governed data.

Data Lake

Data Lake Cost-Benefit Unstructured Data Data Governance

Chipotle’s recipe for digital transformation: Cloud plus AI

CIO Business Intelligence

OCTOBER 21, 2022

Chipotle IT’s secret sauce Garner credits Chipotle’s wholly owned business model for enabling him to deploy advanced technologies such as the cloud, analytics, data lake, and AI uniformly to all restaurants because they are all based on the same digital backbone. Chipotle’s digital business in 2022 was $3.5

Digital Transformation

Digital Transformation Data Lake Forecasting Technology

Analyze Elastic IP usage history using Amazon Athena and AWS CloudTrail

AWS Big Data

MAY 15, 2024

Athena is an interactive query service that simplifies data analysis in Amazon Simple Storage Service (Amazon S3) using standard SQL. Additionally, you can analyze activity logs with AWS CloudTrail Lake and Amazon Athena. AWS CloudTrail Lake supports the collection of events from multiple AWS regions and AWS accounts.

Snapshot

Snapshot Optimization Data Lake Reporting

Using AWS AppSync and AWS Lake Formation to access a secure data lake through a GraphQL API

AWS Big Data

OCTOBER 9, 2023

Data lakes have been gaining popularity for storing vast amounts of data from diverse sources in a scalable and cost-effective way. As the number of data consumers grows, data lake administrators often need to implement fine-grained access controls for different user profiles.

Data Lake

Data Lake Testing Big Data Management

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports. venv\Scripts\activate.bat After this step, the subsequent steps run within the bounds of the virtual environment on the client machine and interact with the AWS account as needed.

Metrics

Metrics Visualization Dashboards Interactive

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

AWS Glue can interact with streaming data services such as Kinesis Data Streams and Amazon MSK for processing and transforming CDC data. With Amazon Managed Service for Apache Flink Studio, you can build and run Apache Flink stream processing applications using standard SQL, Python, and Scala in an interactive notebook.

Data Lake

Data Lake Unstructured Data Management Modeling

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.

Data Lake

Data Lake Data Processing Metadata Snapshot

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

AWS Big Data

NOVEMBER 20, 2023

Use case A typical workload for AWS Glue for Apache Spark jobs is to load data from a relational database to a data lake with SQL-based transformations. The following is a visual representation of an example job where the number of workers is 10. When the example job ran, the workerUtilization metrics showed the following trend.

Metrics

Metrics Data Lake Cost-Benefit Dashboards

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users. Data hubs and data lakes can coexist in an organization, complementing each other.

Analytics

Analytics Data Warehouse Data Lake Metadata

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

OCTOBER 19, 2023

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure.

Data Lake

Data Lake Data Warehouse Visualization Snapshot

Implement alerts in Amazon OpenSearch Service with PagerDuty

AWS Big Data

JUNE 8, 2023

This data is often stored and analyzed using various tools, such as Amazon OpenSearch Service, a powerful search and analytics service offered by AWS. OpenSearch Service provides real-time insights into your data to support use cases like interactive log analytics, real-time application monitoring, website search, and more.

Data Lake

Data Lake Dashboards Metrics Testing

Why the Data Journey Manifesto?

DataKitchen

JUNE 12, 2023

We had been talking about “Agile Analytic Operations,” “DevOps for Data Teams,” and “Lean Manufacturing For Data,” but the concept was hard to get across and communicate. I spent much time de-categorizing DataOps: we are not discussing ETL, Data Lake, or Data Science.

Testing

Testing Data Lake Dashboards Data Science

Our Top 20 Most-Read Data and Analytics Research Last Week (to Mar 1)

Andrew White

MARCH 2, 2020

Click here for an interactive PDF to connect to the most read data and analytics research directly. This list excludes our branded research such as Magic Quadrants etc.

Data Lake

Data Lake Data Warehouse Analytics Interactive

Moving Enterprise Data From Anywhere to Any System Made Easy

CIO Business Intelligence

JULY 13, 2022

Over the last decade, we have often heard about the proliferation of data-creating sources (mobile applications, laptops, sensors, enterprise apps) in heterogeneous environments (cloud, on-prem, edge), resulting in the exponential growth of data being created.

Enterprise

Enterprise Data Lake Data Collection Data-driven

Azure Data Sources for Data Science and Machine Learning

Jen Stirrup

MAY 5, 2020

Azure allows you to protect your enterprise data assets, using Azure Active Directory and setting up your virtual network. Other technologies, such as Azure Data Factory, can help process large amounts of data in the cloud. That includes very hot data sources such as real-time processing. Azure Data Lake Store.

Machine Learning

Machine Learning Data Science Data Lake Big Data
