Data Lake and Reference - Data Leaders Brief

Multicloud data lake analytics with Amazon Athena

AWS Big Data

MARCH 18, 2024

Many organizations operate data lakes that span multiple cloud data stores. In these cases, you may want an integrated query layer that runs analytical queries across the different stores and streamlines your analytics processes. Refer to "Using Amazon Athena Federated Query" for further details.

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

How to Implement Data Engineering in Practice?

Use Apache Iceberg in a data lake to support incremental data processing

Build a real-time GDPR-aligned Apache Iceberg data lake

Complexity Drives Costs: A Look Inside BYOD and Azure Data Lakes

Data Lakes on Cloud & Its Usage in Healthcare

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

Navigating Data Entities, BYOD, and Data Lakes in Microsoft Dynamics

Data Modeling 301 for the cloud: data lake and NoSQL data modeling and design

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Reference guide to build inventory management and forecasting solutions on AWS

Efficiently crawl your data lake and improve data access with an AWS Glue crawler using partition indexes

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 1: Getting Started

Use Amazon Athena with Spark SQL for your open-source transactional table formats

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

Configure cross-Region table access with the AWS Glue Catalog and AWS Lake Formation

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

Joining the Dots: Enhancing Data Analytics Through Intelligent Join Suggestions

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

Salesforce debuts Zero Copy Partner Network to ease data integration

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

Exploring real-time streaming for generative AI Applications

AWS Lake Formation 2022 year in review

Data governance in the age of generative AI

Enhance data security and governance for Amazon Redshift Spectrum with VPC endpoints

Build an end-to-end serverless streaming pipeline with Apache Kafka on Amazon MSK using Python

How Knowledge Graphs Power Data Mesh and Data Fabric

Introducing AWS Glue crawler and create table support for Apache Iceberg format

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

Automate schema evolution at scale with Apache Hudi in AWS Glue

Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

Implementing a Pharma Data Mesh using DataOps

Death by Data Cleansing (and How to Avoid It in 3 Steps)

Query your Apache Hive metastore with AWS Lake Formation permissions

Access Amazon Athena in your applications using the WebSocket API

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

Why the Data Journey Manifesto?

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Hybrid Vs. Multi-Cloud: 5 Key Comparisons in Kafka Architectures

Data-Centric Firms Address Athena Shortcomings with Smart Indexing
