Big Data, Data Lake and Optimization

Multicloud data lake analytics with Amazon Athena

AWS Big Data

MARCH 18, 2024

Many organizations operate data lakes that span multiple cloud data stores. In these cases, you may want an integrated query layer that runs analytical queries across those stores and streamlines your data analytics processes. The AWS Glue Data Catalog holds the metadata for both Amazon S3 and Google Cloud Storage (GCS) data.

Differentiating Between Data Lakes and Data Warehouses

Webinars

Trending Sources

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Use Apache Iceberg in a data lake to support incremental data processing

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

Speed up queries with the cost-based optimizer in Amazon Athena

Deploy and Optimize Your Snowflake Environment Faster With Accelerators

Data Lakes: What Are They and Who Needs Them?

Efficiently crawl your data lake and improve data access with an AWS Glue crawler using partition indexes

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

Why optimize your warehouse with a data lakehouse strategy

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

Use Amazon Athena with Spark SQL for your open-source transactional table formats

Data-Centric Firms Address Athena Shortcomings with Smart Indexing

Announcing the AWS Well-Architected Data Analytics Lens

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

The Future of the Data Lakehouse – Open

What is a data architect? Skills, salaries, and how to become a data framework master

Enhance data security and governance for Amazon Redshift Spectrum with VPC endpoints

Azure Data Sources for Data Science and Machine Learning

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

Enhance query performance using AWS Glue Data Catalog column-level statistics

Introducing the AWS ProServe Hadoop Migration Delivery Kit TCO tool

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

Build an end-to-end serverless streaming pipeline with Apache Kafka on Amazon MSK using Python

Real estate CIOs drive deals with data

AI and ML: No Longer the Stuff of Science Fiction

How Salesforce optimized their detection and response platform using AWS managed services

4 ways generative AI addresses manufacturing challenges

Advancing AI: The emergence of a modern information lifecycle

How Data Analytics Tools Eliminate Business Owner Headaches

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

Data architecture strategy for data quality

10 Things AWS Can Do for Your SaaS Company

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

How SumUp made digital analytics more accessible using AWS Glue

Exploring real-time streaming for generative AI Applications
