article thumbnail

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

Iceberg tables maintain metadata to abstract large collections of files, providing data management features including time travel, rollback, data compaction, and full schema evolution, reducing management overhead. You can use this same integration to take advantage of the data sharing and collaboration capabilities in Snowflake.

article thumbnail

Orchestrate an end-to-end ETL pipeline using Amazon S3, AWS Glue, and Amazon Redshift Serverless with Amazon MWAA

AWS Big Data

Add this policy to the AWS Glue role and Amazon MWAA role: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject", "s3:PutObjectAcl" ], "Resource": "arn:aws:s3:::sample-inp-bucket-etl- /*" } ] } In Account B, create the IAM policy policy_for_roleB specifying Account A as a trusted entity.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

With OpenSearch Ingestion, you can filter, enrich, transform, and deliver your data for downstream analysis and visualization. You can now analyze infrequently queried data in cloud object stores and simultaneously use the operational analytics and visualization capabilities of OpenSearch Service.

Data Lake 111
article thumbnail

Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions

AWS Big Data

We discuss how to visualize data quality scores in Amazon DataZone, enable AWS Glue Data Quality when creating a new Amazon DataZone data source, and enable data quality for an existing data asset. If the asset has AWS Glue Data Quality enabled, you can now quickly visualize the data quality score directly in the catalog search pane.

article thumbnail

Gain insights from historical location data using Amazon Location Service and AWS analytics services

AWS Big Data

The Data Catalog provides metadata that allows analytics applications using Athena to find, read, and process the location data stored in Amazon S3. Visual layouts in some screenshots in this post may look different than those on your AWS Management Console. The following code is the input paths map: { EventType: $.detail.EventType

article thumbnail

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

That’s a lot of priorities – especially when you group together closely related items such as data lineage and metadata management which rank nearby. DG emerges for the big data side of the world, e.g., the Alation launch in 2012. Allows metadata repositories to share and exchange. That would’ve been heresy in earlier years.

article thumbnail

Process and analyze highly nested and large XML files using AWS Glue and Amazon Athena

AWS Big Data

Solution overview We explore two distinct techniques that can streamline your XML file processing workflow: Technique 1: Use an AWS Glue crawler and the AWS Glue visual editor – You can use the AWS Glue user interface in conjunction with a crawler to define the table structure for your XML files. Choose Create.