article thumbnail

Improve observability across Amazon MWAA tasks

AWS Big Data

For data pipeline orchestration, the Apache Airflow UI is a user-friendly tool that provides detailed views into your data pipeline. When it comes to pipeline health management, each service that your tasks are interacting with could be storing or publishing logs to different locations, such as an S3 bucket or Amazon CloudWatch logs.

article thumbnail

Amazon EMR on EKS widens the performance gap: Run Apache Spark workloads 5.37 times faster and at 4.3 times lower cost

AWS Big Data

The solution uses the TPC-DS dataset and unmodified data schema and table relationships, but derives queries from TPC-DS to support the SparkSQL test cases. It is not comparable to other published TPC-DS benchmark results. About the Authors Melody Yang is a Senior Big Data Solution Architect for Amazon EMR at AWS.

Testing 77
article thumbnail

Use Snowflake with Amazon MWAA to orchestrate data pipelines

AWS Big Data

Data is decompressed and stored in a different S3 bucket (transformed data can be stored in the same S3 bucket where data was ingested, but for simplicity, we’re using two separate S3 buckets). The transformed data is then made accessible to Snowflake for data analysis. Set the protocol to Email.