Measure performance of AWS Glue Data Quality for ETL pipelines
AWS Big Data
MARCH 12, 2024
In this post, we provide benchmark results of running increasingly complex data quality rulesets over a predefined test dataset.

Dataset details

The test dataset contains 104 columns and 1 million rows stored in Parquet format. You can download the dataset or recreate it locally using the Python script provided in the repository.
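For readers who want a feel for a dataset of this shape before fetching the real one, the following is a minimal sketch of generating synthetic columnar data with 104 columns. It is not the script from the repository: the column names (`col_000`, ...), the mix of integer, float, and string columns, and the helper names `column_names`, `generate_columns`, and `write_parquet` are all assumptions made for illustration. Writing Parquet here assumes the third-party `pyarrow` package is installed.

```python
import random

N_COLS = 104        # column count stated in the post
N_ROWS = 1_000_000  # row count stated in the post


def column_names(n_cols=N_COLS):
    # Hypothetical names; the real dataset's schema is not described here.
    return [f"col_{i:03d}" for i in range(n_cols)]


def generate_columns(n_rows, n_cols=N_COLS, seed=0):
    """Return {column name: list of values} with made-up, mixed types."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    data = {}
    for i, name in enumerate(column_names(n_cols)):
        kind = i % 3  # cycle through integer, float, and string columns
        if kind == 0:
            data[name] = [rng.randint(0, 1000) for _ in range(n_rows)]
        elif kind == 1:
            data[name] = [rng.random() for _ in range(n_rows)]
        else:
            data[name] = [rng.choice(["A", "B", "C"]) for _ in range(n_rows)]
    return data


def write_parquet(data, path):
    # Assumption: pyarrow is available; it is imported lazily so the
    # generator above can be used without it.
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.table(data), path)
```

Calling `write_parquet(generate_columns(1_000), "sample.parquet")` would produce a small sample file; passing `N_ROWS` instead gives the full row count at the cost of noticeably more memory and time, since this sketch builds every column as a plain Python list.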