
Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

In this post, we show you how to use Spark SQL in Amazon Athena notebooks and work with Iceberg, Hudi, and Delta Lake table formats. We demonstrate common operations such as creating databases and tables, inserting data, querying data, and inspecting the table snapshots stored in Amazon S3, all using Spark SQL in Athena.
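As a quick sketch of the kind of Spark SQL the post demonstrates for the Iceberg case (the demo_db database, orders table, and S3 location below are hypothetical placeholders, and an Iceberg-enabled Athena notebook session is assumed):

    CREATE DATABASE IF NOT EXISTS demo_db;

    -- Placeholder bucket and path; replace with your own location
    CREATE TABLE demo_db.orders (
        order_id   BIGINT,
        order_date DATE,
        amount     DECIMAL(10, 2)
    ) USING iceberg
    LOCATION 's3://amzn-s3-demo-bucket/demo_db/orders/';

    INSERT INTO demo_db.orders VALUES
        (1, DATE '2024-01-15', 120.50),
        (2, DATE '2024-01-16', 75.00);

    SELECT * FROM demo_db.orders;

    -- Iceberg exposes table snapshots as a queryable metadata table
    SELECT snapshot_id, committed_at, operation
    FROM demo_db.orders.snapshots;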


Accelerate your data warehouse migration to Amazon Redshift – Part 7

AWS Big Data

With Amazon Redshift, you can use standard SQL to query data across your data warehouse, operational data stores, and data lake. You could identify the changed rows, perhaps using a filter on update timestamps, or you could modify your extract, transform, and load (ETL) process to write to both the source and target databases.
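A minimal sketch of the changed-rows approach the excerpt mentions; the schema, table, and column names are hypothetical:

    -- Pull only rows modified since the last load, using an
    -- update-timestamp filter instead of re-extracting everything.
    INSERT INTO target.customer_stage
    SELECT *
    FROM   source.customer
    WHERE  updated_at > (SELECT COALESCE(MAX(updated_at), '1900-01-01')
                         FROM   target.customer_stage);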



Interact with Apache Iceberg tables using Amazon Athena and cross account fine-grained permissions using AWS Lake Formation

AWS Big Data

If you are interested in learning more about data mesh architecture, visit Design a data mesh architecture using AWS Lake Formation and AWS Glue. We’ll discuss row filtering to restrict access to certain rows, column filtering to restrict column-level access, schema evolution, and time travel. In the Lake Formation console, choose Create new filter, and after configuring the filter, choose Create filter.
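As an illustration of the time travel capability mentioned above, Athena SQL can query an Iceberg table as of a past point in time; the table name and timestamp below are hypothetical:

    -- Query the table's state as of a prior snapshot
    SELECT *
    FROM my_database.my_iceberg_table
    FOR TIMESTAMP AS OF TIMESTAMP '2024-01-01 00:00:00 UTC';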


Generic orchestration framework for data warehousing workloads using Amazon Redshift RSQL

AWS Big Data

With Amazon Redshift, you can query data across your data warehouse, operational data stores, and data lake using standard SQL. To learn about new options for database scripting, refer to Accelerate your data warehouse migration to Amazon Redshift – Part 4. The converted scripts run on Amazon Redshift with little to no changes.
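A minimal RSQL sketch of the kind of control flow such a framework relies on; the script and table names are hypothetical, and it assumes the \if/\exit meta commands and the :ERRORCODE variable described in the AWS RSQL documentation:

    -- Hypothetical load step
    INSERT INTO mart.daily_sales
    SELECT * FROM staging.daily_sales;

    \if :ERRORCODE <> 0
        \remark 'Load step failed - stopping the workload'
        \exit 1
    \endif

    \remark 'Load step succeeded'
    \exit 0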


Announcing zero-ETL integrations with AWS Databases and Amazon Redshift

AWS Big Data

Through customer feedback, we learned that a lot of undifferentiated time and resources go toward building and managing ETL pipelines between transactional databases and data warehouses. ETL is the process data engineers use to combine data from different sources.


Optimization Strategies for Iceberg Tables

Cloudera

Iceberg provides a rewrite position delete files procedure in Spark SQL. In Parquet, you typically want your files to be around 512 MB and row-group sizes to be around 128 MB. Unless partitions are aligned properly, a Spark task might touch multiple partitions. See [link] to learn more about Z-ordering.
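A brief sketch of both maintenance procedures in Spark SQL; the catalog and table names are hypothetical, and the syntax follows the Apache Iceberg procedure documentation:

    -- Compact position delete files produced by row-level deletes
    CALL my_catalog.system.rewrite_position_delete_files(
        table => 'db.sample'
    );

    -- Compact small data files and cluster them with Z-ordering
    CALL my_catalog.system.rewrite_data_files(
        table      => 'db.sample',
        strategy   => 'sort',
        sort_order => 'zorder(region, event_date)'
    );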


Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x better price-performance than other cloud data warehouses.