Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

These formats enable ACID (atomicity, consistency, isolation, durability) transactions, upserts, and deletes, as well as advanced features such as time travel and snapshots that were previously available only in data warehouses. For more information, refer to Amazon S3: Allows read and write access to objects in an S3 Bucket.

Snapshot 100

Enable metric-based and scheduled scaling for Amazon Managed Service for Apache Flink

AWS Big Data

Amazon Managed Service for Apache Flink is a fully managed service that reduces the complexity of building and managing Apache Flink applications. Amazon Managed Service for Apache Flink manages the underlying Apache Flink components that provide durable application state, metrics, logs, and more.

Metrics 98



Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

Apache Iceberg enables transactions on data lakes and can simplify data storage, management, ingestion, and processing. An in-place migration can be performed in either of two ways: Using add_files: This procedure adds existing data files to an existing Iceberg table with a new snapshot that includes the files.

Data Lake 102

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

AWS Big Data

This Iceberg event-based table management feature lets you monitor table activities during writes, so you can decide how to manage each table based on its events. To use the feature, build the iceberg-aws-event-based-table-management source code and provide the resulting JAR on the engine’s classpath.


Laminar Scales Enterprise Data Security Platform With New Management Features

Laminar Security

Yet, managing this diverse environment creates challenges for the security, privacy and governance teams charged with protecting data. According to Laminar research, more than 75% of organizations experienced a cloud data breach in 2023, which speaks for itself. Unfortunately, the evidence shows we’re not doing a good job!


Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

Whenever there is an update to the Iceberg table, a new snapshot of the table is created, and the metadata pointer points to the current table metadata file. At the top of the hierarchy is the metadata file, which stores information about the table’s schema, partition information, and snapshots. Carry out performance tuning.

Data Lake 116

Interact with Apache Iceberg tables using Amazon Athena and cross account fine-grained permissions using AWS Lake Formation

AWS Big Data

Large organizations often have lines of business (LoBs) that operate with autonomy in managing their business data. If you’re using Athena for the first time, under Settings, choose Manage and enter the S3 bucket location that you created earlier (iceberg-athena-lakeformation-blog/producer). Choose Save.