Apache Iceberg optimization: Solving the small files problem in Amazon EMR

AWS Big Data

Systems of this nature generate a huge number of small objects, which need to be compacted into larger files, such as 128 MB, 256 MB, or 512 MB, for faster reads. As of this writing, only the optimize-data optimization is supported. For our testing, we generated 58,176 small objects with a total size of 2 GB.
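
For context, this kind of compaction can also be run with Iceberg's built-in rewrite_data_files Spark procedure. Below is a minimal PySpark sketch, assuming a Glue-backed catalog named glue_catalog and a table db.events (both placeholders); the managed optimize-data feature the article describes may work differently under the hood.

```python
from pyspark.sql import SparkSession

# Minimal sketch: compact an Iceberg table's small files with Spark.
# "glue_catalog" and "db.events" are placeholder names.
spark = (
    SparkSession.builder
    .appName("iceberg-compaction")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .getOrCreate()
)

# rewrite_data_files bin-packs small data files up to the target size
# (512 MB here, i.e. 536870912 bytes).
spark.sql("""
    CALL glue_catalog.system.rewrite_data_files(
        table => 'db.events',
        options => map('target-file-size-bytes', '536870912')
    )
""")
```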

In-place version upgrades for applications on Amazon Managed Service for Apache Flink now supported

AWS Big Data

To learn more about the features supported in each Apache Flink version, consult the Apache Flink blog, which discusses at length the Flink Improvement Proposals (FLIPs) incorporated into each release. The next recommended step is to test your application locally with the newly upgraded Apache Flink runtime.
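
As a rough sketch of triggering such an upgrade programmatically, the following uses boto3's kinesisanalyticsv2 client and the UpdateApplication operation's RuntimeEnvironmentUpdate field; the application name, region, and target runtime string are placeholder assumptions, so check the current API reference before relying on this.

```python
import boto3

# Sketch of an in-place runtime upgrade via the UpdateApplication API.
# "my-flink-app", the region, and "FLINK-1_18" are placeholders; the
# application must be in a state that permits updates.
client = boto3.client("kinesisanalyticsv2", region_name="us-east-1")

# UpdateApplication requires the current version ID of the application.
app = client.describe_application(ApplicationName="my-flink-app")
version_id = app["ApplicationDetail"]["ApplicationVersionId"]

client.update_application(
    ApplicationName="my-flink-app",
    CurrentApplicationVersionId=version_id,
    RuntimeEnvironmentUpdate="FLINK-1_18",  # target Flink runtime
)
```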

10 Examples of How Big Data in Logistics Can Transform The Supply Chain

datapine

You can use big data analytics in logistics, for instance, to optimize routing, improve factory processes, and create razor-sharp efficiency across the entire supply chain. Your Chance: Want to test professional logistics analytics software? This is a testament to the rising role of optimization in logistics.

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

Whenever there is an update to the Iceberg table, a new snapshot of the table is created, and the metadata pointer is updated to reference the current table metadata file. At the top of the hierarchy is the metadata file, which stores the table’s schema, partition specification, and snapshot history.
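
As an illustration of working with those snapshots, here is a short PySpark sketch that lists a table's snapshot history via Iceberg's snapshots metadata table and time-travels to one of them; the table name db.events and the snapshot ID are placeholders, and VERSION AS OF assumes Spark 3.3 or later with the Iceberg extensions enabled.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-snapshots").getOrCreate()

# Every commit to an Iceberg table creates a new snapshot; the snapshots
# metadata table exposes them along with commit time and operation type.
spark.sql(
    "SELECT snapshot_id, committed_at, operation FROM db.events.snapshots"
).show(truncate=False)

# Time travel: read the table as of one snapshot_id from the query above
# (1234567890 is a placeholder value).
spark.sql("SELECT * FROM db.events VERSION AS OF 1234567890").show()
```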

From Hive Tables to Iceberg Tables: Hassle-Free

Cloudera

In this blog post, I will describe a few strategies one could undertake for various use cases. They also provide a “snapshot” procedure that creates an Iceberg table with a different name but the same underlying data. You could first create a snapshot table, run sanity checks on it, and ensure that everything is in order.
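
For illustration, here is a minimal PySpark sketch of that snapshot procedure, with placeholder table names (db.hive_events, db.iceberg_events_snap) and the Iceberg Spark extensions assumed to be enabled.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-to-iceberg").getOrCreate()

# snapshot creates a new Iceberg table that references the Hive table's
# existing data files without copying or modifying them.
spark.sql("""
    CALL spark_catalog.system.snapshot(
        source_table => 'db.hive_events',
        table => 'db.iceberg_events_snap'
    )
""")

# Run sanity checks against the snapshot table before committing to a
# full in-place migration of the original table.
spark.sql("SELECT COUNT(*) FROM db.iceberg_events_snap").show()
```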

Cloudera Operational Database (COD) Performance Benchmarking: Comparing HDFS and Cloud Storage

Cloudera

Test Environment: The comparison measured the performance differences between COD using storage on the Hadoop Distributed File System (HDFS) and COD using cloud storage. We tested two cloud storage options, Amazon S3 and Azure ABFS. These measurements were done on the COD 7.2.15 runtime version (CDH 7.2.14.2).

Amazon Managed Service for Apache Flink now supports Apache Flink version 1.18

AWS Big Data

By default, the sink writes in batches to optimize throughput. In Apache Flink SQL, users can provide hints on join queries to suggest to the optimizer how to shape the query plan. The release also fixes an issue where the operator state couldn’t be properly restored when snapshot compression is enabled. With versions 1.16…
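
To make the hint syntax concrete, here is a minimal PyFlink sketch in batch mode (join hints such as BROADCAST target batch queries); the orders and customers tables are placeholders that would need to be registered first.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Batch mode: join hints such as BROADCAST are a batch-query feature.
t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

# The /*+ BROADCAST(c) */ hint suggests that the optimizer broadcast the
# smaller customers table instead of shuffling both join inputs.
# Both tables are placeholders and must be registered beforehand.
result = t_env.sql_query("""
    SELECT /*+ BROADCAST(c) */ o.order_id, c.name
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.id
""")
```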