Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

Organizations understand that a one-size-fits-all approach no longer works, and they recognize the value of adopting scalable, flexible tools and open data formats that support interoperability in a modern data architecture and accelerate the delivery of new solutions. Snowflake can query across both Iceberg and Snowflake table formats.

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

This is the first post in a blog series that presents common architectural patterns for building real-time data streaming infrastructures with Kinesis Data Streams across a wide range of use cases. All of these architectural patterns integrate with Amazon Kinesis Data Streams.
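
As a minimal illustration of the producer side of these patterns, the sketch below writes a single event to a Kinesis data stream with boto3; the stream name, region, and event shape are placeholder assumptions, not details from the post.

```python
import json
import boto3

# Minimal producer sketch: write one event to a Kinesis data stream.
# Stream name, region, and event fields are illustrative placeholders.
kinesis = boto3.client("kinesis", region_name="us-east-1")

event = {"user_id": "u-123", "action": "page_view", "page": "/pricing"}

response = kinesis.put_record(
    StreamName="example-clickstream",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["user_id"],  # records with the same key go to the same shard
)
print(response["ShardId"], response["SequenceNumber"])
```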

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

You can store your data as-is, without having to first structure it, and run different types of analytics to gain better business insights. Analytics use cases on data lakes are constantly evolving. In this post, we show you how to use the Iceberg add_files procedure for an in-place data upgrade.
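
As a rough sketch of what that in-place upgrade can look like, the PySpark snippet below calls Iceberg's add_files procedure to register existing Parquet files with an Iceberg table without rewriting them. The catalog, database, table, and S3 path names are assumptions for illustration, and the Spark session is assumed to already be configured with an Iceberg catalog (for example, one backed by the AWS Glue Data Catalog).

```python
from pyspark.sql import SparkSession

# Sketch of an in-place upgrade with Iceberg's add_files procedure.
# Assumes the session is configured with an Iceberg catalog named "glue_catalog"
# and an existing Iceberg table whose schema matches the Parquet files.
# Catalog, database, table, and path names are placeholders.
spark = SparkSession.builder.appName("iceberg-add-files").getOrCreate()

spark.sql("""
    CALL glue_catalog.system.add_files(
        table => 'db.sales',
        source_table => '`parquet`.`s3://example-bucket/legacy/sales/`'
    )
""")
```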

AI at Scale isn’t Magic, it’s Data – Hybrid Data

Cloudera

AI needs machine learning (ML), ML needs data science, and data science needs analytics. They all need lots of data. The takeaway: businesses need control over all their data in order to achieve AI at scale and digital business transformation. But it isn't just about aggregating data for models.

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

The post walks through tracking data changes and rollback, then shows how to build your transactional data lake on AWS. You can build a modern data architecture with a scalable data lake that integrates seamlessly with an Amazon Redshift-powered cloud data warehouse, with data organized into three different zones, as shown in the post's architecture diagram.
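
As a hedged sketch of the tracking-changes-and-rollback step, the snippet below inspects an Iceberg table's snapshot history and rolls it back to an earlier snapshot using Spark SQL (for example, in an EMR Serverless Spark job). The catalog, table, and snapshot ID are illustrative assumptions.

```python
from pyspark.sql import SparkSession

# Illustrative sketch: inspect snapshot history and roll back an Iceberg table.
# Assumes an Iceberg catalog named "glue_catalog"; table name and snapshot_id are placeholders.
spark = SparkSession.builder.appName("iceberg-rollback").getOrCreate()

# List the table's snapshots to find the one to roll back to.
spark.sql(
    "SELECT snapshot_id, committed_at, operation FROM glue_catalog.db.orders.snapshots"
).show()

# Roll the table back to a known-good snapshot.
spark.sql("""
    CALL glue_catalog.system.rollback_to_snapshot(
        table => 'db.orders',
        snapshot_id => 123456789012345678
    )
""")
```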

Cloudera Data Engineering 2021 Year End Review

Cloudera

Today Apache Iceberg is used by many innovative technology companies at petabyte scale, allowing them to easily evolve schemas, create snapshots for time-travel-style queries, and perform row-level updates and deletes for ACID compliance. The review also covers modernizing pipelines and the CDP Airflow operators.
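
The snippet below is a small sketch of those Iceberg capabilities (time travel, row-level deletes, and schema evolution) in Spark. It assumes a Spark session with the Iceberg runtime and SQL extensions configured; all catalog, table, and snapshot identifiers are placeholders.

```python
from pyspark.sql import SparkSession

# Sketch of the Iceberg capabilities mentioned above. Assumes the Iceberg runtime
# and SQL extensions are configured on the session; names and IDs are placeholders.
spark = SparkSession.builder.appName("iceberg-features").getOrCreate()

# Time travel: read the table as of an earlier snapshot.
old_df = spark.read.option("snapshot-id", 123456789012345678).table("my_catalog.db.events")
old_df.show()

# Row-level delete, committed as a new Iceberg snapshot.
spark.sql("DELETE FROM my_catalog.db.events WHERE user_id = 'u-123'")

# Schema evolution: add a column without rewriting existing data files.
spark.sql("ALTER TABLE my_catalog.db.events ADD COLUMN session_id string")
```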

Exploring real-time streaming for generative AI Applications

AWS Big Data

Data events are filtered, enriched, and transformed into a consumable format using a stream processor, and the result is made available to the application by querying the latest snapshot. Amazon OpenSearch Service supports native ingestion from Kinesis data streams or MSK topics.
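
As a framework-agnostic sketch of that filter/enrich/transform step, the function below processes a batch of raw event records and emits JSON documents ready for downstream ingestion (for example, into OpenSearch). The event fields are illustrative assumptions; in practice this logic would run inside a stream processor such as Flink or a Lambda consumer on the stream.

```python
import json
from typing import Iterable, Iterator

def process(raw_events: Iterable[bytes]) -> Iterator[str]:
    """Filter, enrich, and transform raw stream events into consumable JSON documents."""
    for raw in raw_events:
        event = json.loads(raw)
        # Filter: keep only the event types the application cares about.
        if event.get("type") != "user_message":
            continue
        # Enrich: attach derived attributes the application will query on.
        event["char_count"] = len(event.get("text", ""))
        # Transform: emit a JSON document, e.g. for ingestion into OpenSearch.
        yield json.dumps(event)

# Example usage with in-memory records standing in for a batch read from the stream.
batch = [b'{"type": "user_message", "text": "hello"}', b'{"type": "heartbeat"}']
for doc in process(batch):
    print(doc)
```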