Data Architecture, Management and Snapshot

Data Architecture

Management

Snapshot

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions. Snowflake can query across Iceberg and Snowflake table formats.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

Over the years, data lakes on Amazon Simple Storage Service (Amazon S3) have become the default repository for enterprise data and are a common choice for a large set of users who query data for a variety of analytics and machine leaning use cases. Analytics use cases on data lakes are always evolving.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

Trending Sources

Analytics Vidhya

AI at Scale isn’t Magic, it’s Data – Hybrid Data

Cloudera

OCTOBER 11, 2022

The takeaway – businesses need control over all their data in order to achieve AI at scale and digital business transformation. The challenge for AI is how to do data in all its complexity – volume, variety, velocity. And that data is likely in clouds, in data centers and at the edge.

Snapshot

Snapshot Data Science Digital Transformation Metadata

Webinars

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

For high availability and durability, Kinesis Data Streams achieves high durability by synchronously replicating the streamed data across three Availability Zones in an AWS Region and gives you the option to retain data for up to 365 days. Refer to Amazon Kinesis Data Streams integrations for additional details.

Analytics

Analytics IoT Data-driven Snapshot

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

One important feature is to run different workloads such as business intelligence (BI), Machine Learning (ML), Data Science and data exploration, and Change Data Capture (CDC) of transactional data, without having to maintain multiple copies of data.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

In this post, we discuss why data streaming is a crucial component of generative AI applications due to its real-time nature. In-context learning LLMs are trained with point-in-time data and have no inherent ability to access fresh data at inference time. For more information, refer to Dynamic Tables.

Data Lake

Data Lake Unstructured Data Management Modeling

Cloudera Data Engineering 2021 Year End Review

Cloudera

DECEMBER 21, 2021

In working with thousands of customers deploying Spark applications, we saw significant challenges with managing Spark as well as automating, delivering, and optimizing secure data pipelines. We wanted to develop a service tailored to the data engineering practitioner built on top of a true enterprise hybrid data service platform.

Snapshot

Snapshot Data-driven Optimization Management

Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions

AWS Big Data

APRIL 3, 2024

By analyzing the historical report snapshot, you can identify areas for improvement, implement changes, and measure the effectiveness of those changes. In our example, we have configured a ruleset against a table containing patient data within a healthcare synthetic dataset generated using Synthea.

Data Quality

Data Quality Visualization Metadata Metrics

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

Solution overview This post shows how to create a backup of the Lake Formation permissions and AWS Glue Data Catalog from one Region to another in the same account. The solution doesn’t create or modify AWS Identity and Access Management (IAM) roles, which are available in all Regions. Migrate AWS Glue databases and tables.

Data Architecture

Data Architecture Metadata Data Lake Snapshot

Synchronize your Salesforce and Snowflake data to speed up your time to insight with Amazon AppFlow

AWS Big Data

FEBRUARY 9, 2023

Customers across industries seek meaningful insights from the data captured in their Customer Relationship Management (CRM) systems. To achieve this, they combine their CRM data with a wealth of information already available in their data warehouse, enterprise systems, or other software as a service (SaaS) applications.

Data Warehouse

Data Warehouse Data-driven Snapshot Testing

Introducing Apache Iceberg in Cloudera Data Platform

Cloudera

FEBRUARY 22, 2022

Over the past decade, the successful deployment of large scale data platforms at our customers has acted as a big data flywheel driving demand to bring in even more data, apply more sophisticated analytics, and on-board many new data practitioners from business analysts to data scientists.

Snapshot

Snapshot Metadata Cost-Benefit Data Architecture

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. Each component runs independently to solve a portion of the operational data processing use case.

Data Lake

Data Lake Data Processing Metadata Snapshot

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Cloudera

MARCH 23, 2022

In fact, we recently announced the integration with our cloud ecosystem bringing the benefits of Iceberg to enterprises as they make their journey to the public cloud, and as they adopt more converged architectures like the Lakehouse. Simplify data management . 1: Multi-function analytics . Financial regulation. The *Any*-house.

Metadata

Metadata Data Architecture Machine Learning Cost-Benefit

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. Moreover, running advanced analytics and ML on disparate data sources proved challenging. To overcome these issues, Orca decided to build a data lake.

Data Lake

Data Lake Analytics Snapshot Optimization

Chose Both: Data Fabric and Data Lakehouse

Cloudera

SEPTEMBER 12, 2022

Combining and analyzing both structured and unstructured data is a whole new challenge to come to grips with, let alone doing so across different infrastructures. Both obstacles can be overcome using modern data architectures, specifically data fabric and data lakehouse. Unified data fabric. Better together.

Unstructured Data

Unstructured Data Data Architecture Data Lake Snapshot

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

OCTOBER 19, 2023

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure. An S3 bucket for storing data. The original source file 2022.csv

Data Lake

Data Lake Data Warehouse Visualization Snapshot

Estimating Scope 1 Carbon Footprint with Amazon Athena

AWS Big Data

AUGUST 2, 2023

In this blog, we will walk through how we can apply existing enterprise data to better understand and estimate Scope 1 carbon footprint using Amazon Simple Storage Service (S3) and Amazon Athena , a serverless interactive analytics service that makes it easy to analyze data using standard SQL.

Data Lake

Data Lake Measurement Visualization Data Architecture

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Optimization Statistics

Migrate Microsoft Azure Synapse Analytics to Amazon Redshift using AWS SCT

AWS Big Data

OCTOBER 18, 2023

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that provides the flexibility to use provisioned or serverless compute for your analytical workloads. The decoupled compute and storage architecture of Amazon Redshift enables you to build highly scalable, resilient, and cost-effective workloads.

Analytics

Analytics Data Warehouse Testing Dashboards

Cloud Data Warehouse Migration 101: Expert Tips

Alation

JULY 28, 2022

What Are the Biggest Drivers of Cloud Data Warehousing? It’s costly and time-consuming to manage on-premises data warehouses — and modern cloud data architectures can deliver business agility and innovation. There are tools to replicate and snapshot data, plus tools to scale and improve performance.”

Data Warehouse

Data Warehouse Cost-Benefit Data Governance Data-driven

How Amazon optimized its high-volume financial reconciliation process with Amazon EMR for higher scalability and performance

AWS Big Data

MARCH 28, 2024

Because we had to run 1,500 processes per day with minimal latency and high performance, we choose to integrate Amazon EMR on EC2 deployment mode with managed scaling enabled to support processing variable file sizes. The transient nature enables efficient scaling, providing a robust and isolated solution for diverse data processing needs.

Optimization

Optimization IT Big Data Data Processing

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

AWS Big Data

MARCH 28, 2023

This post is designed to be implemented for a real customer use case, where you get full snapshot data on a daily basis. The dataset represents employee details such as ID, name, address, phone number, contractor or not, and more. You can also maintain the delta table by compacting the small files.

Data Lake

Data Lake Testing Snapshot Sales

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. Amazon EMR Serverless is a serverless option in Amazon EMR that makes it easy for data analysts and engineers to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers.

Data Lake

Data Lake Dashboards Metrics Metadata

A Summary Of Gartner’s Recent Innovation Insight Into Data Observability

DataKitchen

AUGUST 8, 2023

However, the market for Data Observability is fragmented and lacks a standard accepted definition, leading to confusion and tool adoption issues. She sees Data Observability as an emerging technology in data engineering and management. Are problems with data tests? Let’s start with a scenario. When did it last run?

Data Quality

Data Quality Testing Snapshot Reporting

“You Complete Me,” said Data Lineage to DataOps Observability.

DataKitchen

JANUARY 23, 2023

Data lineage and a data catalog are better together because they provide a more complete and accurate view of the data. Data lineage provides information about the origin, history, and movement of data, while runtime operations provide information about the actions performed on data while it is being processed.

Testing

Testing Data Governance Data Quality Data-driven

Data Leaders Brief

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Webinars

Trending Sources

AI at Scale isn’t Magic, it’s Data – Hybrid Data

Webinars

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Exploring real-time streaming for generative AI Applications

Cloudera Data Engineering 2021 Year End Review

Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

Synchronize your Salesforce and Snowflake data to speed up your time to insight with Amazon AppFlow

Introducing Apache Iceberg in Cloudera Data Platform

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Chose Both: Data Fabric and Data Lakehouse

Load data incrementally from transactional data lakes to data warehouses

Estimating Scope 1 Carbon Footprint with Amazon Athena

Choosing an open table format for your transactional data lake on AWS

Migrate Microsoft Azure Synapse Analytics to Amazon Redshift using AWS SCT

Cloud Data Warehouse Migration 101: Expert Tips

How Amazon optimized its high-volume financial reconciliation process with Amazon EMR for higher scalability and performance

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

A Summary Of Gartner’s Recent Innovation Insight Into Data Observability

“You Complete Me,” said Data Lineage to DataOps Observability.

Stay Connected