Data Architecture, Data Lake and Definition

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.
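To make "handling data at a record level" concrete, here is a minimal sketch in plain Python of applying a batch of CDC events to a keyed snapshot. It is not the article's Iceberg or AWS Glue implementation. The function name `apply_cdc`, the event shape (`op` plus `row`), and the operation codes `I`, `U`, and `D` are assumptions made for illustration only.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_cdc(table: Dict[Any, Row], changes: Iterable[Row], key: str = "id") -> Dict[Any, Row]:
    """Apply ordered CDC events to a keyed snapshot and return the new state.

    Hypothetical event shape: {"op": "I" | "U" | "D", "row": {...}}.
    """
    result = dict(table)  # shallow copy so the input snapshot is left untouched
    for change in changes:
        op = change["op"]
        row = change["row"]
        pk = row[key]
        if op in ("I", "U"):
            # Inserts and updates both become an upsert on the primary key.
            result[pk] = row
        elif op == "D":
            # Deletes remove the record; a missing key is ignored.
            result.pop(pk, None)
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return result


snapshot = {
    1: {"id": 1, "name": "alice"},
    2: {"id": 2, "name": "bob"},
}
changes = [
    {"op": "U", "row": {"id": 1, "name": "alicia"}},
    {"op": "D", "row": {"id": 2}},
    {"op": "I", "row": {"id": 3, "name": "carol"}},
]
current = apply_cdc(snapshot, changes)
```

A plain file-based data lake would have to rewrite whole files to achieve the same effect, which is why record-level use cases such as CDC call for a transactional table format.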

Data Lake, Data Governance, Cost-Benefit, Machine Learning

Demystifying Modern Data Platforms

Cloudera

SEPTEMBER 15, 2022

Mark: The first element in the process is the link between the source data and the entry point into the data platform. At Ramsey International (RI), we refer to that layer in the architecture as the foundation, but others call it a staging area, raw zone, or even a source data lake. What is a data fabric?

Data Lake, Data Architecture, Data-driven, Data Warehouse

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

AWS Big Data

OCTOBER 10, 2023

Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.

Data Quality, Data Governance, Data Lake, Testing

Educating ChatGPT on Data Lakehouse

Cloudera

MARCH 17, 2023

As the use of ChatGPT becomes more prevalent, I frequently encounter customers and data users citing ChatGPT’s responses in their discussions. I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics.

Unstructured Data, Data Lake, Data Warehouse, Machine Learning

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. Of those tables, some are larger (such as in terms of record volume) than others, and some are updated more frequently than others.

Data Lake, Data Processing, Metadata, Snapshot

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift enables you to directly access data stored in Amazon Simple Storage Service (Amazon S3) using SQL queries and join data across your data warehouse and data lake. With Amazon Redshift, you can query the data in your S3 data lake using a central AWS Glue metastore from your Redshift data warehouse.

Data Lake, Data Warehouse, Metadata, Data Architecture

Visualize data quality scores and metrics generated by AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

On the Crawlers page, select data-quality-result-crawler and choose Run. When the crawler is complete, you can see the AWS Glue Data Catalog table definition. After you create the table definition on the AWS Glue Data Catalog, you can use Athena to query the Data Catalog table. Choose Create crawler.

Data Quality, Metrics, Visualization, Dashboards

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

This solution replicates only the metadata in the Data Catalog, not the actual underlying data. To have a redundant data lake using Lake Formation and AWS Glue in an additional Region, we recommend replicating the Amazon S3-based storage using S3 replication, S3 sync, aws-s3-copy-sync-using-batch, or the S3 Batch Replication process.

Data Architecture, Metadata, Data Lake, Snapshot

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

In another decade, the internet and mobile started to generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence, the data lake emerged, which handles unstructured and structured data at huge volume. Metadata plays a key role here in discovering the data assets.

Data Lake, Data Warehouse, Data-driven, Metadata

A Day in the Life of a DataOps Engineer

DataKitchen

OCTOBER 11, 2021

First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. Second, you must establish a definition of “done.” In DataOps, the definition of done includes more than just some working code. Definition of Done. When can you declare it done?

Testing, Metadata, Dashboards, Statistics

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization.

Metadata, Data Lake, Machine Learning, Big Data

An Introduction to Disaster Recovery with the Cloudera Data Platform

Cloudera

AUGUST 9, 2022

The DRRA focuses on describing how to think about reliability, resiliency, and recovery for the Cloudera Data Platform, and is a living document describing our collected learning across the platform and across customers. Automating the healing, recovery, scaling, and rebalancing of core data services such as our Operational Database.

Data Lake, Data Warehouse, Data-driven, IoT

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region with over 200 users utilizing the sandbox for data discovery.

Internet Publishing and Broadcasting, Data-driven, Broadcasting, Digital Transformation

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Building data lakes from continuously changing transactional data of databases and keeping data lakes up to date is a complex task and can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes.

Data Lake, Dashboards, Metrics, Metadata

The Cloud Connection: How Governance Supports Security

Alation

APRIL 14, 2022

For example, data science always consumes “historical” data, and there is no guarantee that the semantics of older datasets are the same, even if their names are unchanged. Pushing data to a data lake and assuming it is ready for use is shortsighted. It’s not a simple definition.

Metadata, Data Governance, Modeling, Data-driven

Data Mesh Strategy: How to Plan for Data Mesh Implementation Success

Octopai

AUGUST 24, 2022

The latest appearance of the term “mesh” is in the concept of data mesh, coined by Zhamak Dehghani in her landmark 2019 article, How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh. How is data mesh a mesh? While these different domains will often have concepts in common (e.g.

Strategy, Data-driven, Sales, Enterprise

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific, “data ponds.”

Metadata, Cost-Benefit, Enterprise, Interactive

Flexible and secure Data-as-a-Service delivered today

Birst BI

OCTOBER 24, 2019

What is your definition of DaaS? DaaS is a core component of modern data architecture. It provides a governed standard for accessing existing data objects and pipelines for sharing new data objects within an organization. This includes ETL processes and subsequent augmented and extended data sets.

Data Warehouse, Recreation/Entertainment, Data Lake, Data Architecture

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world's largest corporations. Such data volumes are not easy to move, migrate or modernize. The challenges of a monolithic data lake architecture: Data lakes are, at a high level, single repositories of data at scale.

Data Lake, Metadata, Cost-Benefit, Data Warehouse

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To bring their customers the best deals and user experience, smava follows the modern data architecture principles with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.

Data Lake, Data Warehouse, Data-driven, B2B
