Big Data, Data Architecture, Data Lake and Definition

Big Data

Data Architecture

Data Lake

Definition

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Apache Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for processing engines such as Apache Spark, Trino, Apache Flink, Presto, Apache Hive, and Impala to safely work with the same tables at the same time. The following table shows the cost and time for each query and product.

Data Lake

Data Lake Metadata Snapshot Analytics

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift enables you to directly access data stored in Amazon Simple Storage Service (Amazon S3) using SQL queries and join data across your data warehouse and data lake. With Amazon Redshift, you can query the data in your S3 data lake using a central AWS Glue metastore from your Redshift data warehouse.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

Trending Sources

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. Of those tables, some are larger (such as in terms of record volume) than others, and some are updated more frequently than others.

Data Lake

Data Lake Data Processing Metadata Snapshot

Webinars

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.

Data Lake

Data Lake Data Governance Cost-Benefit Machine Learning

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

This solution only replicates metadata in the Data Catalog, not the actual underlying data. To have a redundant data lake using Lake Formation and AWS Glue in an additional Region, we recommend replicating the Amazon S3-based storage using S3 replication , S3 sync, aws-s3-copy-sync-using-batch or S3 Batch replication process.

Data Architecture

Data Architecture Metadata Data Lake Snapshot

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To bring their customers the best deals and user experience, smava follows the modern data architecture principles with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

AWS Big Data

OCTOBER 10, 2023

Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.

Data Quality

Data Quality Data Governance Data Lake Testing

Visualize data quality scores and metrics generated by AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

On the Crawlers page, select data-quality-result-crawler and choose Run. When the crawler is complete, you can see the AWS Glue Data Catalog table definition. After you create the table definition on the AWS Glue Data Catalog, you can use Athena to query the Data Catalog table. Choose Create crawler.

Data Quality

Data Quality Metrics Visualization Dashboards

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region with over 200 users utilizing the sandbox for data discovery.

Internet Publishing and Broadcasting

Internet Publishing and Broadcasting Data-driven Broadcasting Digital Transformation

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization. Arun A K is a Big Data Specialist Solutions Architect at AWS.

Metadata

Metadata Data Lake Machine Learning Big Data

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Building data lakes from continuously changing transactional data of databases and keeping data lakes up to date is a complex task and can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes.

Data Lake

Data Lake Dashboards Metrics Metadata

Data Mesh Strategy: How to Plan for Data Mesh Implementation Success

Octopai

AUGUST 24, 2022

The term “mesh”’s latest appearance is in the concept of data mesh , coined by Zhamak Dehghani in her landmark 2019 article, How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh. How is data mesh a mesh? . The key to moving ahead is to take a step back.

Strategy

Strategy Data-driven Sales Enterprise

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

In another decade, the internet and mobile started the generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence, Data Lake emerged, which handles unstructured and structured data with huge volume. Metadata plays a key role here in discovering the data assets.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

The Cloud Connection: How Governance Supports Security

Alation

APRIL 14, 2022

A cloud environment with such features will support collaboration across departments and across common data types, including csv, JSON, XML, AVRO, Parquet, Hyper, TDE, and more. It’s More Important to Know What Your Data Means Than Where It Is. Pushing data to a data lake and assuming it is ready for use is shortsighted.

Metadata

Metadata Data Governance Modeling Data-driven

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

Data Lakes have been around for well over a decade now, supporting the analytic operations of some of the largest world corporations. Such data volumes are not easy to move, migrate or modernize. The challenges of a monolithic data lake architecture Data lakes are, at a high level, single repositories of data at scale.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

Data Leaders Brief

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

Webinars

Trending Sources

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Webinars

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

How smava makes loans transparent and affordable using Amazon Redshift Serverless

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

Visualize data quality scores and metrics generated by AWS Glue Data Quality

Announcing the 2020 Data Impact Award Winners

How Cargotec uses metadata replication to enable cross-account data sharing

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

Data Mesh Strategy: How to Plan for Data Mesh Implementation Success

Data platform trinity: Competitive or complementary?

The Cloud Connection: How Governance Supports Security

How to modernize data lakes with a data lakehouse architecture

Stay Connected