Data Architecture, Data Lake and Publishing

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Warehouse, Lake or a Lakehouse – What’s Right for you?

Analytics Vidhya

OCTOBER 10, 2022

This article was published as a part of the Data Science Blogathon. Introduction Most of you would know the different approaches for building a data and analytics platform. You would have already worked on systems that used traditional warehouses or Hadoop-based data lakes. Selecting one among […].

Data Lake

Data Lake Data Science Publishing Analytics

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Starting with Amazon EMR release 6.7.0,

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

AWS Big Data

OCTOBER 10, 2023

Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.

Data Quality

Data Quality Data Governance Data Lake Testing

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. Of those tables, some are larger (such as in terms of record volume) than others, and some are updated more frequently than others.

Data Lake

Data Lake Data Processing Metadata Snapshot

Achieving Trusted AI in Manufacturing

Cloudera

JANUARY 30, 2024

Add appropriate contextual data (IT/business data), which is critical in AI analysis of manufacturing data. Eliminate data silos. Data from multiple sources must be centralized and stored on a common data lake so that you will have one source of truth across the value chain.

Manufacturing

Manufacturing Contextual Data IoT Digital Transformation

The hidden history of Db2

IBM Big Data Hub

JULY 5, 2022

In today’s world of complex data architectures and emerging technologies, databases can sometimes be undervalued and unrecognized. Back in the 1960s and 70s, vast amounts of data were stored in the world’s new mainframe computers—many of them IBM System/360 machines—and had become a problem. They were expensive.

Data Lake

Data Lake Data Warehouse Publishing Structured Data

What is Data Mesh?

Ontotext

NOVEMBER 16, 2023

Figure 1 shows the overall idea of a data mesh with the major components. What Is a Data Mesh and How Does It Work? Think of data mesh as an operational mode for organizations with a domain-driven, decentralized data architecture.

Metadata

Metadata Data-driven Data Quality Data Architecture

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

From our unique vantage point in the evolution toward DataOps automation, we publish an annual prediction of trends that most deeply impact the DataOps enterprise software industry as a whole. Data Gets Meshier. 2022 will bring further momentum behind modular enterprise architectures like data mesh.

Testing

Testing Data Lake Data Architecture Manufacturing

Estimating Scope 1 Carbon Footprint with Amazon Athena

AWS Big Data

AUGUST 2, 2023

The Economic Input-Output Life Cycle Assessment (EIO LCA) method is a spend-based method that combines expenditure data with monetary-based emission factors to estimate the emissions produced. The emission factors are published by the U.S. Environmental Protection Agency (EPA) and other peer-reviewed academic and government sources.

Data Lake

Data Lake Measurement Visualization Data Architecture
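The spend-based calculation described in the excerpt above can be sketched in a few lines: multiply each expenditure by the emission factor for its category and sum the results. This is only an illustrative sketch, not the method from the linked article. The function name, category names, and all numeric values are hypothetical placeholders, not published EPA factors.

```python
# Minimal sketch of a spend-based (EIO LCA style) estimate.
# All categories and numbers below are hypothetical, for illustration only.

def estimate_emissions(spend_by_category, factor_by_category):
    """Return estimated emissions per category as spend * emission factor."""
    return {
        category: spend * factor_by_category[category]
        for category, spend in spend_by_category.items()
    }

# Hypothetical expenditure in USD per spend category.
spend_usd = {"fuel": 1000.0, "equipment": 2500.0}

# Hypothetical monetary emission factors in kg CO2e per USD spent.
kg_co2e_per_usd = {"fuel": 2.0, "equipment": 0.5}

per_category = estimate_emissions(spend_usd, kg_co2e_per_usd)
total_kg_co2e = sum(per_category.values())
```

A category missing from the factor table raises a `KeyError` here, which surfaces unmapped spend instead of silently counting it as zero emissions.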

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

In another decade, the internet and mobile started to generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence, Data Lake emerged, which handles unstructured and structured data with huge volume. Data discoverability. Data mesh: A mostly new culture.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complementary.

Management

Management Metadata Data Architecture Data Lake

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization. The export process on the source account is a scheduled job.

Metadata

Metadata Data Lake Machine Learning Big Data

A Day in the Life of a DataOps Engineer

DataKitchen

OCTOBER 11, 2021

First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. Figure 2: Example data pipeline with DataOps automation. In this project, I automated data extraction from SFTP, the public websites, and the email attachments.

Testing

Testing Metadata Dashboards Statistics

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

We also celebrated the first-ever winner of the Data Impact Achievement Award, a new award category that recognizes one customer who has consistently achieved transformation across their business, pursuing a diverse set of use cases and creating a culture of data-driven innovation. Data Impact Achievement Award.

Internet Publishing and Broadcasting

Internet Publishing and Broadcasting Data-driven Broadcasting Digital Transformation

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data) then enterprise-wide data lakes versus smaller, typically BU-Specific, “data ponds”.

Metadata

Metadata Cost-Benefit Enterprise Interactive

Accelerate Amazon Redshift secure data use with Satori – Part 1

AWS Big Data

SEPTEMBER 21, 2023

Integrating Satori with Amazon Redshift accelerates organizations’ ability to make use of their data to generate business value. This faster time-to-value is achieved by enabling companies to manage data access more efficiently and effectively. Lisa Levy is a Content Specialist at Satori.

Data Warehouse

Data Warehouse Interactive Data Architecture Data Lake

A Retrospective of 2018’s Articles

Peter James Thomas

APRIL 9, 2019

Overall the total number of articles and new pages I published exceeded 2017’s figures to claim the second spot behind 2009, our first year in business. This article offers a framework for building momentum in the early stages of a Data Programme. Analytics & Big Data. Draining the Swamp. Convergent Evolution.

Data-driven

Data-driven Statistics Big Data Data Science

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

AWS Big Data

JULY 14, 2023

These inputs reinforced the need for a unified data strategy across the FinOps teams. We decided to build a scalable data management product that is based on the best practices of modern data architecture. Our source system and domain teams were mapped as data producers, and they would have ownership of the datasets.

Finance

Finance Metadata Big Data Recreation/Entertainment

5 Key Takeaways from Flink Forward 2023

Cloudera

NOVEMBER 27, 2023

Consider a few factors: First, many have been using Kafka as long-term storage and have seen their clusters grow without the same elasticity and accessibility one would expect from a modern data lake. For now, Flink plus Iceberg is the compute plus storage solution for streaming data.

Data Lake

Data Lake Advertising ROI Data Warehouse
