Data Architecture, Data Lake, Machine Learning and Metadata

Data Architecture

Data Lake

Machine Learning

Metadata

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. Of those tables, some are larger (such as in terms of record volume) than others, and some are updated more frequently than others.

Data Lake

Data Lake Data Processing Metadata Snapshot

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Optimization Statistics

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization.

Metadata

Metadata Data Lake Machine Learning Big Data

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architecture is a complex and varied field and different organizations and industries have unique needs when it comes to their data architects. Solutions data architect: These individuals design and implement data solutions for specific business needs, including data warehouses, data marts, and data lakes.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

Since then, customer demands for better scale, higher throughput, and agility in handling a wide variety of changing, but increasingly business critical analytics and machine learning use cases has exploded, and we have been keeping pace.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

The Future of the Data Lakehouse – Open

CIO Business Intelligence

JUNE 23, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Starting with Amazon EMR release 6.7.0,

Data Lake

Data Lake Data Governance Data Architecture Data Warehouse

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

Educating ChatGPT on Data Lakehouse

Cloudera

MARCH 17, 2023

As the use of ChatGPT becomes more prevalent, I frequently encounter customers and data users citing ChatGPT’s responses in their discussions. I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics.

Unstructured Data

Unstructured Data Data Lake Data Warehouse Machine Learning

Usability and Connecting Threads: How Data Fabric Makes Sense Out of Disparate Data

Ontotext

AUGUST 4, 2023

A data fabric utilizes an integrated data layer over existing, discoverable, and inferenced metadata assets to support the design, deployment, and utilization of data across enterprises, including hybrid and multi-cloud platforms. It also helps capture and connect data based on business or domains.

Metadata

Metadata Data-driven Data Architecture Data Quality

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Data Lake Optimization

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

AWS Big Data

JULY 14, 2023

In this post, we discuss how the Amazon Finance Automation team used AWS Lake Formation and the AWS Glue Data Catalog to build a data mesh architecture that simplified data governance at scale and provided seamless data access for analytics, AI, and machine learning (ML) use cases.

Finance

Finance Metadata Big Data Recreation/Entertainment

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

This can be achieved using AWS Entity Resolution , which enables using rules and machine learning (ML) techniques to match records and resolve identities. Then, you transform this data into a concise format. The AWS modern data architecture shows a way to build a purpose-built, secure, and scalable data platform in the cloud.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

Foundation models (FMs) are large machine learning (ML) models trained on a broad spectrum of unlabeled and generalized datasets. Streaming jobs constantly ingest new data to synchronize across systems and can perform enrichment, transformations, joins, and aggregations across windows of time more efficiently.

Data Lake

Data Lake Unstructured Data Management Modeling

Convergent Evolution

Peter James Thomas

AUGUST 18, 2018

That was the Science, here comes the Technology… A Brief Hydrology of Data Lakes. Overlapping with the above, from around 2012, I began to get involved in also designing and implementing Big Data Architectures; initially for narrow purposes and later Data Lakes spanning entire enterprises.

Data Lake

Data Lake Data Warehouse Data mining Statistics

Our Next Phase of Growth: Enterprise Data Catalogs

Alation

FEBRUARY 13, 2020

Following a very successful year of growth in Alation’s business, this announcement marks a milestone for Alation and the enterprise data catalog market. What started six years ago as one startup trying to improve the way people work with data has become a full-blown market category – Machine Learning Data Catalogs.

Enterprise

Enterprise Data Lake Machine Learning Data-driven

Breaking State and Local Data Silos with Modern Data Architectures

Cloudera

AUGUST 30, 2022

Modern data architectures. To eliminate or integrate these silos, the public sector needs to adopt robust data management solutions that support modern data architectures (MDAs). Towards Data Science ). Deploying modern data architectures. Forrester ).

Data Architecture

Data Architecture Data Lake Metadata Data Warehouse

How Novo Nordisk built distributed data governance and control at scale

AWS Big Data

APRIL 28, 2023

This will include how to configure Okta, AWS Lake Formation , and a business intelligence tool to enable SAML-based federated use of Athena for an enterprise BI activity. When building a scalable data architecture on AWS, giving autonomy and ownership to the data domains are crucial for the success of the platform.

Data Governance

Data Governance Management Data-driven Data Lake

Unstructured data management and governance using AWS AI/ML and analytics services

AWS Big Data

OCTOBER 25, 2023

By some estimates, unstructured data can make up to 80–90% of all new enterprise data and is growing many times faster than structured data. After decades of digitizing everything in your enterprise, you may have an enormous amount of data, but with dormant value. The solution integrates data in three tiers.

Unstructured Data

Unstructured Data Metadata Management Analytics

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Cloudera

MARCH 23, 2022

In fact, we recently announced the integration with our cloud ecosystem bringing the benefits of Iceberg to enterprises as they make their journey to the public cloud, and as they adopt more converged architectures like the Lakehouse. 1: Multi-function analytics . 3: Open Performance.

Metadata

Metadata Data Architecture Machine Learning Cost-Benefit

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

In another decade, the internet and mobile started the generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence, Data Lake emerged, which handles unstructured and structured data with huge volume. Data fabric promotes data discoverability.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

Today’s general availability announcement covers Iceberg running within key data services in the Cloudera Data Platform (CDP) — including Cloudera Data Warehousing ( CDW ), Cloudera Data Engineering ( CDE ), and Cloudera Machine Learning ( CML ). Why integrate Apache Iceberg with Cloudera Data Platform?

Data Lake

Data Lake Data Architecture Metadata Data Warehouse

Habib Bank manages data at scale with Cloudera Data Platform

Cloudera

NOVEMBER 17, 2022

The Solution: CDP Private Cloud brings a next-generation hybrid architecture with cloud-native benefits to HBL’s data platform. HBL started their data journey in 2019 when data lake initiative was started to consolidate complex data sources and enable the bank to use single version of truth for decision making.

Management

Management Data Lake Consulting Unstructured Data

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complimentary.

Management

Management Metadata Data Architecture Data Lake

Building a Beautiful Data Lakehouse

CIO Business Intelligence

MARCH 9, 2022

However, they do contain effective data management, organization, and integrity capabilities. As a result, users can easily find what they need, and organizations avoid the operational and cost burdens of storing unneeded or duplicate data copies. Warehouse, data lake convergence. Meet the data lakehouse.

Data Lake

Data Lake Unstructured Data Data Warehouse Data Quality

The Cloud Connection: How Governance Supports Security

Alation

APRIL 14, 2022

In today’s AI/ML-driven world of data analytics, explainability needs a repository just as much as those doing the explaining need access to metadata, EG, information about the data being used. The Cloud Data Migration Challenge. Pushing data to a data lake and assuming it is ready for use is shortsighted.

Metadata

Metadata Data Governance Modeling Data-driven

AWS re:Invent Recap: The Future of Cloud

Alation

DECEMBER 14, 2021

How do you provide access and connect the right people to the right data? AWS has created a way to manage policies and access, but this is only for data lake formation. What about other data sources? To that end, AWS is making inroads into the analytics and machine learning space. Other Keynote Highlights.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

AWS re:Invent 2023 Amazon Redshift Sessions Recap

AWS Big Data

DECEMBER 18, 2023

Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. Learn more about the AWS zero-ETL future with newly launched AWS databases integrations with Amazon Redshift.

Data Warehouse

Data Warehouse Machine Learning Data-driven Data Lake

Data democratization: How data architecture can drive business decisions and AI initiatives

IBM Big Data Hub

AUGUST 4, 2023

Today, the way businesses use data is much more fluid; data literate employees use data across hundreds of apps, analyze data for better decision-making, and access data from numerous locations. It uses knowledge graphs, semantics and AI/ML technology to discover patterns in various types of metadata.

Data Architecture

Data Architecture Data Lake Machine Learning Data Governance

Join the Alation MLDC World Tour!

Alation

FEBRUARY 20, 2020

In a nod to AC/DC, a wink to Gartner’s research report, Data Catalogs Are the New Black in Data Management and Analytics , and inspiration from the inaugural Forrester Wave : Machine Learning Data Catalogs , we have temporarily set aside our Alation orange and have been rocking “black” for the Alation MLDC World Tour.

Machine Learning

Machine Learning Metadata Reporting Data-driven

Data Leaders Brief

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Webinars

Trending Sources

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Webinars

Choosing an open table format for your transactional data lake on AWS

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

How Cargotec uses metadata replication to enable cross-account data sharing

What is a Data Mesh?

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

What is a data architect? Skills, salaries, and how to become a data framework master

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

The Future of the Data Lakehouse – Open

AWS Lake Formation 2022 year in review

The Future of the Data Lakehouse – Open

Educating ChatGPT on Data Lakehouse

Usability and Connecting Threads: How Data Fabric Makes Sense Out of Disparate Data

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

Create an end-to-end data strategy for Customer 360 on AWS

Exploring real-time streaming for generative AI Applications

Convergent Evolution

Our Next Phase of Growth: Enterprise Data Catalogs

Breaking State and Local Data Silos with Modern Data Architectures

How Novo Nordisk built distributed data governance and control at scale

Unstructured data management and governance using AWS AI/ML and analytics services

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Data platform trinity: Competitive or complementary?

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Habib Bank manages data at scale with Cloudera Data Platform

Augmented data management: Data fabric versus data mesh

Building a Beautiful Data Lakehouse

The Cloud Connection: How Governance Supports Security

AWS re:Invent Recap: The Future of Cloud

AWS re:Invent 2023 Amazon Redshift Sessions Recap

Data democratization: How data architecture can drive business decisions and AI initiatives

Join the Alation MLDC World Tour!

Stay Connected