
Understanding the Differences Between Data Lakes and Data Warehouses

Smart Data Collective

Data lakes and data warehouses are probably the two most widely used structures for storing data. In a nutshell, a data warehouse is used as a central storage space for large amounts of structured data coming from various sources, while a data lake can also hold raw and unstructured data, so the two differ mainly in the data types they hold and how that data is processed.
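
The contrast described in this excerpt is essentially schema-on-read versus schema-on-write. Below is a minimal PySpark sketch of the two patterns; bucket names, paths, and fields are hypothetical placeholders, not anything from the article.

```python
# Minimal sketch contrasting lake-style and warehouse-style storage patterns.
# All bucket names, paths, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("lake-vs-warehouse-sketch").getOrCreate()

# Data lake pattern: land raw, semi-structured events as-is (schema-on-read).
raw_events = spark.read.json("s3://example-landing-zone/events/2024/01/")
raw_events.write.mode("append").parquet("s3://example-data-lake/raw/events/")

# Warehouse-style pattern: enforce a fixed schema up front (schema-on-write)
# and keep only conforming, structured records in a curated table.
orders_schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("customer_id", StringType(), nullable=False),
    StructField("amount", DoubleType(), nullable=True),
])
curated_orders = spark.read.schema(orders_schema).json("s3://example-landing-zone/orders/")
curated_orders.write.mode("overwrite").parquet("s3://example-curated-zone/orders/")
```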

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.
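
As one concrete illustration of an open table format on such a data lake, here is a minimal PySpark sketch that creates and writes an Apache Iceberg table on S3, assuming the Iceberg runtime and its AWS Glue catalog integration are available on the classpath. Catalog, database, table, and bucket names are hypothetical.

```python
# Minimal sketch: an Iceberg table on S3, registered in the AWS Glue Data Catalog.
# Names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-table-format-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://example-data-lake/warehouse/")
    .getOrCreate()
)

# Create a transactional table; Iceberg tracks snapshots, so rows can later be
# updated or deleted with ACID guarantees instead of rewriting whole S3 prefixes.
spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.analytics.orders (
        order_id string,
        customer_id string,
        amount double,
        order_date date
    )
    USING iceberg
    PARTITIONED BY (order_date)
""")

spark.sql("""
    INSERT INTO glue_catalog.analytics.orders
    VALUES ('o-1', 'c-42', 19.99, DATE '2024-01-15')
""")
```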

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
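
Once such a lake is in place, the tables are typically queried through the AWS analytics services. The following boto3 sketch runs an Amazon Athena query against a table in the lake and prints the results; the database, table, column names, and output location are hypothetical, not details from Orca's setup.

```python
# Minimal sketch: query a data lake table with Amazon Athena via boto3.
# Database, table, columns, and output location are hypothetical.
import time
import boto3

athena = boto3.client("athena")

query = ("SELECT asset_id, severity, COUNT(*) AS findings "
         "FROM security_findings GROUP BY asset_id, severity")

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "example_lake_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```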

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
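
In a serverless setup like the one this post describes, the Spark work is submitted to an EMR Serverless application rather than a long-running cluster. The boto3 sketch below shows that submission pattern; the application ID, role ARN, script path, and Spark parameters are hypothetical placeholders.

```python
# Minimal sketch: submit a Spark job to an EMR Serverless application that
# writes to an Iceberg table. All identifiers are hypothetical placeholders.
import boto3

emr_serverless = boto3.client("emr-serverless")

response = emr_serverless.start_job_run(
    applicationId="00example-app-id",
    executionRoleArn="arn:aws:iam::123456789012:role/example-emr-serverless-role",
    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://example-artifacts/jobs/load_orders_iceberg.py",
            "sparkSubmitParameters": (
                "--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions "
                "--conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog "
                "--conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog"
            ),
        }
    },
)

# Check on the run's state; the table can then be queried with Amazon Athena.
job_run_id = response["jobRunId"]
status = emr_serverless.get_job_run(applicationId="00example-app-id", jobRunId=job_run_id)
print(status["jobRun"]["state"])
```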

Introducing MongoDB Atlas metadata collection with AWS Glue crawlers

AWS Big Data

For data lake customers who need to discover petabytes of data, AWS Glue crawlers are a popular way to discover and catalog data in the background, letting users search for and find relevant data across multiple data sources. Each cataloged table can then be opened to view its schema and other metadata.
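
The sketch below shows that flow with boto3: run a Glue crawler against a MongoDB Atlas connection, then read the cataloged schema back from the Data Catalog. Crawler, connection, role, database, and table names are hypothetical, and it assumes the Glue connection to Atlas already exists.

```python
# Minimal sketch: crawl a MongoDB Atlas collection with an AWS Glue crawler and
# read the discovered schema from the Data Catalog. Names are hypothetical.
import boto3

glue = boto3.client("glue")

# Assumes a Glue connection to MongoDB Atlas named "example-atlas-connection"
# has already been created.
glue.create_crawler(
    Name="example-atlas-crawler",
    Role="arn:aws:iam::123456789012:role/example-glue-crawler-role",
    DatabaseName="example_catalog_db",
    Targets={
        "MongoDBTargets": [
            {"ConnectionName": "example-atlas-connection", "Path": "sales/orders"}
        ]
    },
)
glue.start_crawler(Name="example-atlas-crawler")

# Once the crawler finishes, the schema and other metadata are in the catalog.
table = glue.get_table(DatabaseName="example_catalog_db", Name="orders")
for column in table["Table"]["StorageDescriptor"]["Columns"]:
    print(column["Name"], column["Type"])
```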

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

AWS Big Data

In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue, Apache Hudi, and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.
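
The core of that hourly incremental pattern is a Hudi upsert into the S3 table. Here is a minimal PySpark sketch of it, assuming the Hudi Spark bundle (for example via the Glue Hudi connector) is available; table names, keys, and paths are hypothetical rather than Ruparupa's actual configuration.

```python
# Minimal sketch: upsert an hourly batch of changed records into a Hudi table on S3.
# Table name, key fields, and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-incremental-sketch").getOrCreate()

incremental_df = spark.read.parquet("s3://example-staging/orders/latest_hour/")

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.partitionpath.field": "order_date",
    "hoodie.datasource.write.operation": "upsert",
}

# Upsert merges only the changed rows into the existing table instead of
# rewriting it, which is what keeps the S3 data lake incrementally up to date.
(
    incremental_df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://example-data-lake/hudi/orders/")
)
```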

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. Every EDLS job comprises one or more job steps chained together and run in a predefined order orchestrated by the custom ETL framework.
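
EDLS itself is BMS's proprietary framework, so the following Python sketch is only a hypothetical illustration of the pattern the excerpt describes: job steps defined as an ordered chain and run in a predefined sequence by an orchestrator. The step names and functions are illustrative, not EDLS APIs.

```python
# Hypothetical sketch of chained job steps run in a predefined order,
# in the spirit of the EDLS description above. Not actual EDLS code.
from typing import Callable, Dict, List


def extract_source(context: Dict) -> Dict:
    context["raw_path"] = "s3://example-landing-zone/input/"
    return context


def apply_transformations(context: Dict) -> Dict:
    context["curated_path"] = "s3://example-curated-zone/output/"
    return context


def publish_to_catalog(context: Dict) -> Dict:
    context["published"] = True
    return context


# The job is defined as an ordered chain of steps.
JOB_STEPS: List[Callable[[Dict], Dict]] = [
    extract_source,
    apply_transformations,
    publish_to_catalog,
]


def run_job(steps: List[Callable[[Dict], Dict]]) -> Dict:
    """Run each step in its predefined order, passing shared context along."""
    context: Dict = {}
    for step in steps:
        context = step(context)
    return context


if __name__ == "__main__":
    print(run_job(JOB_STEPS))
```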