Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

With Amazon EMR 6.15, we launched AWS Lake Formation-based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.
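
As a rough illustration of how that FGAC model is expressed, the sketch below grants column-level SELECT on an Iceberg table through Lake Formation using boto3; an EMR cluster with Lake Formation integration enabled then enforces the grant at query time. The account ID, role, database, table, and column names are placeholders, not values from the post.

```python
# Minimal sketch: column-level SELECT grant via AWS Lake Formation.
# All identifiers below are hypothetical placeholders.
import boto3

lakeformation = boto3.client("lakeformation", region_name="us-east-1")

lakeformation.grant_permissions(
    Principal={
        # Hypothetical IAM role assumed by the analyst's EMR jobs
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/AnalystRole"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "sales_db",            # placeholder database
            "Name": "orders_iceberg",              # placeholder OTF table
            "ColumnNames": ["order_id", "order_date", "total"],  # allowed columns only
        }
    },
    Permissions=["SELECT"],
)
```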

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. A fit-for-purpose data store built on an open lakehouse architecture allows you to scale AI and ML while providing built-in governance tools.

Turnkey Cloud DataOps: Solution from Alation and Accenture

Alation

More specifically, IDF has been integrated with Alation at an API level; this means that all generated pipeline code, metadata attributes, configuration files, and lineage are automatically synced (representing a huge time savings). They can better understand data transformations, checks, and normalization. Transparency is key.

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

erwin

Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. And there’s control of that landscape to facilitate insight and collaboration and limit risk.

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
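
For readers who want a concrete starting point, here is a minimal PySpark sketch of the pattern the post describes: an Iceberg catalog backed by the AWS Glue Data Catalog with an S3 warehouse, used to create a transactional table. The bucket, database, and table names are hypothetical, and the Iceberg Spark runtime and AWS bundle jars are assumed to be on the cluster.

```python
# Minimal sketch: Spark session with an Iceberg catalog on AWS Glue + S3,
# then creating a transactional table. Names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-data-lake-sketch")
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.warehouse", "s3://example-data-lake/warehouse/")  # placeholder bucket
    .config("spark.sql.catalog.glue.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)

# Iceberg provides snapshots, schema evolution, and ACID semantics
# on top of the underlying S3 objects.
spark.sql("""
    CREATE TABLE IF NOT EXISTS glue.security_db.asset_events (
        asset_id   STRING,
        event_type STRING,
        event_time TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(event_time))
""")
```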

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

In the following decade, the internet and mobile devices started to generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence, the data lake emerged, which handles unstructured and structured data at huge volume. The data lakehouse was created to solve these problems.

Cloudera’s Open Data Lakehouse Supercharged with dbt Core™

Cloudera

Using these adapters, Cloudera customers can use dbt to collaborate, test, deploy, and document their data transformation and analytic pipelines on CDP Public Cloud, CDP One, and CDP Private Cloud. This variety can result in a lack of standardization, leading to data duplication and inconsistency.
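
As a small illustration of running such a pipeline, the sketch below invokes dbt Core programmatically (dbt-core 1.5+) against a project configured with one of the Cloudera adapters (for example dbt-hive or dbt-impala). The project directory, profiles location, and model selector are placeholders, not values from the post.

```python
# Minimal sketch: programmatic dbt Core invocation (dbt-core >= 1.5).
# Paths and selectors are hypothetical placeholders.
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

# Build the staging models and everything downstream of them, using a
# profiles.yml that points at the CDP warehouse via a Cloudera adapter.
result: dbtRunnerResult = dbt.invoke([
    "build",
    "--project-dir", "./analytics_project",  # placeholder dbt project
    "--profiles-dir", "./profiles",          # placeholder profiles.yml location
    "--select", "staging+",                  # placeholder selector
])

print("success:", result.success)
```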