Remove Analytics Remove Blog Remove Data Processing Remove Metadata
article thumbnail

Top 10 Data Lineage Podcasts, Blogs, and Magazines

Octopai

Our list of Top 10 Data Lineage Podcasts, Blogs, and Websites To Follow in 2021. The host is Tobias Macey, an engineer with many years of experience. The particular episode we recommend looks at how WeWork struggled with understanding their data lineage so they created a metadata repository to increase visibility. Agile Data.

article thumbnail

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights. Analytics use cases on data lakes are always evolving. In this method, the metadata are recreated in an isolated environment and colocated with the existing data files.

Data Lake 102
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Streaming Ingestion for Apache Iceberg With Cloudera Stream Processing

Cloudera

Recently, we announced enhanced multi-function analytics support in Cloudera Data Platform (CDP) with Apache Iceberg. Iceberg is a high-performance open table format for huge analytic data sets. To provide the CM host we can copy the FQDN of the node where Cloudera Manager is running.

Snapshot 114
article thumbnail

A Reference Architecture for the Cloudera Private Cloud Base Data Platform

Cloudera

This blog post provides an overview of best practice for the design and deployment of clusters incorporating hardware and operating system configuration, along with guidance for networking and security as well as integration with existing enterprise infrastructure. Best of CDH & HDP, with added analytic and platform features .

article thumbnail

Setting up and Getting Started with Cloudera’s New SQL AI Assistant

Cloudera

As described in our recent blog post , an SQL AI Assistant has been integrated into Hue with the capability to leverage the power of large language models (LLMs) for a number of SQL tasks. This blog post aims to help you understand what you can do to get started with generative AI assisted SQL using Hue image version ​​2023.0.16.0

article thumbnail

Upgrade Hortonworks Data Platform (HDP) to Cloudera Data Platform (CDP) Private Cloud Base

Cloudera

CDP is an easy, fast, and secure enterprise analytics and management platform with the following capabilities: Enables ingesting, managing, and delivering of any analytics workload from Edge to AI. Provides self-service access to integrated, multi-function analytics on centrally managed and secured business data.

Testing 101
article thumbnail

Data governance beyond SDX: Adding third party assets to Apache Atlas

Cloudera

In this blog, we’ll highlight the key CDP aspects that provide data governance and lineage and show how they can be extended to incorporate metadata for non-CDP systems from across the enterprise. Extending Atlas’ metadata model. The example 1_typedef-server.json describes the server typedef used in this blog. .