Blog - Data Leaders Brief

How Salesforce optimized their detection and response platform using AWS managed services

AWS Big Data

APRIL 18, 2024

This is a guest blog post co-authored with Atul Khare and Bhupender Panwar from Salesforce. The platform ingests more than 1 PB of data per day, more than 10 million events per second, and more than 200 different log types. The data lake consumers then use Apache Presto running on an Amazon EMR cluster to perform one-time queries.

Optimization

Optimization Data Lake Management Key Performance Indicator

Are You Content with Your Organization’s Content Strategy?

Rocket-Powered Data Science

JULY 6, 2021

If you include the title of this blog, you were just presented with 13 examples of heteronyms in the preceding paragraphs. Specifically, in the modern era of massive data collections and exploding content repositories, we can no longer simply rely on keyword searches to be sufficient. Data catalogs are very useful and important.

Strategy

Strategy Machine Learning Metadata Knowledge Discovery

Webinars

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Manufacturing Sustainability Surge: Your Guide to Data-Driven Energy Optimization & Decarbonization

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

MORE WEBINARS

Trending Sources

Automate AWS Clean Rooms querying and dashboard publishing using AWS Step Functions and Amazon QuickSight – Part 2

AWS Big Data

FEBRUARY 12, 2024

Public health organizations need access to data insights that they can quickly act upon, especially in times of health emergencies, when data needs to be updated multiple times daily. Instead, they rely on up-to-date dashboards that help them visualize data insights to make informed decisions quickly.

Publishing

Publishing Dashboards Metadata Visualization

Business Strategies for Deploying Disruptive Tech: Generative AI and ChatGPT

Rocket-Powered Data Science

FEBRUARY 15, 2023

Third, any commitment to a disruptive technology (including data-intensive and AI implementations) must start with a business strategy. These changes may include requirements drift, data drift, model drift, or concept drift. I suggest that the simplest business strategy starts with answering three basic questions: What?

Strategy

Strategy Experimentation Uncertainty Machine Learning

Introducing Apache Hudi support with AWS Glue crawlers

AWS Big Data

NOVEMBER 22, 2023

Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers manage complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance.

Data Lake

Data Lake Snapshot Metadata Optimization

The DataOps Vendor Landscape, 2021

DataKitchen

APRIL 13, 2021

Read the complete blog below for a more detailed description of the vendors and their capabilities. This is not surprising given that DataOps enables enterprise data teams to generate significant business value from their data. Testing and Data Observability. Reflow — A system for incremental data processing in the cloud.

Testing

Testing Machine Learning Consulting Data Quality

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 1: The Set-Up & Basics

Cloudera

JANUARY 6, 2021

Python is used extensively among Data Engineers and Data Scientists to solve all sorts of problems from ETL/ELT pipelines to building machine learning models. Apache HBase is an effective data storage system for many workflows but accessing this data specifically through Python can be a struggle.

Machine Learning

Machine Learning Data Science Management Enterprise

Top 10 Metadata Management Influencers, Sites, and Blogs You Must Follow in 2021

Octopai

APRIL 19, 2021

Aptly named, metadata management is the process in which BI and Analytics teams manage metadata, which is the data that describes other data. In other words, data is the content and metadata is the context. Without metadata, BI teams are unable to understand the data’s full story. TDWI – David Loshin.

Metadata

Metadata Management Business Intelligence Data Governance

Using AWS AppSync and AWS Lake Formation to access a secure data lake through a GraphQL API

AWS Big Data

OCTOBER 9, 2023

Data lakes have been gaining popularity for storing vast amounts of data from diverse sources in a scalable and cost-effective way. As the number of data consumers grows, data lake administrators often need to implement fine-grained access controls for different user profiles.

Data Lake

Data Lake Testing Big Data Management

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

This is a guest blog post co-written with Sumesh M R from Cargotec and Tero Karttunen from Knowit Finland. Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. Cargotec’s use cases also required them to create views that span tables and views across catalogs.

Metadata

Metadata Data Lake Machine Learning Big Data

Best Practices for Metadata Management

Alation

JULY 19, 2021

Metadata is information about data. A clothing catalog or dictionary are both examples of metadata repositories. Indeed, a popular online catalog, like Amazon, offers rich metadata around products to guide shoppers: ratings, reviews, and product details are all examples of metadata. What Is Metadata? Why Is Metadata Important?

Metadata

Metadata Management Data Governance Machine Learning

The People’s Data Catalog: Alation Featured as Top Choice in Eckerson’s Latest Report

Alation

JULY 15, 2021

Many data catalog initiatives fail. How can prospective buyers ensure they partner with the right catalog to drive success? According to the latest report from Eckerson Group, Deep Dive on Data Catalogs, shoppers must match the goals of their organizations to the capabilities of their chosen catalog.

Reporting

Reporting Data Governance Recreation/Entertainment Metadata

Simplify data transfer: Google BigQuery to Amazon S3 using Amazon AppFlow

AWS Big Data

OCTOBER 5, 2023

In today’s data-driven world, the ability to effortlessly move and analyze data across diverse platforms is essential. Amazon AppFlow, a fully managed data integration service, has been at the forefront of streamlining data transfer between AWS services, software as a service (SaaS) applications, and now Google BigQuery.

Data Warehouse

Data Warehouse Machine Learning Data Integration Data-driven

How Cloudera Supports Zero Trust for Data

Cloudera

JUNE 7, 2023

The revised ZTMM is organized by five categories or pillars: identity, devices, networks, applications and workloads, and data, and four levels of maturity: traditional, initial, advanced, and optimal. The data is protected but it is also accessible by the people who need it. How does Cloudera support the evolution to optimal?

Metadata

Metadata Data Lake Optimization Modeling

StormForge Optimize Live now available in the IBM Cloud Catalog

IBM Big Data Hub

AUGUST 8, 2023

Next, machine learning capabilities play a vital role in optimizing resource allocations by analyzing historical data and predicting future resource demands for each deployment. Now available in the IBM Cloud catalog. Sign up for a 30-day free trial of StormForge Optimize Live to get started.

Optimization

Optimization Cost-Benefit Machine Learning ROI

How to Deliver Data Quality with Data Governance: Ryan Doupe, CDO of American Fidelity, 9-Step Process

Alation

JANUARY 20, 2022

Several weeks ago (prior to the Omicron wave), I got to attend my first conference in roughly two years: Dataversity’s Data Quality and Information Quality Conference. Ryan Doupe, Chief Data Officer of American Fidelity, held a thought-provoking session that resonated with me. Instead, data quality rules promote awareness and trust.

Data Quality

Data Quality Data Governance Metrics Statistics

Unlock data across organizational boundaries using Amazon DataZone – now generally available

AWS Big Data

OCTOBER 4, 2023

Amazon DataZone enables customers to discover, access, share, and govern data at scale across organizational boundaries, reducing the undifferentiated heavy lifting of making data and analytics tools accessible to everyone in the organization. This is challenging because access to data is managed differently by each of the tools.

Metadata

Metadata Data Lake Publishing Data Governance

What Is Digital Transformation?

Alation

NOVEMBER 4, 2022

In business, data-based goals tend to be very tangible. Perhaps you want to boost your ROI or CAGR, or reduce the time your analysts spend accessing and leveraging data. In this blog we’ll describe digital transformation: how it can be achieved as well as how it can benefit your business. What data is being used?

Digital Transformation

Digital Transformation Cost-Benefit Insurance Machine Learning

One Big Cluster Stuck: The Right Tool for the Right Job

Cloudera

JUNE 26, 2023

Here are some tips and tricks of the trade to prevent well-intended yet inappropriate data engineering and data science activities from cluttering or crashing the cluster. For data engineering and data science teams, CDSW is highly effective as a comprehensive platform that trains, develops, and deploys machine learning models.

Testing

Testing Data Processing Visualization Data Science

Introducing hybrid access mode for AWS Glue Data Catalog to secure access using AWS Lake Formation and IAM and Amazon S3 policies

AWS Big Data

SEPTEMBER 26, 2023

AWS Lake Formation helps you centrally govern, secure, and globally share data for analytics and machine learning. With Lake Formation, you can manage access control for your data lake data in Amazon Simple Storage Service (Amazon S3) and its metadata in AWS Glue Data Catalog in one place with familiar database-style features.

Data Lake

Data Lake Metadata Management Modeling

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. DataOps helps the data mesh deliver greater business agility by enabling decentralized domains to work in concert. But first, let’s define the data mesh design pattern.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications.

Data Lake

Data Lake Data Transformation Cost-Benefit Data-driven

Gartner Data & Analytics London: Human Curation + Machine Learning

Alation

FEBRUARY 13, 2020

Earlier this month in London, more than 1,600 data and analytics leaders and professionals gathered for the Gartner Data & Analytics Summit. From niche breakout sessions to the packed opening keynote, where “AI” was one of three leading trends along with “data driven” and “privacy”, AI was everywhere.

Machine Learning

Machine Learning Data Analytics Analytics Metadata

Alation 2022.3: Alation Anywhere Connecting the Modern Data Stack

Alation

AUGUST 30, 2022

We are introducing Alation Anywhere, extending data intelligence directly to the tools in your modern data stack, starting with Tableau. We continue to make deep investments in governance, including new capabilities in the Stewardship Workbench, a core part of the Data Governance App. Then Alation came along.

Metadata

Metadata Data Quality Data Governance Machine Learning

What Is a Data Catalog?

Alation

FEBRUARY 13, 2020

Data catalogs have quickly become a core component of modern data management. Organizations with successful data catalog implementations see remarkable changes in the speed and quality of data analysis, and in the engagement and enthusiasm of people who need to perform data analysis.

Metadata

Metadata Data Lake Recreation/Entertainment Big Data

Data Intelligence + Human Brilliance = The Future of Innovation

Alation

SEPTEMBER 16, 2021

Today, Alation is the leading data catalog. Early on, we focused on three goals: Ease of deployment … so you can launch the catalog in days… Human-centric search … and start using it to find great data instantly… Usage and collaboration … and collaborate to leverage new insights fast.

Machine Learning

Machine Learning Data Governance Software Metadata

Educating ChatGPT on Data Lakehouse

Cloudera

MARCH 17, 2023

As the use of ChatGPT becomes more prevalent, I frequently encounter customers and data users citing ChatGPT’s responses in their discussions. I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics.

Unstructured Data

Unstructured Data Data Lake Data Warehouse Machine Learning

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights. Analytics use cases on data lakes are always evolving.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Cloudera Named Leader in The Forrester Wave: Notebook-Based Predictive Analytics and Machine Learning, Q3 2020

Cloudera

SEPTEMBER 10, 2020

Cloudera has been named a Leader in The Forrester Wave: Notebook-Based Predictive Analytics and Machine Learning, Q3 2020. For enterprise machine learning teams, this means having the right platform, tools, and processes that streamline end-to-end ML to tackle once-impossible business challenges effectively and at scale.

Machine Learning

Machine Learning Predictive Analytics Analytics Enterprise

Build or Buy an Enterprise Data Catalog: Top 6 Considerations

Alation

FEBRUARY 12, 2020

Hundreds of data sources. Hundreds (even thousands) of data consumers. To keep up with the rapid influx of data, the many disparate data environments, and the rise in self-service analytics users, enterprises need an enterprise data catalog to drive the business forward with data, and ensure compliant, accurate data use.

Enterprise

Enterprise Metadata Machine Learning Reporting

AWS Glue Data Quality is Generally Available

AWS Big Data

JUNE 6, 2023

We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning. It takes days for data engineers to identify and implement data quality rules.

Data Quality

Data Quality Statistics Data Lake Visualization

3x better performance with CDP Data Warehouse compared to EMR in TPC-DS benchmark

Cloudera

DECEMBER 11, 2020

In a previous blog post on CDW performance, we compared Azure HDInsight to CDW. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to EMR 6.0 (also powered by Apache Hive-LLAP) on Amazon using the TPC-DS 2.9 benchmark. More on this later in the blog.

Data Warehouse

Data Warehouse Metadata Machine Learning Measurement

How to handle a ransomware attack

IBM Big Data Hub

JANUARY 22, 2024

Over 17 percent of all cyberattacks involve ransomware, a type of malware that keeps a victim’s data or device locked unless the victim pays the hacker a ransom. Because many new types of ransomware target backups to make recovery harder, keep data backups offline. The first thing to keep in mind is you’re not alone.

Consulting

Consulting Behavioral Analytics Reporting Insurance

Use IAM runtime roles with Amazon EMR Studio Workspaces and AWS Lake Formation for cross-account fine-grained access control

AWS Big Data

NOVEMBER 6, 2023

Amazon EMR Studio is an integrated development environment (IDE) that makes it straightforward for data scientists and data engineers to develop, visualize, and debug data engineering and data science applications written in R, Python, Scala, and PySpark. We’re happy to introduce runtime roles for EMR Studio Workspaces.

Data Lake

Data Lake Sales Management Testing

Self-Serve Data Prep CAN Be Easy AND Sophisticated!

Smarten

AUGUST 4, 2023

The Right Self-Serve Data Preparation Solution is Sophisticated, Easy-to-Use and Ensures User Adoption! If your team has easy-to-use tools and features, you are much more likely to experience the user adoption you want and to improve data literacy and data democratization across the organization.

Data Lake

Data Lake Machine Learning Data Integration Optimization

Harmonize data using AWS Glue and AWS Lake Formation FindMatches ML to build a customer 360 view

AWS Big Data

JUNE 26, 2023

In today’s digital world, data is generated by a large number of disparate sources and growing at an exponential rate. Companies are faced with the daunting task of ingesting all this data, cleansing it, and using it to provide outstanding customer experience. It’s commonly referred to as a data harmonization or deduplication problem.

Insurance

Insurance Visualization Data Lake Metrics

Fraud Detection With Cloudera Stream Processing Part 2: Real-Time Streaming Analytics

Cloudera

JULY 18, 2022

In part 1 of this blog we discussed how Cloudera DataFlow for the Public Cloud (CDF-PC), the universal data distribution service powered by Apache NiFi, can make it easy to acquire data from wherever it originates and move it efficiently to make it available to other applications in a streaming fashion. Data decays!

Analytics

Analytics Dashboards Statistics Visualization

The Role of the Data Catalog in Data Security

Alation

JUNE 14, 2021

The Role of Catalog in Data Security. Recently, I dug in with CIOs on the topic of data security. What came as no surprise was the importance CIOs place on taking a broader approach to data protection. The Role of the CISO in Data Governance and Security.

Data Governance

Data Governance Recreation/Entertainment Data Lake Digital Transformation

How a data fabric overcomes data sprawls to reduce time to insights

IBM Big Data Hub

APRIL 28, 2022

Data agility, the ability to store and access your data from wherever makes the most sense, has become a priority for enterprises in an increasingly distributed and complex environment. That’s where the data fabric comes in. Data fabric in action: Retail supply chain example. enterprises to minimize their time to value.

Metadata

Metadata Data Warehouse Forecasting Predictive Modeling

Maximize your data dividends with active metadata

IBM Big Data Hub

NOVEMBER 28, 2022

Metadata management performs a critical role within the modern data management stack. It helps blur data silos, and empowers data and analytics teams to better understand the context and quality of data. This, in turn, builds trust in data and the decision-making to follow. Improve data discovery.

Metadata

Metadata Data Quality Data-driven Data Governance

Using other CDP services with Cloudera Operational Database

Cloudera

FEBRUARY 16, 2021

In the previous blog post, we looked at some of the application development concepts for the Cloudera Operational Database (COD). In this blog post, we’ll see how you can use other CDP services with COD. Integrated across the Enterprise Data Lifecycle. Cloudera Data Engineering to ingest bulk data and data from mainframes.

Machine Learning

Machine Learning Data Lake Enterprise Data Warehouse

Our Next Phase of Growth: Enterprise Data Catalogs

Alation

FEBRUARY 13, 2020

Today, we’re announcing that Alation has closed a $50 million Series C funding led by Sapphire Ventures, with participation from new investor Salesforce Ventures and our existing investors Costanoa Ventures, DCVC (Data Collective), Harmony Partners and Icon Ventures. And, the data catalog market has had a year of incredible growth.

Enterprise

Enterprise Data Lake Machine Learning Data-driven

Enabling The Full ML Lifecycle For Scaling AI Use Cases

Cloudera

DECEMBER 17, 2020

When it comes to machine learning (ML) in the enterprise, there are many misconceptions about what it actually takes to effectively employ machine learning models and scale AI use cases. Accelerating the Full Machine Learning Lifecycle With Cloudera Data Platform.

Machine Learning

Machine Learning Visualization Data Science Optimization

Automating Model Risk Compliance: Model Development

DataRobot Blog

MAY 10, 2022

Addressing the Key Mandates of a Modern Model Risk Management Framework (MRM) When Leveraging Machine Learning. No longer is the modeler only limited to using linear models; they may now make use of varied data sources (both structured and unstructured) to build significantly higher performing models to power business processes.

Risk

Risk Modeling Machine Learning Data Quality
