Data Lake, Data Processing, Enterprise and Metadata

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Data Lake, Metadata, Snapshot, Recreation/Entertainment

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake, Sales, Data Warehouse, Snapshot

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. BMS’s EDLS platform hosts over 5,000 jobs and is growing at 15% YoY (year over year). It retrieves the specified files and available metadata to show on the UI.

Metadata, Data Lake, Visualization, Data Transformation

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise, Data Warehouse, Snapshot, Cost-Benefit

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake, Analytics, Dashboards, Metrics

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

Data Firehose uses an AWS Lambda function to transform data and ingest the transformed records into an Amazon Simple Storage Service (Amazon S3) bucket. An AWS Glue crawler scans data on the S3 bucket and populates table metadata on the AWS Glue Data Catalog.

Metrics, Visualization, Dashboards, Interactive

How Data Governance Protects Sensitive Data

erwin

APRIL 2, 2021

With more companies increasingly migrating their data to the cloud to ensure availability and scalability, the risks associated with data management and protection also are growing. Data Security Starts with Data Governance. Do You Know Where Your Sensitive Data Is?

Data Governance, Cost-Benefit, Risk, Metadata

Announcing the 2021 Data Impact Awards

Cloudera

MAY 12, 2021

2020 saw us hosting our first ever fully digital Data Impact Awards ceremony, and it certainly was one of the highlights of our year. We saw a record number of entries and incredible examples of how customers were using Cloudera’s platform and services to unlock the power of data. DATA FOR ENTERPRISE AI.

Digital Transformation, Machine Learning, Optimization, Data Lake

What Is Alation Connected Sheets? Q&A with the Creators

Alation

NOVEMBER 28, 2022

Spreadsheet users can now pull high-quality data, with a view into its context and history, directly from Alation into Google Sheets. Talo: They say spreadsheets are “the dark matter” of the enterprise. Krishna: Spreadsheets are truly the dark matter of the data universe. What problems do spreadsheets create? Krishna: Great!

Metadata, Enterprise, Cost-Benefit, Finance

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

It includes perspectives about current issues, themes, vendors, and products for data governance. My interest in data governance (DG) began with the recent industry surveys by O’Reilly Media about enterprise adoption of “ABC” (AI, Big Data, Cloud). for DG adoption in the enterprise. A brief pictorial history.

Data Governance, Machine Learning, Metadata, Big Data

Extreme data center pressure? Burst to the cloud with CDP!

Cloudera

NOVEMBER 12, 2020

Typically, when we talk about data warehousing at an enterprise level on the cloud, one of the biggest concerns is that moving workloads from on-premises to the cloud is not seamless and opens up new risks for data safety and security. How Burst to Cloud can solve your data center pressure. More than likely it is.

Data Warehouse, Reporting, Risk, Cost-Benefit

Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center

AWS Big Data

APRIL 26, 2024

In this post, we show how to bring your workforce identity to EMR Studio for analytics use cases, directly manage fine-grained permissions for the corporate users and groups using Lake Formation, and audit their data access. Both enterprise users from Okta are provisioned in IAM Identity Center. Choose Create. Choose Grant.

Analytics, Data Lake, Management, Enterprise

Build efficient ETL pipelines with AWS Step Functions distributed map and redrive feature

AWS Big Data

DECEMBER 18, 2023

Solution overview: One of the common functionalities involved in data pipelines is extracting data from multiple data sources and exporting it to a data lake or synchronizing the data to another database. There are multiple tables related to customers and order data in the RDS database.

Metadata, Visualization, Data Lake, Data-driven

Federate Amazon QuickSight access with open-source identity provider Keycloak

AWS Big Data

JUNE 13, 2023

Amazon QuickSight is a scalable, serverless, embeddable, machine learning (ML) powered business intelligence (BI) service built for the cloud that supports identity federation in both Standard and Enterprise editions. Download the SAML metadata file. In the navigation pane under Clients, import the SAML metadata file.

Metadata, Dashboards, Business Intelligence, Management

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

With CDW, as an integrated service of CDP, your line of business gets immediate resources needed for faster application launches and expedited data access, all while protecting the company’s multi-year investment in centralized data management, security, and governance. Proprietary file formats mean no one else is invited in!

Data Lake, Data Warehouse, IT, Analytics

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

Profile aggregation – When you’ve uniquely identified a customer, you can build applications in Managed Service for Apache Flink to consolidate all their metadata, from name to interaction history. Then, you transform this data into a concise format. Let’s find out what role each of these components play in the context of C360.

Data Strategy, Strategy, Data Warehouse, Prescriptive Analytics

CIOs rise to the ESG reporting challenge

CIO Business Intelligence

JANUARY 30, 2024

“Always the gatekeepers of much of the data necessary for ESG reporting, CIOs are finding that companies are even more dependent on them,” says Nancy Mentesana, ESG executive director at Labrador US, a global communications firm focused on corporate disclosure documents. The complexity is at a much higher level.”

Reporting, Data Quality, Strategy, Data-driven

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. With watsonx.ai, businesses can effectively train, validate, tune and deploy AI models with confidence and at scale across their enterprise. IBM watsonx.ai

Machine Learning, Data Warehouse, Modeling, Cost-Benefit

Improving Multi-tenancy with Virtual Private Clusters

Cloudera

JUNE 6, 2019

The typical Cloudera Enterprise Data Hub Cluster starts with a few dozen nodes in the customer’s datacenter hosting a variety of distributed services. Over time, workloads start processing more data, tenants start onboarding more workloads, and administrators (admins) start onboarding more tenants. With CM 6.2,

Metadata, Data Lake, Optimization, Strategy

Generate security insights from Amazon Security Lake data using Amazon OpenSearch Ingestion

AWS Big Data

AUGUST 28, 2023

By converting logs and events using Open Cybersecurity Schema Framework, an open standard for storing security events in a common and shareable format, Security Lake optimizes and normalizes your security data for analysis using your preferred analytics tool. For more information, refer to Lifecycle management in Security Lake.

Dashboards, Visualization, Metadata, Management

Dancing with Elephants in 5 Easy Steps

Cloudera

AUGUST 21, 2020

There are now tens of thousands of instances of these Big Data platforms running in production around the world today, and the number is increasing every year. Many of them are increasingly deployed outside of traditional data centers in hosted, “cloud” environments. Big Data is an ecosystem as well as a philosophy.

Cost-Benefit, Big Data, ROI, Risk

How Novo Nordisk built distributed data governance and control at scale

AWS Big Data

APRIL 28, 2023

This is the second post of a three-part series detailing how Novo Nordisk, a large pharmaceutical enterprise, partnered with AWS Professional Services to build a scalable and secure data and analytics platform. The third post will show how end-users can consume data from their tool of choice, without compromising data governance.

Data Governance, Management, Data-driven, Data Lake

Ingest, transform, and deliver events published by Amazon Security Lake to Amazon OpenSearch Service

AWS Big Data

JUNE 19, 2023

Security Lake automatically centralizes security data from cloud, on-premises, and custom sources into a purpose-built data lake stored in your account. With Security Lake, you can get a more complete understanding of your security data across your entire organization.

Publishing, Dashboards, Visualization, Management

Data Management Requirements for the Enterprise Data Lake

In(tegrate) the Clouds

MAY 1, 2016

SnapLogic published Eight Data Management Requirements for the Enterprise Data Lake. They are: Storage and Data Formats. Metadata and Governance. The company also recently hosted a webinar on Democratizing the Data Lake with Constellation Research and published 2 whitepapers from Mark Madsen.

Data Lake, Enterprise, Management, Metadata

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

To build a data-driven business, it is important to democratize enterprise data assets in a data catalog. With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. For metadata read/write, Flink has the catalog interface.

Data Lake, Metadata, Business Analysis, Data-driven

PODCAST: Making AI Real – Episode 4: Unlocking the Value of Enterprise AI with Data Engineering Capabilities

bridgei2i

MARCH 3, 2021

Tune in to the podcast to learn more about the evolving industry and how new technologies are transforming the enterprise AI landscape.

Enterprise, Digital Transformation, Data-driven, Interactive

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific, “data ponds”.

Metadata, Cost-Benefit, Enterprise, Interactive

Databricks’ Data+AI Summit 2022: A Show of Partner “Unity”

Alation

JULY 18, 2022

To ensure you can deliver on this world-changing vision of data, Alation helps you maximize the value of your data lake with integrations to the Unity catalog. A Giant Partnership and a Giants Game.

ROI, Metadata, Data Lake, Digital Transformation

Data Governance for Dummies: Your Questions, Answered

Alation

FEBRUARY 17, 2023

This past week, I had the pleasure of hosting Data Governance for Dummies author Jonathan Reichental for a fireside chat, along with Denise Swanson, Data Governance lead at Alation. They may leverage a business-unit glossary and an additional “Enterprise” glossary to drive very tangible benefits. Where do you govern?

Data Governance, Data Quality, Metadata, Cost-Benefit

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

In legacy analytical systems such as enterprise data warehouses, the scalability challenges of a system were primarily associated with computational scalability, i.e., the ability of a data platform to handle larger volumes of data in an agile and cost-efficient way. Introduction. CRM platforms).

Data Processing, Data Warehouse, Enterprise, Visualization

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

Andrew White

JANUARY 11, 2021

On January 4th I had the pleasure of hosting a webinar. It was titled, The Gartner 2021 Leadership Vision for Data & Analytics Leaders. This was for the Chief Data Officer, or head of data and analytics. What is your vision for D&A for small and medium enterprises? They have a different sweet spot.

Data Analytics, Analytics, Data-driven, Finance

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

An on-premise solution provides a high level of control and customization as it is hosted and managed within the organization’s physical infrastructure, but it can be expensive to set up and maintain. Source-to-target mapping integration tasks vary in complexity, depending on data hierarchy and structure.

Data Warehouse, Reporting, Data Transformation, Sales

Data Leaders Brief
