Data Lake, Interactive, Metadata and Publishing

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports. Data Firehose uses an AWS Lambda function to transform data and ingest the transformed records into an Amazon Simple Storage Service (Amazon S3) bucket. Choose Publish dashboard.

Metrics

Metrics Visualization Dashboards Interactive

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users. Data hubs and data lakes can coexist in an organization, complementing each other.

Analytics

Analytics Data Warehouse Data Lake Metadata

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Starting with Amazon EMR release 6.7.0,

Data Lake

Data Lake Data Governance Data Architecture Data Warehouse

What is Data Mesh?

Ontotext

NOVEMBER 16, 2023

In a data mesh, domains are represented by a node, which can be an operational data store (ODS), a data warehouse, or a data lake tailored to the domain’s requirements. Mesh emerges when teams use other domains’ data products and the domains communicate with others in a governed manner.

Metadata

Metadata Data-driven Data Quality Data Architecture

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.

Data Lake

Data Lake Data Processing Metadata Snapshot

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Cloudera

JANUARY 15, 2021

Cloudera Data Warehouse is a highly scalable service that marries the SQL engine technologies of Apache Impala and Apache Hive with cloud-native features to deliver best-in-class price-performance for users running data warehousing workloads in the cloud. But don’t just take our word for it. Benchmark Description.

Data Warehouse

Data Warehouse Cost-Benefit Consulting Interactive

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

Data governance shows up as the fourth-most-popular kind of solution that enterprise teams were adopting or evaluating during 2019. That’s a lot of priorities – especially when you group together closely related items such as data lineage and metadata management which rank nearby. in lieu of simply landing in a data lake.

Data Governance

Data Governance Machine Learning Metadata Big Data

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

In another decade, the internet and mobile started to generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence, Data Lake emerged, which handles unstructured and structured data with huge volume. A data fabric is comprised of a network of data nodes (e.g.,

Data Lake

Data Lake Data Warehouse Data-driven Metadata

Data Mesh 101: What it is and Why You Should Care

Ontotext

FEBRUARY 12, 2024

In a centralized architecture, data is copied from source systems into a data lake or data warehouse to create a single source of truth serving analytics use cases. This quickly becomes difficult to scale with data discovery and data version issues, schema evolution, tight coupling, and a lack of semantic metadata.

IT Metadata Data Quality Data Lake

Federate Amazon QuickSight access with open-source identity provider Keycloak

AWS Big Data

JUNE 13, 2023

Administrators can publish QuickSight applications on the Keycloak Admin console. Download the SAML metadata file. In the navigation pane under Clients, import the SAML metadata file. Download the Keycloak IdP SAML metadata file from that URL location. Vamsi Bhadriraju is a Data Architect at AWS. Choose Browse.

Metadata

Metadata Dashboards Business Intelligence Management

Introducing watsonx: The future of AI for business

IBM Big Data Hub

MAY 9, 2023

As AI becomes more pervasive, businesses need to feel confident that their models can be relied upon not to “hallucinate” facts or use inappropriate language when interacting with customers. Through workload optimization an organization can reduce data warehouse costs by up to 50 percent by augmenting with this solution. [1]

Data Warehouse

Data Warehouse Cost-Benefit Machine Learning Modeling

Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs

Cloudera

APRIL 3, 2023

In this blog, we will share with you in detail how Cloudera integrates core compute engines including Apache Hive and Apache Impala in Cloudera Data Warehouse with Iceberg. We will publish follow up blogs for other data services. Iceberg basics Iceberg is an open table format designed for large analytic workloads.

Data Warehouse

Data Warehouse Snapshot Metadata Cost-Benefit

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

AWS Big Data

JUNE 12, 2023

After the processed data is stored in Amazon S3, we create an AWS Glue crawler to create a Data Catalog table that acts as a metadata layer for the data. The table can be queried using Amazon Athena, a serverless, interactive query service that enables running SQL-like queries on data stored in Amazon S3.

Management

Management Metadata Testing Internet of Things

Generate security insights from Amazon Security Lake data using Amazon OpenSearch Ingestion

AWS Big Data

AUGUST 28, 2023

Optionally, specify the Amazon S3 storage class for the data in Amazon Security Lake. For more information, refer to Lifecycle management in Security Lake. Review the details and create the data lake. For example, choosing DNS Activity will give you dashboards of all DNS activity published in Amazon Security Lake.

Dashboards

Dashboards Visualization Metadata Management

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

While they require task-specific labeled data for fine tuning, they also offer clients the best cost performance trade-off for non-generative use cases. offers a Prompt Lab, where users can interact with different prompts using prompt engineering on generative AI models for both zero-shot prompting and few-shot prompting.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

AWS Big Data

FEBRUARY 2, 2023

Since its launch in 2006, Amazon Simple Storage Service (Amazon S3) has experienced major growth, supporting multiple use cases such as hosting websites, creating data lakes, serving as object storage for consumer applications, storing logs, and archiving data. This could be your data lake or application S3 bucket.

Reporting

Reporting Data Lake Management Optimization

Themes and Conferences per Pacoid, Episode 12

Domino Data Lab

AUGUST 8, 2019

Rather than publish an agenda as most technology conferences do, why not let people mingle, discuss, and propose topics and possible sessions? I mention this here because there was a lot of overlap between current industry data governance needs and what the scientific community is working toward for scholarly infrastructure.

Data Science

Data Science Machine Learning Data Governance Statistics

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

We have seen a strong customer demand to expand its scope to cloud-based data lakes because data lakes are increasingly the enterprise solution for large-scale data initiatives due to their power and capabilities. Let’s say that this company is located in Europe and the data product must comply with the GDPR.

Data Lake

Data Lake Management Metrics Data Warehouse

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data) then enterprise-wide data lakes versus smaller, typically BU-Specific, “data ponds”.

Metadata

Metadata Cost-Benefit Enterprise Interactive

Use the Amazon Redshift Data API to interact with Amazon Redshift Serverless

AWS Big Data

APRIL 28, 2023

As a data engineer or application developer, for some use cases, you want to interact with the Redshift Serverless data warehouse to load or query data with a simple API endpoint without having to manage persistent connections. describe-table Describes the detailed information about a table including column metadata.

Interactive

Interactive Metadata Data Warehouse Data-driven

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

It delivers the ability to capture and unify the business and technical perspectives of data assets, enables effective collaboration between a variety of stakeholders, and delivers metadata-driven automation to accelerate the creation and maintenance of data sources on virtually any data management platform.

Data-driven

Data-driven Modeling Enterprise Structured Data

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

AWS Big Data

JULY 14, 2023

This enabled producers to publish data products that were curated and authoritative assets for their domain. For example, the AR team created and governed their cash application dataset in their AWS account AWS Glue Data Catalog. Data source locations are registered with Lake Formation.

Finance

Finance Metadata Big Data Recreation/Entertainment

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

Andrew White

JANUARY 11, 2021

This was for the Chief Data Officer, or head of data and analytics. Gartner also published the same piece of research for other roles, such as Application and Software Engineering. Will the data warehouse as a software tool play a role in the future of Data & Analytics strategy? We have published some case studies.

Data Analytics

Data Analytics Analytics Data-driven Finance

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

Source-to-target mapping integration tasks vary in complexity, depending on data hierarchy and structure. Business applications use metadata and semantic rules to ensure seamless data transfer without loss. The company employs data mapping to align customer information from different sources (e.g.,

Data Warehouse

Data Warehouse Reporting Data Transformation Sales
