Dashboards, Data Lake and Metadata

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback.

Data Lake

Data Lake Data Processing Metadata Snapshot

Data Lakes on Cloud & it’s Usage in Healthcare

BizAcuity

MARCH 29, 2019

Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. The power of the data lake lies in the fact that it often is a cost-effective way to store data. The power of the data lake lies in the fact that it often is a cost-effective way to store data.

Data Lake

Data Lake Unstructured Data Cost-Benefit Data Quality

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Data Lakes: What Are They and Who Needs Them?

Jet Global

JULY 2, 2019

To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a Data Lake? Data warehouses do a great job of standardizing data from disparate sources for analysis. Taking a Dip.

Data Lake

Data Lake Data Warehouse Big Data Machine Learning

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

Grafana provides powerful customizable dashboards to view pipeline health. QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports. An AWS Glue crawler scans data on the S3 bucket and populates table metadata on the AWS Glue Data Catalog.

Metrics

Metrics Visualization Dashboards Interactive

How HR&A uses Amazon Redshift spatial analytics on Amazon Redshift Serverless to measure digital equity in states across the US

AWS Big Data

DECEMBER 5, 2023

HR&A has used Amazon Redshift Serverless and CARTO to process survey findings more efficiently and create custom interactive dashboards to facilitate understanding of the results. A combination of Amazon Redshift Spectrum and COPY commands are used to ingest the survey data stored as CSV files.

Measurement

Measurement Dashboards Data Warehouse Analytics

Case study: Policy Enforcement Automation With Semantics

Ontotext

MAY 2, 2024

Storage-centric approach In the storage-centric approach, people try to address data silos by throwing everything in a data lake or a data warehouse. But, although, this helps somewhat in terms of architecture, soon these data lakes become unwieldy.

Metadata

Metadata Data Lake Data-driven Enterprise

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users. Data hubs and data lakes can coexist in an organization, complementing each other.

Analytics

Analytics Data Warehouse Data Lake Metadata

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

OCTOBER 16, 2023

It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. How to scale AL and ML with built-in governance A fit-for-purpose data store built on an open lakehouse architecture allows you to scale AI and ML while providing built-in governance tools.

Risk

Risk Modeling Management Metadata

Addressing Data Mesh Technical Challenges with DataOps

DataKitchen

AUGUST 9, 2021

In essence, a domain is an integrated data set and a set of views, reports, dashboards, and artifacts created from the data. The domain also includes code that acts upon the data, including tools, pipelines, and other artifacts that drive analytics execution.

Testing

Testing Data Lake Metadata Publishing

The Madness of Data (and analytics) Governance

Andrew White

DECEMBER 9, 2019

The client had recently engaged with a well-known consulting company that had recommended a large data catalog effort to collect all enterprise metadata to help identify all data and business issues. Modern data (and analytics) governance does not necessarily need: Wall-to-wall discovery of your data and metadata.

Analytics

Analytics Data Lake Data Governance Metadata

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

AWS Big Data

MARCH 27, 2023

Developers, data scientists, and analysts can work across databases, data warehouses, and data lakes to build reporting and dashboarding applications, perform real-time analytics, share and collaborate on data, and even build and train machine learning (ML) models with Redshift Serverless.

Data Warehouse

Data Warehouse Management Snapshot Data Lake

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance. It uses metadata and data management tools to organize all data assets within your organization. Technical metadata to describe schemas, indexes and other database objects.

Metadata

Metadata Data Quality Data-driven Data Governance

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

Streaming jobs constantly ingest new data to synchronize across systems and can perform enrichment, transformations, joins, and aggregations across windows of time more efficiently. OpenSearch Service offers visualization capabilities powered by OpenSearch Dashboards and Kibana (1.5

Data Lake

Data Lake Unstructured Data Management Modeling

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

In this post, we will review the common architectural patterns of two use cases: Time Series Data Analysis and Event Driven Microservices. All these architecture patterns are integrated with Amazon Kinesis Data Streams. The raw data can be streamed to Amazon S3 for archiving.

Analytics

Analytics IoT Data-driven Snapshot

Convergent Evolution

Peter James Thomas

AUGUST 18, 2018

That was the Science, here comes the Technology… A Brief Hydrology of Data Lakes. Even back then, these were used for activities such as Analytics , Dashboards , Statistical Modelling , Data Mining and Advanced Visualisation. This required additional investments in metadata.

Data Lake

Data Lake Data Warehouse Data mining Statistics

How SumUp made digital analytics more accessible using AWS Glue

AWS Big Data

JUNE 6, 2023

Unless, of course, the rest of their data also resides in the Google Cloud. In this post we showcase how we used AWS Glue to move siloed digital analytics data, with inconsistent arrival times, to AWS S3 (our Data Lake) and our central data warehouse (DWH), Snowflake. It consists of full-day and intraday tables.

Analytics

Analytics Data Lake Testing Optimization

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

It also makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization to discover, use, and collaborate to derive data-driven insights. Note that a managed data asset is an asset for which Amazon DataZone can manage permissions.

Metadata

Metadata Data Lake Data Processing Data-driven

What Is Alation Connected Sheets? Q&A with the Creators

Alation

NOVEMBER 28, 2022

But refreshing this analysis with the latest data was impossible… unless you were proficient in SQL or Python. We wanted to make it easy for anyone to pull data and self service without the technical know-how of the underlying database or data lake. They can understand the context of data.

Metadata

Metadata Enterprise Cost-Benefit Finance

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

AWS Big Data

FEBRUARY 22, 2023

In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue , Apache Hudi , and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.

Data Lake

Data Lake Dashboards Cost-Benefit Metadata

Unlock data across organizational boundaries using Amazon DataZone – now generally available

AWS Big Data

OCTOBER 4, 2023

An Amazon DataZone domain contains an associated business data catalog for search and discovery, a set of metadata definitions to decorate the data assets that are used for discovery purposes, and data projects with integrated analytics and ML tools for users and groups to consume and publish data assets.

Metadata

Metadata Data Lake Publishing Data Governance

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

Zero-ETL integration also enables you to load and analyze data from multiple operational database clusters in a new or existing Amazon Redshift instance to derive holistic insights across many applications. Use one click to access your data lake tables using auto-mounted AWS Glue data catalogs on Amazon Redshift for a simplified experience.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

Federate Amazon QuickSight access with open-source identity provider Keycloak

AWS Big Data

JUNE 13, 2023

Sign in to your Keycloak admin dashboard. For the Keycloak admin dashboard, use [link]. Download the SAML metadata file. In the navigation pane under Clients , import the SAML metadata file. Download the Keycloak IdP SAML metadata file from that URL location. Vamsi Bhadriraju is a Data Architect at AWS.

Metadata

Metadata Dashboards Business Intelligence Management

A Day in the Life of a DataOps Engineer

DataKitchen

OCTOBER 11, 2021

She applies some calculations and forwards the file to a data engineer who loads the data into a database and runs a Talend job that performs ETL to dimensionalize the data and produce a Data Mart. The data engineer then emails the BI Team, who refreshes a Tableau dashboard. Monitoring Job Metadata.

Testing

Testing Metadata Dashboards Statistics

6 BI challenges IT teams must address

CIO Business Intelligence

DECEMBER 21, 2022

“The number-one issue for our BI team is convincing people that business intelligence will help to make true data-driven decisions,” says Diana Stout, senior business analyst at Schellman, a global cybersecurity assessor based in Tampa, Fl. But what they really need to do is fundamentally rethink how data is managed and accessed,” he says.

IT

IT Business Intelligence Sales Key Performance Indicator

Generate security insights from Amazon Security Lake data using Amazon OpenSearch Ingestion

AWS Big Data

AUGUST 28, 2023

OpenSearch Ingestion reads Parquet formatted security data from the Security Lake managed Amazon S3 bucket and transforms the security logs into JSON documents. OpenSearch Ingestion ingests this OCSF compliant data into OpenSearch Service. Optionally, specify the Amazon S3 storage class for the data in Amazon Security Lake.

Dashboards

Dashboards Visualization Metadata Management

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

With CDW, as an integrated service of CDP, your line of business gets immediate resources needed for faster application launches and expedited data access, all while protecting the company’s multi-year investment in centralized data management, security, and governance. Proprietary file formats mean no one else is invited in!

Data Lake

Data Lake Data Warehouse IT Analytics

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

Profile aggregation – When you’ve uniquely identified a customer, you can build applications in Managed Service for Apache Flink to consolidate all their metadata, from name to interaction history. Then, you transform this data into a concise format. The following diagram shows a sample C360 dashboard built on Amazon QuickSight.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Unlocking the value of data as your differentiator

AWS Big Data

NOVEMBER 29, 2023

You also need services to store data for analysis and machine learning (ML) like Amazon Simple Storage Service (Amazon S3). Customers have created hundreds of thousands of data lakes on Amazon S3. Amazon DataZone uses ML to automatically add metadata to your data catalog, making all of your data more discoverable.

Data Warehouse

Data Warehouse Data Lake Data Integration Dashboards

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

With Cloudera’s vision of hybrid data , enterprises adopting an open data lakehouse can easily get application interoperability and portability to and from on premises environments and any public cloud without worrying about data scaling. Why integrate Apache Iceberg with Cloudera Data Platform?

Data Lake

Data Lake Data Architecture Metadata Data Warehouse

How Morningstar used tag-based access controls in AWS Lake Formation to manage permissions for an Amazon Redshift data warehouse

AWS Big Data

APRIL 6, 2023

In this post, Morningstar’s Data Lake Team Leads discuss how they utilized tag-based access control in their data lake with AWS Lake Formation and enabled similar controls in Amazon Redshift. This way, our existing data lake consumers could easily transition to Amazon Redshift.

Data Warehouse

Data Warehouse Data Lake Management Data-driven

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Building data lakes from continuously changing transactional data of databases and keeping data lakes up to date is a complex task and can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes.

Data Lake

Data Lake Dashboards Metrics Metadata

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Data Lake Optimization

Ingest, transform, and deliver events published by Amazon Security Lake to Amazon OpenSearch Service

AWS Big Data

JUNE 19, 2023

Security Lake automatically centralizes security data from cloud, on-premises, and custom sources into a purpose-built data lake stored in your account. With Security Lake, you can get a more complete understanding of your security data across your entire organization. Choose Import.

Publishing

Publishing Dashboards Visualization Management

A hybrid approach in healthcare data warehousing with Amazon Redshift

AWS Big Data

FEBRUARY 21, 2023

At the heart of all data warehousing is integration, and this layer contains integrated data from multiple sources built around the enterprise-wide business keys. Although data lakes resemble data vaults, a data vault provides more features of a data warehouse.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Modeling

Introducing watsonx: The future of AI for business

IBM Big Data Hub

MAY 9, 2023

Optimized for all data, analytics and AI workloads, watsonx.data combines the flexibility of a data lake with the performance of a data warehouse, helping businesses to scale data analytics and AI anywhere their data resides. Put AI to work in your business with IBM today IBM is infusing watsonx.ai

Data Warehouse

Data Warehouse Cost-Benefit Machine Learning Modeling

The Audience for Data Catalogs and Data Intelligence

Alation

JUNE 21, 2022

Analysts didn’t just want to catalog data sources, they wanted to include dashboards, reports, and visualizations. Why start with a data source and build a visualization, if you can just find a visualization that already exists, complete with metadata about it? Platform means capable of supporting multiple audiences (e.g.,

Metadata

Metadata Data Quality Visualization Data Lake

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

AWS Big Data

FEBRUARY 2, 2023

Since its launch in 2006, Amazon Simple Storage Service (Amazon S3) has experienced major growth, supporting multiple use cases such as hosting websites, creating data lakes, serving as object storage for consumer applications, storing logs, and archiving data. This could be your data lake or application S3 bucket.

Reporting

Reporting Data Lake Management Optimization

A Guide to Data Analytics in the Travel Industry

Alation

MARCH 21, 2023

When it embarked on a digital transformation and modernization initiative in 2018, the company migrated all its data to AWS S3 Data Lake and Snowflake Data Cloud to provide accessibility to data to all users. Using Alation, ARC automated the data curation and cataloging process. “So

Data Analytics

Data Analytics Analytics Data-driven Big Data

What Do You Actually Need from a Data Catalog Tool?

Alation

SEPTEMBER 23, 2021

What Is a Data Catalog? A data catalog is a centralized storage bank of metadata on information sources from across the enterprise, such as: Datasets. The data catalog also stores metadata (data about data, like a conversation), which gives users context on how to use each asset. Visualizations.

Enterprise

Enterprise Metadata Data Governance Visualization

AWS re:Invent 2023 Amazon Redshift Sessions Recap

AWS Big Data

DECEMBER 18, 2023

Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads.

Data Warehouse

Data Warehouse Machine Learning Data-driven Data Lake

Turning Streams Into Data Products

Cloudera

JUNE 16, 2022

CSP was recently recognized as a leader in the 2022 GigaOm Radar for Streaming Data Platforms report. The DevOps/app dev team wants to know how data flows between such entities and understand the key performance metrics (KPMs) of these entities. Without context, streaming data is useless.”

Data Lake

Data Lake Manufacturing Metadata Dashboards

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Use Apache Iceberg in a data lake to support incremental data processing

Webinars

Trending Sources

Data Lakes on Cloud & it’s Usage in Healthcare

Webinars

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Data Lakes: What Are They and Who Needs Them?

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

How HR&A uses Amazon Redshift spatial analytics on Amazon Redshift Serverless to measure digital equity in states across the US

Case study: Policy Enforcement Automation With Semantics

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

How to use foundation models and trusted governance to manage AI workflow risk

Addressing Data Mesh Technical Challenges with DataOps

The Madness of Data (and analytics) Governance

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

Five benefits of a data catalog

Exploring real-time streaming for generative AI Applications

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

Convergent Evolution

How SumUp made digital analytics more accessible using AWS Glue

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

Governing data in relational databases using Amazon DataZone

What Is Alation Connected Sheets? Q&A with the Creators

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

Unlock data across organizational boundaries using Amazon DataZone – now generally available

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

Federate Amazon QuickSight access with open-source identity provider Keycloak

A Day in the Life of a DataOps Engineer

6 BI challenges IT teams must address

Generate security insights from Amazon Security Lake data using Amazon OpenSearch Ingestion

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Create an end-to-end data strategy for Customer 360 on AWS

Unlocking the value of data as your differentiator

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

How Morningstar used tag-based access controls in AWS Lake Formation to manage permissions for an Amazon Redshift data warehouse

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

Ingest, transform, and deliver events published by Amazon Security Lake to Amazon OpenSearch Service

A hybrid approach in healthcare data warehousing with Amazon Redshift

Introducing watsonx: The future of AI for business

The Audience for Data Catalogs and Data Intelligence

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

A Guide to Data Analytics in the Travel Industry

What Do You Actually Need from a Data Catalog Tool?

AWS re:Invent 2023 Amazon Redshift Sessions Recap

Turning Streams Into Data Products

Stay Connected