
Multicloud data lake analytics with Amazon Athena

AWS Big Data

Many organizations operate data lakes spanning multiple cloud data stores. In these cases, an integrated query layer lets you seamlessly run analytical queries across these diverse cloud stores and streamline your data analytics processes.
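A minimal sketch of what such a query layer looks like from the client side, assuming an existing Glue database and results bucket (the database name `multicloud_lake`, the table, and the S3 output location are placeholders):

```python
import time

import boto3  # AWS SDK for Python

athena = boto3.client("athena", region_name="us-east-1")

# Submit a query against a table in the (hypothetical) multicloud_lake database.
# Athena reaches non-S3 stores through federated connectors, but the client call is the same.
submission = athena.start_query_execution(
    QueryString="SELECT order_id, total FROM orders LIMIT 10",
    QueryExecutionContext={"Database": "multicloud_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = submission["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(rows)
```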


Choosing an open table format for your transactional data lake on AWS

AWS Big Data

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.
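To illustrate what an open table format adds on top of that object storage, here is a minimal, hypothetical PySpark sketch that registers an Apache Iceberg table in the AWS Glue Data Catalog on S3 (the catalog name, bucket, and table schema are placeholders, and the Iceberg Spark runtime jar is assumed to be on the classpath):

```python
from pyspark.sql import SparkSession

# Spark session configured with an Iceberg catalog backed by the AWS Glue Data Catalog.
# "glue_catalog" and the warehouse bucket are placeholder names.
spark = (
    SparkSession.builder
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://example-data-lake/warehouse/")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)

# Create a transactional Iceberg table; inserts, updates, and deletes on it are ACID.
spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.sales.orders (
        order_id    bigint,
        customer_id bigint,
        order_ts    timestamp,
        total       decimal(10, 2)
    )
    USING iceberg
    PARTITIONED BY (days(order_ts))
""")
```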


Trending Sources


Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.


Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

Although Jira Cloud provides reporting capability, loading this data into a data lake facilitates enrichment with other business data, as well as support for business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications.
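A minimal sketch of the kind of orchestration this involves, assuming an Amazon AppFlow flow named `jira-to-s3` and an AWS Glue job named `jira-curation-job` have already been configured (both names are hypothetical):

```python
import boto3

appflow = boto3.client("appflow")
glue = boto3.client("glue")

# Trigger an on-demand run of an existing AppFlow flow that pulls Jira Cloud data into S3.
appflow.start_flow(flowName="jira-to-s3")

# In a real pipeline you would wait for the flow to finish (for example by polling
# describe_flow_execution_records or reacting to an EventBridge event) before this step.
run = glue.start_job_run(JobName="jira-curation-job")
print("Glue job run id:", run["JobRunId"])
```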


How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

AWS Big Data

In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue, Apache Hudi, and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.
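A minimal sketch of that hourly upsert pattern inside a Glue (Spark) job, assuming a DataFrame `incremental_df` of changed records and placeholder table, key, and path names:

```python
# Upsert an hourly batch of changed rows into a Hudi table on S3.
hudi_options = {
    "hoodie.table.name": "sales_orders",                       # placeholder table name
    "hoodie.datasource.write.recordkey.field": "order_id",     # primary key column
    "hoodie.datasource.write.precombine.field": "updated_at",  # latest record wins on conflict
    "hoodie.datasource.write.partitionpath.field": "order_date",
    "hoodie.datasource.write.operation": "upsert",
}

(
    incremental_df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")  # "append" mode is used even for upserts with Hudi
    .save("s3://example-data-lake/hudi/sales_orders/")
)
```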


Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. Every EDLS job comprises one or more job steps chained together and run in a predefined order orchestrated by the custom ETL framework.
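EDLS itself is internal to BMS, but the general pattern of chaining metadata-driven job steps can be sketched in a few lines of hypothetical Python:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class JobStep:
    """One step of a hypothetical metadata-driven ETL job."""
    name: str
    params: Dict[str, str]
    run: Callable[[Dict[str, str]], None]

def run_job(steps: List[JobStep]) -> None:
    # Steps run in the predefined order in which they are chained together.
    for step in steps:
        print(f"Running step: {step.name}")
        step.run(step.params)

# Example: an ingest step followed by a transform step, each carrying its own metadata.
job = [
    JobStep("ingest", {"source": "s3://example-raw/"}, lambda p: print("reading", p["source"])),
    JobStep("transform", {"target": "s3://example-curated/"}, lambda p: print("writing", p["target"])),
]
run_job(job)
```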


Achieve your AI goals with an open data lakehouse approach

IBM Big Data Hub

Another IDC study showed that while two-thirds of respondents reported using AI-driven data analytics, most reported that less than half of the data under management is available for this type of analytics.