Data Lake, Data Processing, Metadata and Reporting

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Tags: Data Lake, Sales, Data Warehouse, Snapshot

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Tags: Data Lake, Analytics, Snapshot, Optimization

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. BMS’s EDLS platform hosts over 5,000 jobs and is growing at 15% YoY (year over year). It retrieves the specified files and available metadata to show on the UI.

Tags: Metadata, Data Lake, Visualization, Data Transformation

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

AWS Big Data

FEBRUARY 2, 2023

Since its launch in 2006, Amazon Simple Storage Service (Amazon S3) has experienced major growth, supporting multiple use cases such as hosting websites, creating data lakes, serving as object storage for consumer applications, storing logs, and archiving data. Enable the Cost and Usage Reports.

Tags: Reporting, Data Lake, Management, Optimization

Habib Bank manages data at scale with Cloudera Data Platform

Cloudera

NOVEMBER 17, 2022

The Solution: CDP Private Cloud brings a next-generation hybrid architecture with cloud-native benefits to HBL’s data platform. HBL started its data journey in 2019, when a data lake initiative was launched to consolidate complex data sources and give the bank a single version of truth for decision making.

Tags: Management, Data Lake, Consulting, Unstructured Data

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

The data products used inside the company include insights from user journeys, operational reports, and marketing campaign results, among others. The data platform serves on average 60 thousand queries per day. The data volume is in double-digit TBs with steady growth as business and data sources evolve.

Tags: Data Lake, Data Warehouse, Data-driven, B2B

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports. You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. QuickSight lets you perform aggregate calculations on metrics for deeper analysis.

Tags: Metrics, Visualization, Dashboards, Interactive

Build efficient ETL pipelines with AWS Step Functions distributed map and redrive feature

AWS Big Data

DECEMBER 18, 2023

Solution overview: One common task in data pipelines is extracting data from multiple data sources and exporting it to a data lake or synchronizing it to another database. There are multiple tables related to customer and order data in the RDS database.

Tags: Metadata, Visualization, Data Lake, Data-driven

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

Without C360, businesses face missed opportunities, inaccurate reports, and disjointed customer experiences, leading to customer churn. Profile aggregation – When you’ve uniquely identified a customer, you can build applications in Managed Service for Apache Flink to consolidate all their metadata, from name to interaction history.

Tags: Data Strategy, Strategy, Data Warehouse, Prescriptive Analytics

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

AWS Big Data

JULY 14, 2023

The FinAuto team built AWS Cloud Development Kit (AWS CDK), AWS CloudFormation, and API tools to maintain a metadata store that ingests from domain owner catalogs into the global catalog. This global catalog captures new or updated partitions from the data producer AWS Glue Data Catalogs.

Tags: Finance, Metadata, Big Data, Recreation/Entertainment

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

Does your organization’s success depend on immediate delivery of new reports, applications, or projects? Cloudera’s Data Warehouse service allows raw data to be stored in the cloud storage of your choice (S3, ADLSg2). When your IT admin registers an environment in CDP, a Data Lake is automatically deployed.

Tags: Data Lake, Data Warehouse, IT, Analytics

Federate Amazon QuickSight access with open-source identity provider Keycloak

AWS Big Data

JUNE 13, 2023

Download the SAML metadata file. In the navigation pane under Clients, import the SAML metadata file. Insert your specific host domain name where the Keycloak application resides in the following URL: [link] /realms/aws-realm/protocol/saml/descriptor. Download the Keycloak IdP SAML metadata file from that URL location.

Tags: Metadata, Dashboards, Business Intelligence, Management

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts have appeared under different monikers over time: data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), and later enterprise-wide data lakes versus smaller, typically BU-specific “data ponds”.

Tags: Metadata, Cost-Benefit, Enterprise, Interactive

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

Data sources are crucial for reporting, analyzing, and acting on transactional and corporate data. Connecting these sources in real time is essential, using tools such as connectors, ETL tools, mashups, web services, and data-source-neutral BI solutions.

Tags: Data Warehouse, Reporting, Data Transformation, Sales

Announcing the 2021 Data Impact Awards

Cloudera

MAY 12, 2021

2020 saw us hosting our first ever fully digital Data Impact Awards ceremony, and it certainly was one of the highlights of our year. We saw a record number of entries and incredible examples of how customers were using Cloudera’s platform and services to unlock the power of data. DATA FOR GOOD.

Tags: Digital Transformation, Machine Learning, Optimization, Data Lake

Extreme data center pressure? Burst to the cloud with CDP!

Cloudera

NOVEMBER 12, 2020

At these times, they run business growth reports, shareholder reports, and financial reports for their earnings calls, to name a few examples. For instance, the bank in our example might have separate destination data lakes for its perpetual and periodic workloads, so that these VIP workloads can be addressed separately.

Tags: Data Warehouse, Reporting, Risk, Cost-Benefit

CIOs rise to the ESG reporting challenge

CIO Business Intelligence

JANUARY 30, 2024

Corporate ESG reporting is getting real for companies around the globe. Enacted and proposed regulations in the EU, US, and beyond are deepening reporting requirements in an effort to change business behavior. The foundation for ESG reporting, of course, is data.

Tags: Reporting, Data Quality, Strategy, Data-driven

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

Data governance shows up as the fourth-most-popular kind of solution that enterprise teams were adopting or evaluating during 2019. That’s a lot of priorities, especially when you group together closely related items such as data lineage and metadata management, which rank nearby. … in lieu of simply landing in a data lake.

Tags: Data Governance, Machine Learning, Metadata, Big Data

Dancing with Elephants in 5 Easy Steps

Cloudera

AUGUST 21, 2020

The Corner Office is pressing their direct reports across the company to “Move To The Cloud” to increase agility and reduce costs. There are now tens of thousands of instances of these Big Data platforms running in production around the world today, and the number is increasing every year. OpEx savings and probable ROI once migrated.

Tags: Cost-Benefit, Big Data, ROI, Risk

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

Andrew White

JANUARY 11, 2021

On January 4th I had the pleasure of hosting a webinar titled The Gartner 2021 Leadership Vision for Data & Analytics Leaders. It was aimed at the Chief Data Officer, or head of data and analytics. The full report is here: Leadership Vision for 2021: Data and Analytics. That’s the idea.

Tags: Data Analytics, Analytics, Data-driven, Finance

Top 15 data management platforms available today

CIO Business Intelligence

SEPTEMBER 22, 2023

Data management platform definition: A data management platform (DMP) is a suite of tools that helps organizations to collect and manage data from a wide array of first-, second-, and third-party sources and to create reports and build customer profiles as part of targeted personalization campaigns.

Tags: Management, Advertising, Data Lake, Sales

Top 15 data management platforms

CIO Business Intelligence

JUNE 9, 2022

A data management platform (DMP) is a group of tools designed to help organizations collect and manage data from a wide array of sources and to create reports that help explain what is happening in those data streams. Deploying a DMP can be a great way for companies to navigate a business world dominated by data.

Tags: Management, Advertising, Data Lake, Sales

Data Governance for Dummies: Your Questions, Answered

Alation

FEBRUARY 17, 2023

This past week, I had the pleasure of hosting Data Governance for Dummies author Jonathan Reichental for a fireside chat, along with Denise Swanson, Data Governance lead at Alation. This structure sorts data by category, defining differences, allowing for overlaps, and providing flexibility in discovery.

Tags: Data Governance, Data Quality, Metadata, Cost-Benefit