Analytics, Data Analytics, Data Architecture and Metadata

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

In this blog post, we dive into different data aspects and how Cloudinary addresses the two concerns of vendor lock-in and cost-efficient data analytics by using Apache Iceberg, Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon EMR, and AWS Glue. This concept makes Iceberg extremely versatile.

Data Lake

Data Lake Metadata Snapshot Analytics

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Several factors determine the quality of your enterprise data, such as accuracy, completeness, and consistency, to name a few. But there’s another factor of data quality that doesn’t get the recognition it deserves: your data architecture. How the right data architecture improves data quality.

Data Quality

Data Quality Data Architecture Strategy Data Lake

Insights from Gartner Data & Analytics Summit Orlando 2023

Alation

MARCH 31, 2023

I’m talking about not just Walt Disney World, but also this year’s Gartner Data & Analytics Summit, which took place last month in Orlando at the landmark resort. Alation was proud to have been among the thought leaders at the annual gathering of data experts from around the world. It’s the place where dreams come true.

Data Analytics

Data Analytics Analytics Metadata Data Governance

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization. The target accounts read data from the source account S3 buckets.

Metadata

Metadata Data Lake Machine Learning Big Data

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

This solution only replicates metadata in the Data Catalog, not the actual underlying data. To have a redundant data lake using Lake Formation and AWS Glue in an additional Region, we recommend replicating the Amazon S3-based storage using S3 replication, S3 sync, aws-s3-copy-sync-using-batch, or the S3 Batch replication process.

Data Architecture

Data Architecture Metadata Data Lake Snapshot

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. Third-generation – more or less like the previous generation but with streaming data, cloud, machine learning and other (fill-in-the-blank) fancy tools. See the pattern?

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. To address this challenge, organizations can deploy a data mesh using AWS Lake Formation that connects the multiple EMR clusters. An entity can act both as a producer of data assets and as a consumer of data assets.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

This is the first post to a blog series that offers common architectural patterns in building real-time data streaming infrastructures using Kinesis Data Streams for a wide range of use cases. All these architecture patterns are integrated with Amazon Kinesis Data Streams.

Analytics

Analytics IoT Data-driven Snapshot

The Future Is Hybrid Data, Embrace It

Cloudera

JUNE 7, 2022

Also, Cloudera DataFlow is rated highly in the GigaOm Radar for Streaming Data Platforms. Leading industry analysts rated Cloudera better at analytic and operational data use cases than many well-known cloud vendors. Only Cloudera has the power to span multi-cloud and on-premises with a hybrid data platform.

IT

IT Data Architecture Unstructured Data Big Data

How Swisscom automated Amazon Redshift as part of their One Data Platform solution using AWS CDK – Part 1

AWS Big Data

JUNE 12, 2024

Swisscom’s Data, Analytics, and AI division is building a One Data Platform (ODP) solution that will enable every Swisscom employee, process, and product to benefit from the massive value of Swisscom’s data. The following high-level architecture diagram shows ODP with different layers of the modern data architecture.

Data Architecture

Data Architecture Cost-Benefit Experimentation Management

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

What is data governance? Best practices for managing data assets

CIO Business Intelligence

MARCH 24, 2023

The program must introduce and support standardization of enterprise data. Programs must support proactive and reactive change management activities for reference data values and the structure/use of master data and metadata. One of its key capabilities, TrustCheck, provides real-time “guardrails” to workflows.

Data Governance

Data Governance Management Metadata Data Quality

The Future Is Hybrid Data, Embrace It

CIO Business Intelligence

JUNE 23, 2022

Also, Cloudera DataFlow is rated highly in the GigaOm Radar for Streaming Data Platforms. Leading industry analysts rated Cloudera better at analytic and operational data use cases than many well-known cloud vendors. Only Cloudera has the power to span multi-cloud and on-premises with a hybrid data platform.

IT

IT Data Architecture Unstructured Data Big Data

A Summary Of Gartner’s Recent Innovation Insight Into Data Observability

DataKitchen

AUGUST 8, 2023

In the article, Melody Chien notes that Data Observability is a practice that extends beyond traditional monitoring and detection, providing robust, integrated visibility over data and data landscapes. It alerts data and analytics leaders to issues with their data before they multiply. When did it last run?

Data Quality

Data Quality Testing Snapshot Reporting

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

AWS Big Data

JULY 14, 2023

In this post, we discuss how the Amazon Finance Automation team used AWS Lake Formation and the AWS Glue Data Catalog to build a data mesh architecture that simplified data governance at scale and provided seamless data access for analytics, AI, and machine learning (ML) use cases.

Finance

Finance Metadata Big Data Recreation/Entertainment

Extracting key insights from Amazon S3 access logs with AWS Glue for Ray

AWS Big Data

SEPTEMBER 7, 2023

Architecture overview: The following diagram illustrates the solution architecture. The solution uses AWS Serverless Analytics services such as AWS Glue to optimize data layout by partitioning and formatting the server access logs to be consumed by other services. Big Data Architect. Zach Mitchell is a Sr.

Metadata

Metadata Dashboards Metrics Visualization

How Huron built an Amazon QuickSight Asset Catalogue with AWS CDK Based Deployment Pipeline

AWS Big Data

APRIL 26, 2023

Having an accurate and up-to-date inventory of all technical assets helps an organization keep track of all its resources with metadata such as their assigned owners, last updated date, who uses them, how frequently, and more. This is a guest blog post co-written with Corey Johnson from Huron.

Metadata

Metadata Dashboards Visualization Consulting

A Day in the Life of a DataOps Engineer

DataKitchen

OCTOBER 11, 2021

First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. The biggest challenge is broken data pipelines due to highly manual processes. Figure 1 shows a manually executed data analytics pipeline. Monitoring Job Metadata.

Testing

Testing Metadata Dashboards Statistics

AI at Scale isn’t Magic, it’s Data – Hybrid Data

Cloudera

OCTOBER 11, 2022

AI needs machine learning (ML), ML needs data science. Data science needs analytics. And they all need lots of data. But it isn’t just aggregating data for models. Data needs to be prepared and analyzed. And that data is likely in clouds, in data centers and at the edge.

Snapshot

Snapshot Data Science Digital Transformation Metadata

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Optimization Statistics

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

In this post, we discuss how you can use purpose-built AWS services to create an end-to-end data strategy for C360 to unify and govern customer data that address these challenges. We recommend building your data strategy around five pillars of C360, as shown in the following figure.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Habib Bank manages data at scale with Cloudera Data Platform

Cloudera

NOVEMBER 17, 2022

While Cloudera CDH was already a success story at HBL, in 2022, HBL identified the need to move its customer data centre environment from Cloudera’s CDH to Cloudera Data Platform (CDP) Private Cloud to accommodate growing volumes of data. and primarily served regulatory reporting and internal analytics requirements.

Management

Management Data Lake Consulting Unstructured Data

2023 Predictions: Data Trends That Will Dominate Business Agenda in APAC

Cloudera

JANUARY 5, 2023

With the increase in demand for real-time data processing, streaming, and sharing, which power transformation into data-driven organizations, we anticipate more businesses investing in building adaptive AI systems that can ingest large amounts of data at frequent intervals and adapt to changes and variances quickly.

Cost-Benefit

Cost-Benefit Business Objectives Machine Learning Data Architecture

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To speed up self-service analytics and foster innovation based on data, a solution was needed to allow any team to create data products on their own in a decentralized manner. To create and manage the data products, smava uses Amazon Redshift, a cloud data warehouse.

Data Lake

Data Lake Data Warehouse Data-driven B2B

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

Introduction Why should I read the definitive guide to embedded analytics? But many companies fail to achieve this goal because they struggle to provide the reporting and analytics users have come to expect. The Definitive Guide to Embedded Analytics is designed to answer any and all questions you have about the topic.

Analytics

Analytics Cost-Benefit Visualization Dashboards

Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions

AWS Big Data

APRIL 3, 2024

As data becomes increasingly crucial for driving business decisions, Amazon DataZone users are keenly interested in providing the highest standards of data quality. They recognize the importance of accurate, complete, and timely data in enabling informed decision-making and fostering trust in their analytics and reporting processes.

Data Quality

Data Quality Visualization Metadata Metrics

Surviving Radical Disruption with Data Intelligence

erwin

OCTOBER 16, 2020

And this time sensitivity is a massive issue, as taking a proactive and data-driven approach can literally mean life or death to your business or to your customers. And that’s where data analytics can play a huge role. There’s a common denominator in what they’re all missing, and that is data intelligence.

Internet of Things

Internet of Things Data-driven Uncertainty Data Governance

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes. Iterations of the lakehouse.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

Design a data mesh on AWS that reflects the envisioned organization

AWS Big Data

JANUARY 22, 2024

Cost and resource efficiency – This is an area where Acast observed a reduction in data duplication, and therefore cost reduction (in some accounts, removing the copy of data 100%), by reading data across accounts while enabling scaling. In this approach, teams responsible for generating data are referred to as producers.

Data-driven

Data-driven Advertising Metadata Data Architecture

The Future of the Data Lakehouse – Open

CIO Business Intelligence

JUNE 23, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes. Iterations of the lakehouse.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

The Gartner 2022 Leadership Vision for Data and Analytics Leaders Questions and Answers

Andrew White

JANUARY 9, 2022

On Thursday January 6th I hosted Gartner’s 2022 Leadership Vision for Data and Analytics webinar. – In the webinar and Leadership Vision deck for Data and Analytics we called out AI engineering as a big trend. So in summary they are very similar concepts but data fabric seems to be the more rounded of the two.

Analytics

Analytics Measurement Modeling Data-driven

Build SAML identity federation for Amazon OpenSearch Service domains within a VPC

AWS Big Data

FEBRUARY 7, 2024

Amazon OpenSearch Service is a fully managed search and analytics service powered by the Apache Lucene search library that can be operated within a virtual private cloud (VPC). Download the IAM Identity Center SAML metadata file to use in a later step. Luca Menichetti is a Big Data Architect with Amazon Web Services.

Dashboards

Dashboards Data Processing Metadata Consulting

Data Mesh 101: How Data Mesh Helps Organizations Be Data-Driven and Achieve Velocity

Ontotext

FEBRUARY 12, 2024

Transferring ownership of data/datasets to domain-specific units that possess a deeper understanding of rules around the data empowers teams, improves data quality and trust, and greatly accelerates the building of data models and analytics.

Data-driven

Data-driven Data Lake Data Quality Business Objectives

Strategically Approaching Graph Technologies

Ontotext

FEBRUARY 26, 2024

They allow you to connect the dots across your data and progress on the spectrum toward knowledge, insight, and wisdom. For this reason, graph technologies have taken center stage in the data and analytics space as a critical enabler. Before you can really maximize the value from data, you must first connect the dots.

Technology

Technology Cost-Benefit Data-driven Metadata

Upgrade Journey: The Path from CDH to CDP Private Cloud

Cloudera

SEPTEMBER 28, 2020

The customer had shorter upgrade windows, which dictated the need to do an in-place upgrade by installing CDP on their existing environment, instead of a longer migration path that required data migrations, operational overhead to set up new clusters, and time needed to procure and set up new hardware. on roadmap). Instead use Ranger REST API.

Testing

Testing Metadata Risk Data Science

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

Streaming jobs constantly ingest new data to synchronize across systems and can perform enrichment, transformations, joins, and aggregations across windows of time more efficiently. OpenSearch Service provides support for native ingestion from Kinesis data streams or MSK topics.

Data Lake

Data Lake Unstructured Data Management Modeling

“You Complete Me,” said Data Lineage to DataOps Observability.

DataKitchen

JANUARY 23, 2023

To capture a more complete picture of the data’s journey, it is important to have a DataOps Observability system in place. Data lineage is static and often lags by weeks or months. Data lineage is often considered static because it is typically based on snapshots of data and metadata taken at a specific time.

Testing

Testing Data Governance Data Quality Data-driven

Unstructured data management and governance using AWS AI/ML and analytics services

AWS Big Data

OCTOBER 25, 2023

But most important of all, the assumed dormant value in the unstructured data is a question mark, which can only be answered after these sophisticated techniques have been applied. Therefore, there is a need to be able to analyze and extract value from the data economically and flexibly. The solution integrates data in three tiers.

Unstructured Data

Unstructured Data Metadata Management Analytics

Modernizing and optimizing enterprise reporting [Infographic]

BI-Survey

FEBRUARY 6, 2020

Recent years have seen extensive interest in topics around explorative BI such as advanced and predictive analytics. ML allows non-statisticians to leverage advanced and predictive analytics to detect hidden patterns and correlations in data, increasing the depth of analyses conducted.

Reporting

Reporting Optimization Enterprise Data Quality

5 Data Governance Mistakes to Avoid

Alation

APRIL 25, 2023

More specifically, it describes the process of creating, administering, and adapting a comprehensive plan for how an organization’s data will be managed. In this way, data governance has implications for a wide range of data management disciplines, including data architecture, quality, security, metadata, and more.

Data Governance

Data Governance Marketing Machine Learning Sales

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. Amazon EMR Serverless is a serverless option in Amazon EMR that makes it easy for data analysts and engineers to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers.

Data Lake

Data Lake Dashboards Metrics Metadata

Why We Started the Data Intelligence Project

Alation

JULY 7, 2022

In 2013 I joined American Family Insurance as a metadata analyst. I had always been fascinated by how people find, organize, and access information, so a metadata management role after school was a natural choice. The use cases for metadata are boundless, offering opportunities for innovation in every sector.

Metadata

Metadata Data-driven Insurance Statistics

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complementary.

Management

Management Metadata Data Architecture Data Lake
