
Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

Since the deluge of big data began over a decade ago, many organizations have learned to build applications that process and analyze petabytes of data. Data lakes have served as a central repository for storing structured and unstructured data at any scale and in various formats.
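The excerpt names the building blocks rather than showing them wired together, so below is a minimal PySpark sketch of the kind of job one might submit to EMR Serverless to write an Iceberg table registered in the AWS Glue Data Catalog, which Athena can then query. The bucket, database, and table names are placeholders, and the job would additionally need the Iceberg runtime JARs made available to EMR Serverless.

```python
# Minimal sketch (placeholder names) of writing an Iceberg table backed by the
# AWS Glue Data Catalog from a Spark job, e.g. one submitted to EMR Serverless.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-serverless-demo")
    # Register an Iceberg catalog named "glue_catalog" that keeps metadata in AWS Glue
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://example-bucket/iceberg/")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# Create a transactional Iceberg table and insert a row; once the table exists in
# the Glue Data Catalog, Athena can query it directly.
spark.sql("CREATE TABLE IF NOT EXISTS glue_catalog.demo_db.orders (id BIGINT, status STRING) USING iceberg")
spark.sql("INSERT INTO glue_catalog.demo_db.orders VALUES (1, 'CREATED')")
```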


Build a real-time GDPR-aligned Apache Iceberg data lake

AWS Big Data

Data lakes are a popular choice for today’s organizations to store the data generated by their business activities. As a data lake design best practice, data should be immutable once stored. A data lake built on AWS uses Amazon Simple Storage Service (Amazon S3) as its primary storage environment.
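As a rough illustration of how a GDPR erasure request can be honored on such a lake, here is a hedged Spark SQL sketch against an Iceberg table: Iceberg commits the delete as a new snapshot instead of rewriting S3 objects in place, and expiring old snapshots later makes the removal physical. The table, column, and filter values are hypothetical, and the session is assumed to already have an Iceberg Glue catalog configured (as in the sketch above).

```python
# Hypothetical right-to-erasure sketch on an Iceberg table. Assumes the
# SparkSession already has the "glue_catalog" Iceberg catalog configured
# (for example via spark-defaults or the previous sketch).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Row-level delete: Iceberg commits a new snapshot rather than mutating
# the underlying S3 data files in place.
spark.sql("""
    DELETE FROM glue_catalog.demo_db.customer_events
    WHERE customer_id = 'customer-requesting-erasure'
""")

# Expire old snapshots so the deleted rows eventually disappear from storage,
# which is needed for the erasure to be complete.
spark.sql("""
    CALL glue_catalog.system.expire_snapshots(
        table => 'demo_db.customer_events',
        older_than => TIMESTAMP '2024-01-01 00:00:00'
    )
""")
```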




Gartner Data & Analytics Sydney 2022

Timo Elliott

Last week I was in beautiful Sydney, Australia for the Gartner Data and Analytics Conference. Here’s a quick video summary. One of the big things that struck me was the changing role of data: data on its own is useless, and turning it into value is possible, but it takes huge amounts of time and effort.


How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

The data ecosystem today is crowded with dazzling buzzwords, all fighting for investment dollars. A survey in 2021 found that a data company was being funded every 45 minutes. Data ecosystems have become jungles, and in spite of all the technology, data teams are struggling to create a modern data experience.


Introducing AWS Glue crawler and create table support for Apache Iceberg format

AWS Big Data

Apache Iceberg has become very popular for its support for ACID transactions in data lakes, along with features like schema and partition evolution, time travel, and rollback. Iceberg captures metadata on the state of datasets as they evolve and change over time.
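Once the crawler has registered an Iceberg table in the Glue Data Catalog, those metadata snapshots are what make time travel possible. As a small, hedged sketch (placeholder database, table, and bucket names), this is roughly what an Iceberg time-travel query might look like when run through Athena with boto3:

```python
# Sketch of an Iceberg time-travel query submitted to Amazon Athena via boto3,
# assuming the table is already registered in the Glue Data Catalog
# (for example, by the Iceberg-aware Glue crawler the article introduces).
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    # Time travel: read the table as it existed at a past point in time.
    QueryString=(
        "SELECT * FROM orders "
        "FOR TIMESTAMP AS OF TIMESTAMP '2024-01-01 00:00:00 UTC'"
    ),
    QueryExecutionContext={"Database": "demo_db"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
print(response["QueryExecutionId"])
```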


Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. Every EDLS job comprises one or more job steps chained together and run in a predefined order orchestrated by the custom ETL framework.
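EDLS is an internal BMS framework, so the following is purely a hypothetical Python sketch of the pattern the excerpt describes (job metadata listing steps, resolved and run in a fixed order), not EDLS's actual API:

```python
# Hypothetical illustration of a metadata-driven job: an ordered list of step
# names resolved to callables and executed in sequence over a shared context.
from typing import Callable, Dict, List

def read_source(ctx: Dict) -> Dict:
    ctx["rows"] = [{"id": 1}]                                # placeholder extract step
    return ctx

def apply_business_rules(ctx: Dict) -> Dict:
    ctx["rows"] = [r for r in ctx["rows"] if r["id"] > 0]    # placeholder transform step
    return ctx

def write_target(ctx: Dict) -> Dict:
    print(f"writing {len(ctx['rows'])} rows")                # placeholder load step
    return ctx

JOB_STEPS: List[str] = ["read_source", "apply_business_rules", "write_target"]
STEP_REGISTRY: Dict[str, Callable[[Dict], Dict]] = {
    "read_source": read_source,
    "apply_business_rules": apply_business_rules,
    "write_target": write_target,
}

context: Dict = {}
for step_name in JOB_STEPS:
    context = STEP_REGISTRY[step_name](context)
```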


How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

This post provides guidance on how to build scalable analytical solutions for gaming industry use cases using Amazon Redshift Serverless. The solutions should be flexible and easy to use, providing less restrictive, easy-to-access, and ready-to-use data. Data hubs and data lakes can coexist in an organization, complementing each other.
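For an application-facing sketch of "ready-to-use data" on Redshift Serverless, the snippet below runs a query through the Redshift Data API with boto3, so the application needs no cluster management or persistent connections. The workgroup, database, and SQL are placeholders, and a production caller would poll describe_statement in a loop rather than checking once.

```python
# Minimal sketch of querying Amazon Redshift Serverless via the Redshift Data API.
# Workgroup, database, table, and query below are placeholders.
import boto3

client = boto3.client("redshift-data")

resp = client.execute_statement(
    WorkgroupName="example-workgroup",   # Redshift Serverless workgroup
    Database="dev",
    Sql="SELECT player_id, SUM(score) FROM game_events GROUP BY player_id LIMIT 10",
)
statement_id = resp["Id"]

# Check status, then fetch results once the statement has finished.
status = client.describe_statement(Id=statement_id)["Status"]
if status == "FINISHED":
    rows = client.get_statement_result(Id=statement_id)["Records"]
    print(rows)
```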