Big Data, Data Architecture, Definition and Metadata

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

They chose AWS Glue as their preferred data integration tool due to its serverless nature, low maintenance, ability to control compute resources in advance, and scale when needed. To share the datasets, they needed a way to share access to the data and access to catalog metadata in the form of tables and views.

Metadata

Metadata Data Lake Machine Learning Big Data

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Apache Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for processing engines such as Apache Spark, Trino, Apache Flink, Presto, Apache Hive, and Impala to safely work with the same tables at the same time. This concept makes Iceberg extremely versatile. SparkActions.get().expireSnapshots(iceTable).expireOlderThan(TimeUnit.DAYS.toMillis(7)).execute()

Data Lake

Data Lake Metadata Snapshot Analytics

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

This solution only replicates metadata in the Data Catalog, not the actual underlying data. To have a redundant data lake using Lake Formation and AWS Glue in an additional Region, we recommend replicating the Amazon S3-based storage using S3 replication , S3 sync, aws-s3-copy-sync-using-batch or S3 Batch replication process.

Data Architecture

Data Architecture Metadata Data Lake Snapshot

Webinars

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

What is data governance? Best practices for managing data assets

CIO Business Intelligence

MARCH 24, 2023

Data governance definition Data governance is a system for defining who within an organization has authority and control over data assets and how those data assets may be used. It encompasses the people, processes, and technologies required to manage and protect data assets.

Data Governance

Data Governance Management Metadata Data Quality

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Iceberg stores the metadata pointer for all the metadata files. When a SELECT query is reading an Iceberg table, the query engine first goes to the Iceberg catalog, then retrieves the entry of the location of the latest metadata file, as shown in the following diagram. The following example demonstrates this.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. The File Manager Lambda function consumes those messages, parses the metadata, and inserts the metadata to the DynamoDB table odpf_file_tracker.

Data Lake

Data Lake Data Processing Metadata Snapshot

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer ), AWS Trusted Advisor , and AWS Compute Optimizer. Data providers and consumers are the two fundamental users of a CDH dataset.

Dashboards

Dashboards Analytics Metadata Data Warehouse

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

Overview of solution As a data-driven company, smava relies on the AWS Cloud to power their analytics use cases. smava ingests data from various external and internal data sources into a landing stage on the data lake based on Amazon Simple Storage Service (Amazon S3).

Data Lake

Data Lake Data Warehouse Data-driven B2B

If Johnny Mnemonic Smuggled Linked Data

Ontotext

MAY 30, 2019

Second, Linked Data is creating highly connected, computer-processable definitions of entities. Linked Data and Volume. Speaking about data and volume, it seems apt to start this with the famous saying that “most companies think they have “Big Data” problems while they actually have big “data problems””.

Cost-Benefit

Cost-Benefit Technology Big Data Metadata

If Johnny Mnemonic Smuggled Linked Data

Ontotext

MAY 30, 2019

Second, Linked Data is creating highly connected, computer-processable definitions of entities. Linked Data and Volume. Speaking about data and volume, it seems apt to start this with the famous saying that “most companies think they have “Big Data” problems while they actually have big “data problems””.

Cost-Benefit

Cost-Benefit Technology Big Data Metadata

Very Meta … Unlocking Data’s Potential with Metadata Management Solutions

erwin

OCTOBER 24, 2019

Untapped data, if mined, represents tremendous potential for your organization. While there has been a lot of talk about big data over the years, the real hero in unlocking the value of enterprise data is metadata , or the data about the data. Metadata Is the Heart of Data Intelligence.

Metadata

Metadata Management Data-driven Data Architecture

The Book Look: New Book on the Data Catalog

TDAN

JUNE 30, 2020

About a week ago, I was teaching a data modeling class, and an attendee asked me to explain the concept of a data catalog. Like a lot of hype-related terms in IT, there is more than one definition. However, I had recently read the book, The Data Catalog: Sherlock Holmes Data Sleuthing for Analytics by […].

Modeling

Modeling Analytics Data Architecture IT

Modernizing and optimizing enterprise reporting [Infographic]

BI-Survey

FEBRUARY 6, 2020

In fact, most companies struggle with data-related problems , organizational issues or outdated business intelligence and data analytics infrastructures which ignore more current usage scenarios such as explorative analytics on big data sources. Fact-Based Decision-Making Requires Trusted And Managed Data.

Reporting

Reporting Optimization Enterprise Data Quality

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. Amazon EMR Serverless is a serverless option in Amazon EMR that makes it easy for data analysts and engineers to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers.

Data Lake

Data Lake Dashboards Metrics Metadata

What Is a Data Fabric and How Does a Data Catalog Support It?

Alation

JANUARY 25, 2022

Data Fabric: A Love Story , we defined data fabric and outlined its uses and motivations. As a reminder, here’s Gartner’s definition of data fabric: “A design concept that serves as an integrated layer (fabric) of data and connecting processes. The data catalog is a foundational layer of the data fabric.

Metadata

Metadata IT Metrics Data-driven

The Cloud Connection: How Governance Supports Security

Alation

APRIL 14, 2022

In today’s AI/ML-driven world of data analytics, explainability needs a repository just as much as those doing the explaining need access to metadata, EG, information about the data being used. The Cloud Data Migration Challenge. It’s More Important to Know What Your Data Means Than Where It Is.

Metadata

Metadata Data Governance Modeling Data-driven

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

Data fabric promotes data discoverability. Here, data assets can be published into categories, creating an enterprise-wide data marketplace. This marketplace provides a search mechanism, utilizing metadata and a knowledge graph to enable asset discovery. Data mesh: A mostly new culture.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

Introduction Why should I read the definitive guide to embedded analytics? The Definitive Guide to Embedded Analytics is designed to answer any and all questions you have about the topic. It is now most definitely a need-to-have. Ideally, your primary data source should belong in this group. intranets or extranets).

Analytics

Analytics Cost-Benefit Visualization Dashboards

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

This was, without a question, a significant departure from traditional analytic environments, which often meant vendor-lock in and the inability to work with data at scale. Another unexpected challenge was the introduction of Spark as a processing framework for big data. Comprehensive data security and data governance (i.e.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

Data Leaders Brief

How Cargotec uses metadata replication to enable cross-account data sharing

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Webinars

Trending Sources

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

Webinars

What is data governance? Best practices for managing data assets

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

How smava makes loans transparent and affordable using Amazon Redshift Serverless

If Johnny Mnemonic Smuggled Linked Data

If Johnny Mnemonic Smuggled Linked Data

Very Meta … Unlocking Data’s Potential with Metadata Management Solutions

The Book Look: New Book on the Data Catalog

Modernizing and optimizing enterprise reporting [Infographic]

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

What Is a Data Fabric and How Does a Data Catalog Support It?

The Cloud Connection: How Governance Supports Security

Data platform trinity: Competitive or complementary?

What Is Embedded Analytics?

How to modernize data lakes with a data lakehouse architecture

Stay Connected