article thumbnail

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

In this blog, we discuss the technical challenges faced by Cargotec in replicating their AWS Glue metadata across AWS accounts, and how they navigated these challenges successfully to enable cross-account data sharing. Solution overview Cargotec required a single catalog per account that contained metadata from their other AWS accounts.

article thumbnail

Build efficient ETL pipelines with AWS Step Functions distributed map and redrive feature

AWS Big Data

Amazon S3 hosts the metadata of all the tables as a.csv file. The pipeline uses the Step Functions distributed map to read the table metadata from Amazon S3, iterate on every single item, and call the downstream AWS Glue job in parallel to export the data. The following diagram illustrates the Step Functions workflow.

Metadata 117
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

BMS’s EDLS platform hosts over 5,000 jobs and is growing at 15% YoY (year over year). EDLS job steps and metadata Every EDLS job comprises one or more job steps chained together and run in a predefined order orchestrated by the custom ETL framework. It retrieves the specified files and available metadata to show on the UI.

article thumbnail

Data governance beyond SDX: Adding third party assets to Apache Atlas

Cloudera

In this blog, we’ll highlight the key CDP aspects that provide data governance and lineage and show how they can be extended to incorporate metadata for non-CDP systems from across the enterprise. Atlas provides open metadata management and governance capabilities to build a catalog of all assets, and also classify and govern these assets.

article thumbnail

The Top Three Entangled Trends in Data Architectures: Data Mesh, Data Fabric, and Hybrid Architectures

Cloudera

The data product is not just the data itself, but a bunch of metadata that surrounds it — the simple stuff like schema is a given. It is also agnostic to where the different domains are hosted. And during this time they will by definition have a hybrid architecture. What are the different definitions of hybrid architectures?

article thumbnail

Mastering Ingress in the UI: Elevating your app visibility

IBM Big Data Hub

Update the Kubernetes secret definition by adding or removing fields or updating the referenced Secrets Manager CRN for a TLS secret. v1 kind: Ingress metadata: annotations: kubernetes.io/ingress.class: ALB generation: 1 name: echo-ingress namespace: echo-namespace spec: rules: - host: techcorp.com // 1.

article thumbnail

Do Large Language Models Dream of Knowledge Graphs – Impressions from Day 2 At SEMANTiCS 2023

Ontotext

Both speakers talked about common metadata standards and adequate language resources as key enablers of efficient interoperable, multilingual projects. Just like the typewriter in the hall hosting the Poster’s park , LLMs are yet another tool poised to change the way we work with language.