Consulting, Data Lake and Metadata

Accelerate HiveQL with Oozie to Spark SQL migration on Amazon EMR

AWS Big Data

APRIL 19, 2023

We split the solution into two primary components: generating Spark job metadata and running the SQL on Amazon EMR. The first component (metadata setup) consumes existing Hive job configurations and generates metadata such as the number of parameters, the number of actions (steps), and file formats. One such metadata field is sql_path, the SQL file name.
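
The Python example below is a minimal sketch of the metadata-generation step. It assumes the Hive job configuration is an Oozie workflow.xml containing `<hive>` actions. The sample workflow, the `StepMetadata` class, and the `extract_job_metadata` and `summarize` functions are hypothetical illustrations, not the code from the AWS post. The sketch counts actions (steps) and distinct parameters and records each action's `sql_path`. It does not attempt file-format detection.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical Oozie workflow with two Hive actions (illustrative only).
SAMPLE_WORKFLOW = """<workflow-app xmlns="uri:oozie:workflow:0.5" name="daily-sales">
  <start to="load"/>
  <action name="load">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <script>load_sales.sql</script>
      <param>run_date=${runDate}</param>
      <param>region=${region}</param>
    </hive>
    <ok to="aggregate"/>
    <error to="fail"/>
  </action>
  <action name="aggregate">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <script>aggregate_sales.sql</script>
      <param>run_date=${runDate}</param>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Hive step failed</message></kill>
  <end name="end"/>
</workflow-app>"""


def local(tag: str) -> str:
    """Drop the XML namespace: '{uri:oozie:workflow:0.5}action' -> 'action'."""
    return tag.rsplit("}", 1)[-1]


@dataclass
class StepMetadata:
    name: str
    sql_path: Optional[str]
    params: Dict[str, str] = field(default_factory=dict)


def extract_job_metadata(workflow_xml: str) -> List[StepMetadata]:
    """Collect one StepMetadata record per Hive action, in workflow order."""
    root = ET.fromstring(workflow_xml)
    steps: List[StepMetadata] = []
    for action in root:
        if local(action.tag) != "action":
            continue  # skip <start>, <kill>, <end>, ...
        for child in action:
            if local(child.tag) != "hive":
                continue  # only Hive actions are considered in this sketch
            sql_path: Optional[str] = None
            params: Dict[str, str] = {}
            for el in child:
                if local(el.tag) == "script" and el.text:
                    sql_path = el.text.strip()
                elif local(el.tag) == "param" and el.text and "=" in el.text:
                    key, value = el.text.strip().split("=", 1)
                    params[key] = value
            steps.append(StepMetadata(action.get("name", ""), sql_path, params))
    return steps


def summarize(steps: List[StepMetadata]) -> Dict[str, object]:
    """Job-level metadata: number of actions, distinct parameters, SQL files."""
    return {
        "num_actions": len(steps),
        "num_parameters": len({k for s in steps for k in s.params}),
        "sql_paths": [s.sql_path for s in steps],
    }


if __name__ == "__main__":
    print(summarize(extract_job_metadata(SAMPLE_WORKFLOW)))
```

For the sample workflow, the summary reports two actions and two distinct parameters (`run_date` and `region`). It also lists `load_sales.sql` and `aggregate_sales.sql` as the SQL paths. The namespace is stripped so the same parser works across Oozie schema versions. This is a simplifying assumption.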

Metadata, Testing, Data Lake, Consulting

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architect Armando Vázquez identifies eight common types of data architects. The first is the enterprise data architect, who oversees an organization’s overall data architecture, defining data architecture strategy and designing and implementing architectures.

Data Architecture, Data Warehouse, Statistics, Visualization

How HR&A uses Amazon Redshift spatial analytics on Amazon Redshift Serverless to measure digital equity in states across the US

AWS Big Data

DECEMBER 5, 2023

HR&A Advisors, a multidisciplinary consultancy with extensive work in the broadband and digital equity space, is helping its state, county, and municipal clients deliver affordable internet access by analyzing locally specific digital inclusion needs and building tailored digital equity plans.

Measurement, Dashboards, Data Warehouse, Analytics

The Madness of Data (and analytics) Governance

Andrew White

DECEMBER 9, 2019

The client had recently engaged with a well-known consulting company that had recommended a large data catalog effort to collect all enterprise metadata to help identify all data and business issues. Modern data (and analytics) governance does not necessarily need wall-to-wall discovery of your data and metadata.

Analytics, Data Lake, Data Governance, Metadata

What Is Data Curation?

Alation

FEBRUARY 13, 2020

Data curation is important in today’s world of data sharing and self-service analytics, but I think it is a frequently misused term. When speaking and consulting, I often hear people refer to data in their data lakes and data warehouses as curated data, believing that it is curated because it is stored as shareable data.

Metadata, Data Warehouse, Data Lake, Data Governance

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.

Data Lake, Data Processing, Metadata, Snapshot

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

Streaming jobs constantly ingest new data to synchronize across systems and can perform enrichment, transformations, joins, and aggregations across windows of time more efficiently. With a file system sink connector, Apache Flink jobs can deliver data to Amazon S3 as data objects in open-format files (such as JSON, Avro, and Parquet).

Data Lake, Unstructured Data, Management, Modeling

Data Profiling: What It Is and How to Perfect It

Alation

APRIL 18, 2023

Gartner defines data profiling as: A technology for discovering and investigating data quality issues, such as duplication, lack of consistency, and lack of accuracy and completeness. The tools provide data statistics, such as degree of duplication and ratios of attribute values, both in tabular and graphical formats.
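
The Python example below is a rough illustration of the kind of statistics this definition mentions, namely degree of duplication and ratios of attribute values. It is a minimal sketch that works over in-memory Python data. The function names and the exact metric definitions are our own assumptions, not those of any particular profiling tool. For example, the sketch counts every repeat occurrence of a value as a duplicate and ignores nulls in the value ratios.

```python
from collections import Counter
from typing import Any, Dict, Hashable, List, Sequence


def profile_column(values: Sequence[Hashable]) -> Dict[str, Any]:
    """Basic profile of one column: nulls, distinct values, duplication, value ratios."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    # Every occurrence of a value after its first one counts as a duplicate.
    duplicates = sum(c - 1 for c in counts.values())
    n = len(non_null)
    return {
        "rows": total,
        "null_ratio": (total - n) / total if total else 0.0,
        "distinct": len(counts),
        "duplication_ratio": duplicates / n if n else 0.0,
        # Share of non-null rows taken by each value, most frequent first.
        "value_ratios": {v: c / n for v, c in counts.most_common()},
    }


def duplicate_row_ratio(rows: List[Dict[str, Hashable]]) -> float:
    """Fraction of records that exactly repeat an earlier record."""
    if not rows:
        return 0.0
    # Sort items so that key order inside each dict does not affect equality.
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return sum(c - 1 for c in counts.values()) / len(rows)


if __name__ == "__main__":
    print(profile_column(["NY", "CA", "NY", None, "TX", "NY"]))
```

In the sample column, one of six values is null. "NY" accounts for 60% of the non-null values, and two of the five non-null values repeat an earlier one, giving a duplication ratio of 0.4. A real profiling tool would compute the same kinds of ratios in the database and render them as tables or charts.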

IT, Metadata, Data Quality, Data Governance

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Cloudera

JANUARY 15, 2021

Cloudera Data Warehouse is a highly scalable service that marries the SQL engine technologies of Apache Impala and Apache Hive with cloud-native features to deliver best-in-class price-performance for users running data warehousing workloads in the cloud. The benchmark run by McKnight Consulting Group used the Impala engine.

Data Warehouse, Cost-Benefit, Consulting, Interactive

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

APRIL 25, 2024

In the era of data, organizations are increasingly using data lakes to store and analyze vast amounts of structured and unstructured data. Data lakes provide a centralized repository for data from various sources, enabling organizations to unlock valuable insights and drive data-driven decision-making.

Optimization, Data Lake, Cost-Benefit, Reporting

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake, Metadata, Optimization, Statistics

What Is Alation Connected Sheets? Q&A with the Creators

Alation

NOVEMBER 28, 2022

But refreshing this analysis with the latest data was impossible… unless you were proficient in SQL or Python. We wanted to make it easy for anyone to pull data and self-serve without technical know-how of the underlying database or data lake. Sathish and I met in 2004 when we were working for Oracle.

Metadata, Enterprise, Cost-Benefit, Finance

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Ontotext

MARCH 8, 2023

Rich metadata and semantic modeling continue to drive the matching of 50K training materials to specific curricula, leading new, data-driven, audience-based marketing efforts that demonstrate how the recommender service is achieving increased engagement and performance from over 2.3 million users.

Enterprise, Knowledge Discovery, Risk, Data-driven

Habib Bank manages data at scale with Cloudera Data Platform

Cloudera

NOVEMBER 17, 2022

HBL started its data journey in 2019, when a data lake initiative was launched to consolidate complex data sources and give the bank a single version of truth for decision making. CDP Private Cloud’s new approach to data management and analytics would allow HBL to access powerful self-service analytics.

Management, Data Lake, Consulting, Unstructured Data

In-depth with CDO Christopher Bannocks

Peter James Thomas

AUGUST 29, 2018

I have since run and driven transformation in Reference Data, Master Data, KYC [3], Customer Data, Data Warehousing and, more recently, Data Lakes and Analytics, constantly building experience and capability in the Data Governance, Quality and data services domains, both inside banks, as a consultant and as a vendor.

Data-driven, Cost-Benefit, Metadata, Technology

Five actionable steps to GDPR compliance (Right to be forgotten) with Amazon Redshift

AWS Big Data

JULY 28, 2023

Tagging: Consider tagging your Amazon Redshift resources to quickly identify which clusters and snapshots contain PII data, their owners, the data retention policy, and so on. Tags provide metadata about resources at a glance. Refer to AWS Lake Formation-managed Redshift shares for more details on the implementation.

Snapshot, Metadata, Measurement, Data Warehouse

6 BI challenges IT teams must address

CIO Business Intelligence

DECEMBER 21, 2022

“By 2025, it’s estimated we’ll have 463 million terabytes of data created every day,” says Lisa Thee, data for good sector lead at Launch Consulting Group in Seattle. “But what they really need to do is fundamentally rethink how data is managed and accessed,” he says. “And key to this is the metadata management.”

IT, Business Intelligence, Sales, Key Performance Indicator

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Alation

AUGUST 11, 2022

Today, the brightest minds in our industry are targeting the massive proliferation of data volumes and the accompanying but hard-to-find value locked within all that data. We chatted about industry trends, why decentralization has become a hot topic in the data world, and how metadata drives many data-centric use cases.

Metadata, Data Warehouse, Data Quality, Data Lake

Federate Amazon QuickSight access with open-source identity provider Keycloak

AWS Big Data

JUNE 13, 2023

Download the SAML metadata file. In the navigation pane under Clients , import the SAML metadata file. Download the Keycloak IdP SAML metadata file from that URL location. For Metadata document , upload the Keycloak IdP SAML metadata XML file you downloaded and saved to your local machine earlier. Choose Browse.

Metadata, Dashboards, Business Intelligence, Management

Automate legacy ETL conversion to AWS Glue using Cognizant Data and Intelligence Toolkit (CDIT) – ETL Conversion Tool

AWS Big Data

OCTOBER 4, 2023

Cognizant Data & Intelligence Toolkit (CDIT) – ETL Conversion Tool automates this process, bringing in more predictability and accuracy, eliminating the risk associated with manual conversion, and providing faster time to market for customers. Cognizant is an AWS Premier Tier Services Partner with several AWS Competencies.

Data Warehouse, Cost-Benefit, Metadata, Data Lake

Tackling AI’s data challenges with IBM databases on AWS

IBM Big Data Hub

MARCH 14, 2024

This involves unifying and sharing a single copy of data and metadata across IBM® watsonx.data™, IBM® Db2®, IBM® Db2® Warehouse and IBM® Netezza®, using native integrations and supporting open formats, all without the need for migration or recataloging.

Cost-Benefit, Metadata, Optimization, Management

Introducing watsonx: The future of AI for business

IBM Big Data Hub

MAY 9, 2023

Optimized for all data, analytics and AI workloads, watsonx.data combines the flexibility of a data lake with the performance of a data warehouse, helping businesses scale data analytics and AI anywhere their data resides. IBM is infusing watsonx.ai…

Data Warehouse, Machine Learning, Cost-Benefit, Metadata

Themes and Conferences per Pacoid, Episode 12

Domino Data Lab

AUGUST 8, 2019

I mention this here because there was a lot of overlap between current industry data governance needs and what the scientific community is working toward for scholarly infrastructure. The gist is leveraging metadata about research datasets, projects, publications, etc. … “Nothing Spreads Like Fear”.

Data Science, Machine Learning, Data Governance, Statistics

How Novo Nordisk built distributed data governance and control at scale

AWS Big Data

APRIL 28, 2023

In this example, the analytics tool accesses the data lake on Amazon Simple Storage Service (Amazon S3) through Athena queries. As the data mesh pattern expands across domains covering more downstream services, we need a mechanism to keep IdPs and IAM role trusts continuously updated.

Data Governance, Management, Data-driven, Data Lake

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos. Refer to Catalogs for more information.

Data Lake, Metadata, Business Analysis, Data-driven

PODCAST: Making AI Real – Episode 4: Unlocking the Value of Enterprise AI with Data Engineering Capabilities

bridgei2i

MARCH 3, 2021

In this episode of the AI to Impact Podcast, host Pavan Kumar speaks to Prinkan Pal about the evolution of data engineering and ML operations from a closed team into a tech consulting unit. My team primarily comprises data engineers, ML engineers, full-stack developers and solution architects, with a focus on cloud.

Enterprise, Digital Transformation, Data-driven, Interactive

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To bring their customers the best deals and user experience, smava follows the modern data architecture principles with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.

Data Lake, Data Warehouse, Data-driven, B2B

Data Governance for Dummies: Your Questions, Answered

Alation

FEBRUARY 17, 2023

This is because you will know what data you can trust, and you will have processes to create and upkeep data, as well as curated metadata to exploit data’s full capabilities. Can you differentiate between governance of raw data and enhanced data (information)? Where do you govern? Here’s an example.

Data Governance, Data Quality, Metadata, Cost-Benefit

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

Andrew White

JANUARY 11, 2021

As with any good consulting response, “it depends.” Do you recommend a consulting approach strategy rather than a CDO strategy? Will the data warehouse, as a software tool, play a role in the future of data & analytics strategy? Data lakes don’t offer this, nor should they. It really does.

Data Analytics, Analytics, Data-driven, Finance
