Blog, Data Analytics and Metadata

RDF-Star: Metadata Complexity Simplified

Ontotext

JUNE 10, 2021

To handle such scenarios you need a transalytical graph database – a database engine that can deal with both frequent updates (OLTP workload) as well as with graph analytics (OLAP). Not Every Graph is a Knowledge Graph: Schemas and Semantic Metadata Matter. Metadata about Relationships Come in Handy. Schemas are powerful.

Metadata

Metadata Cost-Benefit OLAP Modeling

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

This is a guest blog post co-written with Sumesh M R from Cargotec and Tero Karttunen from Knowit Finland. They chose AWS Glue as their preferred data integration tool due to its serverless nature, low maintenance, ability to control compute resources in advance, and scale when needed.

Metadata

Metadata Data Lake Machine Learning Big Data

Ontotext’s Top 5 Most Popular Blog Posts for 2020

Ontotext

DECEMBER 16, 2020

At the end of an unconventional year, we at Ontotext still want to honor our tradition and provide our readers with a round-up of the most popular posts on our blog. In its third generation, Ontotext Platform enables organizations to build, use and evolve knowledge graphs as a hub for data, metadata and content.

Metadata

Metadata Unstructured Data Visualization Enterprise

Insights from Gartner Data & Analytics Summit Orlando 2023

Alation

MARCH 31, 2023

Ehtisham Zaidi, Gartner’s VP of data management, and Robert Thanaraj, Gartner’s director of data management, gave an update on the fabric versus mesh debate in light of what they call the “active metadata era” we’re currently in. The active metadata helix: Indeed, automation was on everyone’s minds. We couldn’t agree more.

Data Analytics

Data Analytics Analytics Metadata Data Governance

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

In this blog post, we dive into different data aspects and how Cloudinary addresses the two concerns of vendor lock-in and cost-efficient data analytics by using Apache Iceberg, Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon EMR, and AWS Glue. Old metadata files are kept for history by default.

Data Lake

Data Lake Metadata Snapshot Analytics

A Look Back at the Gartner Data and Analytics Summit

Cloudera

APRIL 18, 2024

More Businesses Are Taking a Holistic Approach to Data Strategy: One of the more common trends we saw coming up through conversations during the summit was the need to reframe how we approach data strategy, taking a much more holistic view of it than organizations would have in past years.

Analytics

Analytics Metadata Data Strategy Optimization

Run Trino queries 2.7 times faster with Amazon EMR 6.15.0

AWS Big Data

MARCH 22, 2024

Benchmark setup: In our testing, we used the 3 TB dataset stored in Amazon S3 in compressed Parquet format, with metadata for databases and tables stored in the AWS Glue Data Catalog. This benchmark uses unmodified TPC-DS data schema and table relationships. As shown in this blog post, our TPC-DS benchmark showed a 2.7

Metadata

Metadata Statistics Broadcasting Optimization

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback.

Data Lake

Data Lake Data Processing Metadata Snapshot

Gartner Data & Analytics London: Human Curation + Machine Learning

Alation

FEBRUARY 13, 2020

By leveraging Google-like smart search to find data assets; using automation and self-learning instead of burdening people with the need to manually update metadata in multiple places; and ensuring that metadata is maintained by the whole data community and is not dependent on a centralized IT team.

Machine Learning

Machine Learning Data Analytics Analytics Metadata

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

Third-generation – more or less like the previous generation but with streaming data, cloud, machine learning and other (fill-in-the-blank) fancy tools. It’s no fun working in data analytics/science when you are the bottleneck in your company’s business processes. See the pattern?

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Clean up your Excel and CSV files without writing code using AWS Glue DataBrew

AWS Big Data

NOVEMBER 15, 2023

As the organization receives data from multiple external vendors, it often arrives in different formats, typically Excel or CSV files, with each vendor using their own unique data layout and structure. In this blog post, we’ll explore a solution that streamlines this process by leveraging the capabilities of AWS Glue DataBrew.

Metadata

Metadata Sales Data Lake Big Data

Extracting key insights from Amazon S3 access logs with AWS Glue for Ray

AWS Big Data

SEPTEMBER 7, 2023

Customers want to know how their data is being accessed, when it is being accessed, and who is accessing it. With exponential growth in data volume, centralized monitoring becomes challenging. It is also crucial to audit granular data access for security and compliance needs.

Metadata

Metadata Dashboards Metrics Visualization

Why Establishing Data Context is the Key to Creating Competitive Advantage

Ontotext

AUGUST 22, 2023

The age of Big Data inevitably brought computationally intensive problems to the enterprise. Central to today’s efficient business operations are the activities of data capturing and storage, search, sharing, and data analytics. With semantic metadata, enterprise data gets linked to one another and to external sources.

Metadata

Metadata Knowledge Discovery Big Data Enterprise

Addressing Data Mesh Technical Challenges with DataOps

DataKitchen

AUGUST 9, 2021

The domain also includes code that acts upon the data, including tools, pipelines, and other artifacts that drive analytics execution. The domain requires a team that creates/updates/runs the domain, and we can’t forget metadata: catalogs, lineage, test results, processing history, etc.

Testing

Testing Data Lake Metadata Publishing

Prioritizing Data: Why a Solid Data Management Strategy Will Be Critical in 2024

Ontotext

JANUARY 29, 2024

As companies in almost every market segment attempt to continuously enhance and modernize data management practices to drive greater business outcomes, organizations will be watching numerous trends emerge this year. Sometimes, the challenge is that the data itself often raises more questions than it answers.

Strategy

Strategy Management Metadata Data-driven

Alation 2022.3: Alation Anywhere Connecting the Modern Data Stack

Alation

AUGUST 30, 2022

Centralization of metadata. A decade ago, metadata was everywhere. Consequently, useful metadata was unfindable and unusable. We had data but no data intelligence and, as a result, insights remained hidden or hard to come by. This universe of metadata represents a treasure trove of connected information.

Metadata

Metadata Data Quality Data Governance Machine Learning

Introducing Amazon EMR on EKS with Apache Flink: A scalable, reliable, and efficient data processing platform

AWS Big Data

MAY 28, 2024

Integration with the AWS Glue Data Catalog as a metadata store for Flink applications: The AWS Glue Data Catalog is a centralized metadata repository for data assets across various data sources, and provides a unified interface to store and query information about data formats, schemas, and sources.

Data Processing

Data Processing Cost-Benefit Metadata Optimization

Unlock data across organizational boundaries using Amazon DataZone – now generally available

AWS Big Data

OCTOBER 4, 2023

Additionally, data owners and data stewards can make data discovery simpler by adding business context to data while balancing access governance to the data via pre-defined approval workflows in the user interface. The metadata form types and asset types can be used as templates for defining your assets.

Metadata

Metadata Data Lake Publishing Data Governance

How Huron built an Amazon QuickSight Asset Catalogue with AWS CDK Based Deployment Pipeline

AWS Big Data

APRIL 26, 2023

This is a guest blog post co-written with Corey Johnson from Huron. Having an accurate and up-to-date inventory of all technical assets helps an organization ensure it can keep track of all its resources with metadata information such as their assigned owners, last updated date, used by whom, how frequently and more.

Metadata

Metadata Dashboards Visualization Consulting

DataOps Facilitates Remote Work

DataKitchen

JANUARY 5, 2021

Data Science Workflow – Kubeflow, Python, R. Data Engineering Workflow – Airflow, ETL. Data Visualization, Preparation – Self-service tools such as Tableau, Alteryx. Data Governance/Catalog (Metadata management) Workflow – Alation, Collibra, Wikis.

Testing

Testing Data Governance Metadata Visualization

Why Data Governance Is Crucial for All Enterprise-Level Businesses

Cloudera

MARCH 3, 2022

Data users in these enterprises don’t know how data is derived and lack confidence in whether it’s the right source to use. If data access policies and lineage aren’t consistent across an organization’s private cloud and public clouds, gaps will exist in audit logs. From Bad to Worse.

Data Governance

Data Governance Enterprise Data Quality Metadata

Build Spark Structured Streaming applications with the open source connector for Amazon Kinesis Data Streams

AWS Big Data

MAY 24, 2024

Apache Spark is a powerful big data engine used for large-scale data analytics. You can use Apache Spark to process streaming data from a variety of streaming sources, including Amazon Kinesis Data Streams for use cases like clickstream analysis, fraud detection, and more.

Metadata

Metadata Interactive Business Objectives Management

Surviving Radical Disruption with Data Intelligence

erwin

OCTOBER 16, 2020

And this time sensitivity is a massive issue, as taking a proactive and data-driven approach can literally mean life or death to your business or to your customers. And that’s where data analytics can play a huge role. There’s a common denominator in what they’re all missing, and that is data intelligence.

Internet of Things

Internet of Things Data-driven Uncertainty Data Governance

Process and analyze highly nested and large XML files using AWS Glue and Amazon Athena

AWS Big Data

SEPTEMBER 29, 2023

To analyze XML files stored in Amazon S3 using AWS Glue and Athena, we complete the following high-level steps: Create an AWS Glue crawler to extract XML metadata and create a table in the AWS Glue Data Catalog. We use the AWS Glue crawler to extract XML file metadata. Choose Add a data store. Choose Create.

Metadata

Metadata Visualization Data-driven Optimization

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

An enterprise data catalog does all that a library inventory system does – namely streamlining data discovery and access across data sources – and a lot more. For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance.

Metadata

Metadata Data Quality Data-driven Data Governance

AI at Scale isn’t Magic, it’s Data – Hybrid Data

Cloudera

OCTOBER 11, 2022

AI needs machine learning (ML), ML needs data science. Data science needs analytics. And they all need lots of data. Different data types need different types of analytics – real-time, streaming, operational, data warehouses. And that data is likely in clouds, in data centers and at the edge.

Snapshot

Snapshot Data Science Digital Transformation Metadata

What’s the Current State of Data Governance and Automation?

erwin

JANUARY 30, 2020

The results of our new research show that organizations are still trying to master data governance, including adjusting their strategies to address changing priorities and overcoming challenges related to data discovery, preparation, quality and traceability. And close to 50 percent have deployed data catalogs and business glossaries.

Data Governance

Data Governance Metadata Cost-Benefit Digital Transformation

The Future Is Hybrid Data, Embrace It

Cloudera

JUNE 7, 2022

Want to manage and analyze data of all types including machine, structured, transactional, and unstructured – anywhere? Only Cloudera has the power to span multi-cloud and on-premises with a hybrid data platform. Common security, governance, metadata, replication, and automation enable CDP to operate as an integrated system.

IT

IT Data Architecture Unstructured Data Big Data

Apache Ozone – A High Performance Object Store for CDP Private Cloud

Cloudera

OCTOBER 15, 2021

With FSO, Apache Ozone guarantees atomic directory operations, and renaming or deleting a directory is a simple metadata operation even if the directory has a large set of sub-paths (directories/files) within it. In fact, this gives Apache Ozone a significant performance advantage over other object stores in the data analytics ecosystem.

Testing

Testing Measurement Optimization Metadata

From Disparate Data to Visualized Knowledge Part I: Moving from Spreadsheets to an RDF Database

Ontotext

NOVEMBER 18, 2021

Picture this – you start with the perfect use case for your data analytics product. And all of them are asking hard questions: “Can you integrate my data, with my particular format?”, “How well can you scale?”, “How many visualizations do you offer?”. Nowadays, data analytics doesn’t exist on its own.

Visualization

Visualization Reporting Metadata Enterprise

Webinar Summary: Data Mesh and Data Products

DataKitchen

MAY 4, 2023

Chris talks about the idea of a ‘domain’ as a principle of Data Mesh. A domain is a unit that includes integrated or raw data, artifacts created from data, the code that acts upon the data, the team responsible for the data, and metadata such as data catalog, lineage, and processing history.

Measurement

Measurement Data-driven Testing Cost-Benefit

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Monitor and identify data quality issues closer to the source to mitigate the potential impact on downstream processes or workloads. Efficiently adopt data platforms and new technologies for effective data management. Apply metadata to contextualize existing and new data to make it searchable and discoverable.

Data Quality

Data Quality Data Architecture Strategy Data Lake

Multicloud data lake analytics with Amazon Athena

AWS Big Data

MARCH 18, 2024

In these cases, you may want an integrated query layer to seamlessly run analytical queries across these diverse cloud stores and streamline your data analytics processes. The AWS Glue Data Catalog holds the metadata for Amazon S3 and GCS data.

Data Lake

Data Lake Analytics Cost-Benefit Management

Setting up and Getting Started with Cloudera’s New SQL AI Assistant

Cloudera

JANUARY 19, 2024

As described in our recent blog post, an SQL AI Assistant has been integrated into Hue with the capability to leverage the power of large language models (LLMs) for a number of SQL tasks. This is a real game-changer for data analysts on all levels and will make SQL development faster, easier, and less error-prone.

Data Warehouse

Data Warehouse Data Processing Optimization Modeling

Overcome these six data consumption challenges for a more data-driven enterprise

IBM Big Data Hub

JUNE 8, 2022

Lack of a common business vocabulary across your organization’s data and the inability to map those categories to existing data leads to inconsistency of business metrics and data analytics in addition to making it difficult for users to easily find and understand the data.

Data-driven

Data-driven Enterprise Data Governance Data Lake

Three Takeaways from Gartner’s 2019 Magic Quadrant for Data Management Solutions for Analytics

Cloudera

FEBRUARY 11, 2019

The January 2019 “Magic Quadrant for Data Management Solutions for Analytics” provides valuable insights into the status, direction, and players in the DMSA market. In this blog, we share our takeaways as they relate to the DMSA market trends. Cloudera’s Positioning.

Management

Management Metadata Analytics Machine Learning

Benchmark Results Position GraphDB As the Most Versatile Graph Database Engine

Ontotext

FEBRUARY 23, 2023

The engines must facilitate the advanced data integration and metadata management scenarios where an EKG is used for data fabrics or otherwise serves as a data hub between diverse data and content management systems.

Publishing

Publishing Metadata Optimization Testing

The Madness of Data (and analytics) Governance

Andrew White

DECEMBER 9, 2019

The client had recently engaged with a well-known consulting company that had recommended a large data catalog effort to collect all enterprise metadata to help identify all data and business issues. Modern data (and analytics) governance does not necessarily need: Wall-to-wall discovery of your data and metadata.

Analytics

Analytics Data Lake Data Governance Metadata

Securely process near-real-time data from Amazon MSK Serverless using an AWS Glue streaming ETL job with IAM authentication

AWS Big Data

SEPTEMBER 13, 2023

Streaming data has become an indispensable resource for organizations worldwide because it offers real-time insights that are crucial for data analytics. The escalating velocity and magnitude of collected data has created a demand for real-time analytics. This table acts as a metadata layer for the data.

Data Processing

Data Processing Management Interactive Metadata

GraphDB in Action: Putting the Most Reliable RDF Database to Work for Better Human-machine Interaction

Ontotext

JANUARY 26, 2023

The catalog stores the asset’s metadata in RDF. This allows keeping a well-defined representation of the metadata of each asset and enables using a SPARQL endpoint to query it. Towards that end authors introduce a system for integrity checks for building automation applications and using more reliable data for data analytics processes.

Interactive

Interactive Metadata Data Integration Data-driven

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

2023 Predictions: Data Trends That Will Dominate Business Agenda in APAC

Cloudera

JANUARY 5, 2023

We have been working with APAC organizations to operationalize data analytics and AI solutions to unlock data-driven decision-making and operational efficiency, with them quickly seeing distinct business benefits. These features provide businesses with a common metadata, security, and governance model across all their data.

Cost-Benefit

Cost-Benefit Business Objectives Machine Learning Data Architecture

AI in Analytics: The NLQ Use Case

Sisense

JULY 24, 2019

In my previous blog, I wrote about Natural Language Query (NLQ, or search analytics for some), as one of the major topics that we, the AI group in Sisense, are working on. NLQ is one of the oldest AI disciplines, but we’ve only recently started hearing about it in conjunction with BI and analytics. Using AI to its Fullest.

Analytics

Analytics Experimentation Metadata Big Data

Strategically Approaching Graph Technologies

Ontotext

FEBRUARY 26, 2024

RDF tends to be more data- and metadata-centric. There’s an additional effort around controlled vocabularies, using IRIs to resolve entities, leveraging Linked Open Data to enrich metadata, and managing complex ontologies and taxonomies. You also have controlled vocabularies and better data hygiene with data validation.

Technology

Technology Cost-Benefit Data-driven Metadata
