Blog, Data Lake and Unstructured Data

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights. Choose Next to create your stack.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Differentiating Between Data Lakes and Data Warehouses

Smart Data Collective

SEPTEMBER 23, 2020

While there is a lot of discussion about the merits of data warehouses, not enough discussion centers around data lakes. We talked about enterprise data warehouses in the past, so let’s contrast them with data lakes. Both data warehouses and data lakes are used when storing big data.

Data Lake

Data Lake Data Warehouse Unstructured Data Big Data

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and for features such as schema and partition evolution, time travel, and rollback. AWS Glue 3.0 and later supports the Apache Iceberg framework for data lakes. The following diagram illustrates the solution architecture.

Data Lake

Data Lake Data Processing Metadata Snapshot

Data Lakes on Cloud & Its Usage in Healthcare

BizAcuity

MARCH 29, 2019

Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. The power of the data lake lies in the fact that it is often a cost-effective way to store data. The post also covers deploying data lakes in the cloud and best practices for building a data lake.

Data Lake

Data Lake Unstructured Data Cost-Benefit Data Quality

Data Modeling 301 for the cloud: data lake and NoSQL data modeling and design

erwin

AUGUST 15, 2022

This blog is based on a recent webcast. For NoSQL, data lakes, and data lakehouses, data modeling of both structured and unstructured data is somewhat novel and thorny. As with parts 1 and 2 of this data modeling blog series, the cloud is not nirvana.

Data Lake

Data Lake Modeling Unstructured Data Data Warehouse

Understanding Structured and Unstructured Data

Sisense

APRIL 26, 2020

We live in a world of data: there’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways Data Teams are tackling the challenges of this new world to help their companies and their customers thrive. Structured vs unstructured data.

Unstructured Data

Unstructured Data Data Warehouse Structured Data Data mining

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

With the rapid growth of technology, more and more data is arriving in many different formats: structured, semi-structured, and unstructured. Analytics on operational data in near-real time is becoming a common need. We can then query the data with Amazon Athena and visualize it in Amazon QuickSight.

Data Lake

Data Lake Visualization Dashboards Insurance

Educating ChatGPT on Data Lakehouse

Cloudera

MARCH 17, 2023

I took the free version of ChatGPT for a test drive (in March 2023) and asked some simple questions about the data lakehouse and its components. Hopefully this blog will give ChatGPT an opportunity to learn and correct itself while counting towards my 2023 contribution to social good. I thought this was a fairly comprehensive list.

Unstructured Data

Unstructured Data Data Lake Data Warehouse Machine Learning

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

AWS Big Data

OCTOBER 20, 2023

Today, we are pleased to announce new AWS Glue connectors for Azure Blob Storage and Azure Data Lake Storage that allow you to move data bi-directionally between Azure Blob Storage, Azure Data Lake Storage, and Amazon Simple Storage Service (Amazon S3).

option("header","true").load("wasbs://yourblob@youraccountname.blob.core.windows.net/loadingtest-input/100mb")

Data Lake

Data Lake Big Data Consulting Data Warehouse

Databricks’ new data lakehouse aims at media, entertainment sector

CIO Business Intelligence

APRIL 25, 2022

Now generally available, the M&E data lakehouse comes with industry use-case specific features that the company calls accelerators, including real-time personalization, said Steve Sobel, the company’s global head of communications, in a blog post. Features focus on media and entertainment firms.

Recreation/Entertainment

Recreation/Entertainment Data Lake Data Warehouse Unstructured Data

Migrate Hive data from CDH to CDP public cloud

Cloudera

JUNE 25, 2021

Using easy-to-define policies, Replication Manager removes one of the biggest barriers customers face in their cloud adoption journey by allowing them to easily move both tables/structured data and files/unstructured data to the CDP cloud of their choice. CDP Data Lake cluster versions – CM 7.4.0,

Data Lake

Data Lake Metadata Unstructured Data Management

TDC Digital leverages IBM Cloud for transparent billing and improved customer satisfaction

IBM Big Data Hub

MAY 19, 2023

In addition, to address the data loss issue, PT Aegis suggested replication and backups to IBM Cloud Object Storage, a highly scalable and secure cloud storage service that provides a flexible and cost-effective way to store and manage large amounts of unstructured data.

Unstructured Data

Unstructured Data Data Processing Manufacturing Data Lake

2020 Data Impact Award Winner Spotlight: Merck KGaA

Cloudera

DECEMBER 11, 2020

As mentioned in my previous blog on the topic, the recent shift to remote working has seen an increase in conversations around how data is managed. Without meeting GxP compliance, the Merck KGaA team could not run the enterprise data lake needed to store, curate, or process the data required to inform business decisions.

Data Lake

Data Lake Cost-Benefit Unstructured Data Data Governance

Data science vs data analytics: Unpacking the differences

IBM Big Data Hub

SEPTEMBER 19, 2023

Data science is an area of expertise that combines many disciplines such as mathematics, computer science, software engineering and statistics. It focuses on data collection and management of large-scale structured and unstructured data for various academic and business applications.

Data Science

Data Science Data Analytics Prescriptive Analytics Analytics

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

The first generation of data architectures, represented by enterprise data warehouse and business intelligence platforms, was characterized by thousands of ETL jobs, tables, and reports that only a small group of specialized data engineers understood, resulting in an under-realized positive impact on the business.

Data Quality

Data Quality Data Architecture Strategy Data Lake

5 misconceptions about cloud data warehouses

IBM Big Data Hub

FEBRUARY 2, 2023

Organizations looking to increase adoption of ML are turning to cloud data warehouses that support new, open data formats to catalog, ingest, and query unstructured data types. Additionally, some DBAs worry that moving to the cloud reduces the need for their expertise and skillset.

Data Warehouse

Data Warehouse Cost-Benefit Unstructured Data Data Architecture

The Madness of Data (and analytics) Governance

Andrew White

DECEMBER 9, 2019

And don’t start with a focus on domain-specific data. See: Webinar "Effective Data and Analytics Governance – Finally!"; Blog "A Little Data Governance Goes a Long Way". I spoke with an IT software vendor about an aspect of data and analytics governance. Scope could be: Data (i.e. …), Images (i.e. …)

Analytics

Analytics Data Lake Data Governance Metadata

Demystifying Modern Data Platforms

Cloudera

SEPTEMBER 15, 2022

Mark: While most discussions of modern data platforms focus on comparing the key components, it is important to understand how they all fit together. The collection of source data shown on your left is composed of both structured and unstructured data from the organization’s internal and external sources.

Data Lake

Data Lake Data Architecture Data-driven Data Warehouse

Expediting SQL Workers means Expediting your Business

Cloudera

NOVEMBER 10, 2020

We have evolved with our users, from early-on Hadoop hackers needing quick access to data in the Data Lake, to a much more sophisticated SQL tool. It, therefore, makes sense to provide a seamless transition from the context of HUE to Cloudera’s new, built-in Data Visualization tool.

Visualization

Visualization Optimization Unstructured Data Dashboards

Quantitative and Qualitative Data: A Vital Combination

Sisense

OCTOBER 6, 2020

Contents: Digging into quantitative data; Why is quantitative data important?; What are the problems with quantitative data?; Exploring qualitative data; Qualitative data benefits; Getting the most from qualitative data; Better together. Qualitative data benefits: Unlocking understanding.

Statistics

Statistics Unstructured Data Data-driven Visualization

Unlocking Data Storage: The Traditional Data Warehouse vs. Cloud Data Warehouse

Sisense

NOVEMBER 12, 2020

Contents: Data warehouse vs. databases; Traditional vs. cloud explained; Cloud data warehouses in your data stack; A data-driven future powered by the cloud. We live in a world of data: there’s more of it than ever before, in a ceaselessly expanding array of forms and locations.

Data Warehouse

Data Warehouse Data Lake OLAP Data-driven

Migrate data from Google Cloud Storage to Amazon S3 using AWS Glue

AWS Big Data

JULY 19, 2023

We’ve seen that there is a demand to design applications that enable data to be portable across cloud environments and give you the ability to derive insights from one or more data sources. With this connector, you can bring the data from Google Cloud Storage to Amazon S3.

Big Data

Big Data Software Consulting Unstructured Data

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance. A data catalog uses metadata and data management tools to organize all data assets within your organization. Everybody wins with a data catalog.

Metadata

Metadata Data Quality Data-driven Data Governance

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

Therefore, it is critical for organizations to embrace a low-latency, scalable, and reliable data streaming infrastructure to deliver real-time business applications and better customer experiences. Stream ingestion – The stream ingestion layer is responsible for ingesting data into the stream storage layer.

Analytics

Analytics IoT Data-driven Snapshot

The year’s top 10 enterprise AI trends — so far

CIO Business Intelligence

SEPTEMBER 21, 2023

Enterprises still aren’t extracting enough value from unstructured data hidden away in documents, though, says Nick Kramer, VP for applied solutions at management consultancy SSA & Company. Data warehouses then evolved into data lakes, and then data fabrics and other enterprise-wide data architectures.

Enterprise

Enterprise Consulting Modeling Cost-Benefit

The Differences Between Data Warehouses and Data Lakes

Sisense

APRIL 9, 2021

The amount of data being generated and stored every day has exploded. Companies of all kinds are sitting on stockpiles of data that could someday prove valuable. Until then though, they don’t necessarily want to spend the time and resources necessary to create a schema to house this data in a traditional data warehouse.

Data Lake

Data Lake Data Warehouse Unstructured Data Structured Data

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

Streaming Edge Data Collection and Global Data Distribution

Cloudera

JUNE 9, 2022

In the first blog of the Universal Data Distribution blog series, we discussed the emerging need within enterprise organizations to take control of their data flows. Controlling distribution while also allowing the freedom and flexibility to deliver the data to different services is more critical than ever.

Data Collection

Data Collection IoT Data Lake Unstructured Data

What is an open data lakehouse and why you should care?

IBM Big Data Hub

JANUARY 17, 2023

A data lakehouse is an emerging data management architecture that converges data warehouse and data lake capabilities, driven by a need to improve efficiency and obtain critical insights faster. Let’s start with why data lakehouses are becoming increasingly important.

Data Lake

Data Lake Metadata Data Warehouse Data Governance

Habib Bank manages data at scale with Cloudera Data Platform

Cloudera

NOVEMBER 17, 2022

The Solution: CDP Private Cloud brings a next-generation hybrid architecture with cloud-native benefits to HBL’s data platform. HBL started its data journey in 2019, when a data lake initiative was launched to consolidate complex data sources and enable the bank to use a single version of truth for decision making.

Management

Management Data Lake Consulting Unstructured Data

How Data Management and Big Data Analytics Speed Up Business Growth

BizAcuity

APRIL 14, 2022

The best way to avoid poor data quality is to have a strict data governance system in place. The majority of the data a business has stored is generally unstructured. Most of it accumulates in data silos or data lakes, which means queries over large data sets might take days or eventually fail.

Big Data

Big Data Data Analytics Management Unstructured Data

How The Cloud Made ‘Data-Driven Culture’ Possible | Part 1

BizAcuity

MAY 10, 2022

And how this transformation will impact businesses in the short and long run is the main discussion in this blog. Google launches BigQuery, its own data warehousing tool, and Microsoft introduces Azure SQL Data Warehouse and Azure Data Lake Store.

Data-driven

Data-driven IoT Unstructured Data Data Lake

Why Modernizing the First Mile of the Data Pipeline Can Accelerate all Analytics

Cloudera

AUGUST 13, 2021

Every enterprise is trying to collect and analyze data to get better insights into their business. Whether it is consuming log files, sensor metrics, and other unstructured data, most enterprises manage and deliver data to the data lake and leverage various applications like ETL tools, search engines, and databases for analysis.

Analytics

Analytics Data Lake Unstructured Data Enterprise

How foundation models and data stores unlock the business potential of generative AI

IBM Big Data Hub

AUGUST 1, 2023

models are trained on IBM’s curated, enterprise-focused data lake. Fortunately, data stores serve as secure data repositories and enable foundation models to scale in terms of both their size and their training data. Foundation models focused on enterprise value IBM’s watsonx.ai All watsonx.ai

Modeling

Modeling Cost-Benefit Data Lake Machine Learning

Choose Both: Data Fabric and Data Lakehouse

Cloudera

SEPTEMBER 12, 2022

Organizations don’t know what they have anymore and so can’t fully capitalize on it — the majority of data generated goes unused in decision making. And second, for the data that is used, 80% is semi- or unstructured. Both obstacles can be overcome using modern data architectures, specifically data fabric and data lakehouse.

Unstructured Data

Unstructured Data Data Architecture Data Lake Snapshot

Celebrating Data Superheroes: The 2021 Data Impact Awards Winners

Cloudera

NOVEMBER 18, 2021

By adopting a custom developed application based on the Cloudera ecosystem, Carrefour has combined the legacy systems into one platform which provides access to customer data in a single data lake. EVA unifies data from MTN’s different operator systems, creating a 360° view of subscribers.

Data Lake

Data Lake Cost-Benefit Digital Transformation Risk

Why Spreadsheets Are Your Secret Weapon for Efficient Data Governance

Alation

APRIL 6, 2023

Data governance is traditionally applied to structured data assets that are most often found in databases and information systems. This blog focuses on governing spreadsheets that contain data, information, and metadata, and must themselves be governed.

Data Governance

Data Governance Metadata Cost-Benefit Structured Data

The Modern Data Lakehouse: An Architectural Innovation

Cloudera

SEPTEMBER 9, 2022

Imagine quickly answering burning business questions nearly instantly, without waiting for data to be found, shared, and ingested. Imagine independently discovering rich new business insights from both structured and unstructured data working together, without having to beg for data sets to be made available.

Metadata

Metadata Machine Learning Unstructured Data Data Lake

Data Visualization and Visual Analytics: Seeing the World of Data

Sisense

JUNE 30, 2020

Everyone wants to get more out of their data, but how exactly to do that can leave you scratching your head. In a world increasingly dominated by data, users of all kinds are gathering, managing, visualizing, and analyzing data in a wide variety of ways.

Visualization

Visualization Analytics Dashboards Data-driven

Five Strategies to Accelerate Data Product Development

Cloudera

JULY 26, 2021

A common pitfall in the development of data platforms is that they are built around the boundaries of point solutions and are constrained by the technological limitations (e.g., a technology choice such as Spark Streaming is overly focused on throughput at the expense of latency) or data formats (e.g., data warehousing).

Strategy

Strategy Data Science Marketing Unstructured Data

Cross-Functional Trade Surveillance

Cloudera

MAY 16, 2018

This example combines three types of unrelated data: Legal entity data: Two companies with completely unrelated business lines (coffee and waste management) merged together; Unstructured data: Fraudulent promotion campaigns took place through press releases and a fake stock-picking robot.

Data Lake

Data Lake Risk Visualization Unstructured Data

The Data Journey: From Raw Data to Insights

Sisense

JULY 22, 2020

We live in a world of data: there’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways organizations tackle the challenges of this new world to help their companies and their customers thrive. Data modeling: Create relationships between data.

Slice and Dice

Slice and Dice Digital Transformation Data Warehouse Data Lake

Turning petabytes of pharmaceutical data into actionable insights

Cloudera

JUNE 4, 2018

That’s the equivalent of 1 petabyte (ComputerWeekly) – the amount of unstructured data available within our large pharmaceutical client’s business. Then imagine the insights that are locked in that massive amount of data. Nguyen, Accenture & Mitch Gomulinski, Cloudera.

Unstructured Data

Unstructured Data Metadata Big Data Enterprise

Doing a 180 on Customer 360 – The Preferred Path to Customer Insights

Cloudera

OCTOBER 30, 2018

In this blog post, Sheryl outlines how next-gen CIP applications are delivering a better customer experience, and why businesses are relying on CIPs as their preferred path to customer insights. IT departments previously invested in MDM and data warehousing technologies to consolidate information associated with customer profiles.

Unstructured Data

Unstructured Data Data Lake Machine Learning Interactive
