Diving Deeper into the Data Lake

David Menninger's Analyst Perspectives

A data lake is a centralized repository designed to house big data in structured, semi-structured and unstructured form. Our data lake research has uncovered some points to consider in your efforts, and I’d like to offer a deeper dive into our findings.

Data Lakes Meet Data Warehouses

David Menninger's Analyst Perspectives

In this analyst perspective, Dave Menninger takes a look at data lakes. He explains the term “data lake,” describes common use cases and shares his views on some of the latest market trends. He explores the relationship between data warehouses and data lakes and shares some of Ventana Research’s findings on the subject.



What are the differences between Data Lake and Data Warehouse?

Analytics Vidhya

Overview: Understand the meaning of data lake and data warehouse, and see the key differences between a data warehouse and a data lake.

Differentiating Between Data Lakes and Data Warehouses

Smart Data Collective

The market for data warehouses is booming. While there is a lot of discussion about the merits of data warehouses, not enough discussion centers around data lakes. We talked about enterprise data warehouses in the past, so let’s contrast them with data lakes.

Data Analytics in the Cloud for Developers and Founders

Speaker: Javier Ramírez, Senior AWS Developer Advocate, AWS

You have lots of data, and you are probably thinking of using the cloud to analyze it. But how will you move data into the cloud? In which format? How will you validate and prepare the data? What about streaming data? Can data scientists discover and use the data? Can business people create reports via drag and drop? Can operations monitor what’s going on? Will the data lake scale when you have twice as much data? Is your data secure? In this session, we address common pitfalls of building data lakes and show how AWS can help you manage data and analytics more efficiently.

Schema Evolution in Data Lakes


Whereas a data warehouse needs rigid data modeling and definitions, a data lake can store different types and shapes of data. In a data lake, the schema of the data can be inferred when it’s read, providing the aforementioned flexibility. However, this flexibility is a double-edged sword.
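As a rough illustration of inferring a schema at read time, the Python sketch below scans a few JSON records and collects the value types seen for each field. The records, field names and the `infer_schema` helper are hypothetical and only meant to show the idea, not any particular product's behavior.

```python
import json

# Hypothetical raw records as they might land in a data lake.
# Nothing enforced a schema on write, so shapes and types differ.
RAW_LINES = [
    '{"id": 1, "name": "sensor-a", "temp": 21.5}',
    '{"id": 2, "name": "sensor-b", "tags": ["lab"]}',
    '{"id": "3", "temp": 19}',
]


def infer_schema(lines):
    """Return a mapping of field name -> set of Python type names seen on read."""
    schema = {}
    for line in lines:
        record = json.loads(line)
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


schema = infer_schema(RAW_LINES)
for field in sorted(schema):
    # A field with more than one type is where the flexibility starts to hurt.
    print(field, sorted(schema[field]))
```

Here `id` shows up as both an integer and a string, which is the kind of drift a reader of the lake has to handle because nothing rejected it at write time.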

The Data Lake is Dead; Long Live the Data Lake!


Martin Wilcox examines the failure of data lakes

Cloudera Consolidates Its Data Platform

David Menninger's Analyst Perspectives

Organizations are dealing with exponentially increasing data that ranges broadly from customer-generated information, financial transactions, edge-generated data and even operational IT server logs.


Data Lakes vs. Data Warehouses


Understand the differences between the two most popular options for storing big data

Unlocking the Potential of Machine Learning in a Data Lake

Data Virtualization

With data becoming the brain food to the intelligence of every organization, regardless of size or sector, it has become crucial to harness this data to achieve the best results, make the most informed decisions and improve productivity.

What's the difference between data lakes and data warehouses?

IBM Big Data Hub

If you’ve heard the debate among IT professionals about data lakes versus data warehouses, you might be wondering which is better for your organization. You might even be wondering how these two approaches are different at all

7 Key Benefits of Proper Data Lake Ingestion

Smart Data Collective

It’s impossible to deny the importance of data in several industries, but that data can get overwhelming if it isn’t properly managed. The problem is that managing and extracting valuable insights from all this data needs exceptional data collecting, which makes data ingestion vital. Perhaps one of the biggest perks is scalability, which simply means that with good data lake ingestion a small business can begin to handle bigger data numbers.

Data Lakes on Cloud & Its Usage in Healthcare


Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. Data can be stored as-is, without first structuring it, and different types of analytics can be run on it, from dashboards and visualizations to big data processing, real-time analytics, and machine learning to improve decision making. The power of the data lake lies in the fact that it often is a cost-effective way to store data.

Deriving Value from Data Lakes with AI


Artificial Intelligence and machine learning are the future of every industry, especially data and analytics. Data is growing at a phenomenal rate and that’s not going to stop anytime soon. AI and ML are the only ways to derive value from massive data lakes, cloud-native data warehouses, and other huge stores of information. Once your data is prepared for analysis, the next question is: how else can AI help you? Get money out of your data.

Data Lake Consolidation – the Aggregator Analogy

Perficient Data & Analytics

In my last post I introduced the concept of the Data Lake as a Consolidator and the critical success factor of applying robust Information Governance to this environment. So, a Data Lake as Consolidator. In other words, de-coupling sources from targets so that the focus is on the actual information is a key characteristic of a powerful, and useful, Data Lake.

Introducing Precisely for Data Integrity

David Menninger's Analyst Perspectives

Data is becoming more valuable and more important to organizations. At the same time, organizations have become more disciplined about the data on which they rely to ensure it is robust, accurate and governed properly.

Data Lake Participants – Roles and Responsibilities

Perficient Data & Analytics

As you may recall, last time I introduced the analogy of the Aggregator to describe utilizing a Data Lake as a Consolidator of information, and I mentioned the three key roles in this model: the Supplier, the Aggregator and the Consumer. In this post I will provide a little more detail on the responsibilities possessed by each of these roles that, when carried out diligently, provide an effective environment for obtaining significant value from the Lake.

Snowflake Builds on Its Success

David Menninger's Analyst Perspectives

Traditional on-premises data processing solutions have led to a hugely complex and expensive set of data silos where IT spends more time managing the infrastructure than extracting value from the data.


Data Lake and Information Governance – The Key Takeaways

Perficient Data & Analytics

A Data Lake can be a highly valuable asset to any enterprise, and there is a myriad of technology solutions available for leveraging the processes to feed, maintain and retrieve information from the Lake. This is the primary Takeaway to keep in mind when a Data Lake solution is being considered – or is already in place but needing improvement – by any organization. So, this completes my journey into Data Lakes and the Information Governance needed.

Working with the Data Lake Aggregator – Standards and Templates

Perficient Data & Analytics

In my previous blog, I described the concept of an “Information Catalog” and how it plays a vital role in ensuring communication between the Data Lake Aggregator and Suppliers and Consumers is efficient and effective due to the common language that it provides.

Data Lakes and the Information Governance Critical Success Factor

Perficient Data & Analytics

Since my last post I’ve been working for a client that is actively engaged in establishing a Data Lake for the purpose of supporting their analytics efforts, but also looking to “re-architect” the way their systems collaborate by using this Data Lake environment to control and consolidate all information-sharing interactions within their environment.

Providing transactional data to your Hadoop and Kafka data lake

IBM Big Data Hub

The data lake may be all about Apache Hadoop, but integrating operational data can be a challenge. Learn how to deliver real-time feeds of transactional data from mainframes and distributed environments directly into Hadoop clusters and make constantly changing data more available

Data Lake as Aggregator – The Critical Role of the Catalog

Perficient Data & Analytics

My previous blog described a Data Lake using a Supplier-Aggregator-Consumer analogy and the roles each of these parties plays. Given this is information we are talking about, it is not anything you probably haven’t seen before – essentially it consists of a representation of the information housed in the Data Lake utilizing Information/Data Models and a Glossary of Terms.

Navigating Data Entities, BYOD, and Data Lakes in Microsoft Dynamics

Jet Global

Consultants and developers familiar with the AX data model could query the database using any number of different tools, including a myriad of different report writers. Data Entities. Currently, over 1,700 data entities are available and counting. The Data Warehouse Approach.

Power BI + Azure Data Lake = Velocity & Scale to your Analytics

Perficient Data & Analytics

Context – Bring data together from various web, cloud and on-premise data sources and rapidly drive insights. The biggest challenge Business Analysts and BI developers have is the need to ingest and process medium to large data sets on a regular basis. They spend more time gathering the data than analyzing it. With Power BI Dataflow, Azure Data Lake Storage Gen 2 makes this a very intuitive, results-based exercise.

Complexity Drives Costs: A Look Inside BYOD and Azure Data Lakes

Jet Global

The Data Security Problem: How We Got Here. You can extract data from relational databases, including Microsoft’s SQL Server using the SQL query language. In addition to reading data, however, you can also use the SQL language to insert, update, or delete records from a database.
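To make the read-versus-write point concrete, here is a minimal Python sketch that uses the standard library's `sqlite3` module as a stand-in for a relational database (it is not SQL Server, and the `customers` table is hypothetical). It shows that the same connection used to read rows can also insert, update and delete them.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# Writing: the same SQL access that allows reads also allows changes.
conn.execute("INSERT INTO customers (name) VALUES (?)", ("Acme",))
conn.execute("INSERT INTO customers (name) VALUES (?)", ("Globex",))
conn.execute("UPDATE customers SET name = ? WHERE id = ?", ("Acme Corp", 1))
conn.execute("DELETE FROM customers WHERE id = ?", (2,))

# Reading: extract what is left after the modifications above.
rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(rows)

conn.close()
```

Granting reporting tools direct SQL access therefore usually calls for read-only credentials or a separate copy of the data, which is an assumption about common practice rather than something stated in the excerpt.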


Alternative approaches to implementing your data lake


ScienceSoft answers burning questions about big data lake design and implementation. We look at different approaches to its architecture and contemplate if there exists a preferred technology among the available stack

Data Lakes, Not Just For Analytics Anymore

Perficient Data & Analytics

Data Lakes have been around since the early part of this decade, and most Fortune 500 companies have a Data Lake or are building one. The drive to lake data has predominantly been driven by analytical use cases where Data Scientists can wrangle and prepare data for their study or model building. In my next blog post, I will investigate the challenges that companies are facing as Big Data becomes operational.

Emerging Data Platforms Tackle Big Challenges

David Menninger's Analyst Perspectives

Organizations are always looking to improve their ability to use data and AI to gain meaningful and actionable insights into their operations, services and customer needs.

Data Lake Vs. Big Data Warehouse: Why You Don’t Have To Choose


Learn about the difference between a data lake and a big data warehouse, and define how to structure your big data solution in accordance with your business needs

Reality and misconceptions about big data analytics, data lakes and the future of AI

IBM Big Data Hub

With the number of choices surrounding big data analytics, data lakes and AI, it can sometimes be difficult to tell fact from fiction. With more than 40% of organizations expecting AI to be a “game changer,” it’s important to have a complete picture of the capabilities and opportunities available

Data Lakes: What Are They and Who Needs Them?

Jet Global

The sheer scale of data being captured by the modern enterprise has necessitated a monumental shift in how that data is stored. From the humble database through to data warehouses, data stores have grown both in scale and complexity to keep pace with the businesses they serve, and the data analysis now required to remain competitive. What’s in a Data Lake? Data warehouses do a great job of standardizing data from disparate sources for analysis.

Data Management Requirements for the Enterprise Data Lake

In(tegrate) the Clouds

SnapLogic published Eight Data Management Requirements for the Enterprise Data Lake. They include: Storage and Data Formats. Ingest and Delivery. Discovery and Preparation. Transformation and Analytics. Streaming. Scheduling and Workflow. The company also recently hosted a webinar on Democratizing the Data Lake with Constellation Research and published 2 whitepapers from Mark Madsen.

Get out of the data swamp with a governed data lake

IBM Big Data Hub

Making your data lake a “governed data lake” is the game changer. Without governance, organizations put both the security and the protection of their data at risk. When data is cataloged and governed, an organization can effectively discover and classify it, track its history, lineage and quality, and thereby use it with trust and confidence. A governed data lake contains data that’s accessible, clean, trusted and protected.

Hadoop and data lakes require further examination


After the hype comes disillusionment and the growing realization that Hadoop and data lakes do not provide the answer for all analytic tasks. There is still a need to further explore this area and make the best practices and benefits of Hadoop and data lakes more transparent and tangible based on real experiences.

3 principles for climbing the AI ladder with IBM Governed Data Lake

IBM Big Data Hub

Recently, we capped off the first leg of the “Enabling digital business with an IBM governed data lake” road shows in the Asia Pacific region with our customers and partners

The business value of a governed data lake

IBM Big Data Hub

Imagine a searchable data management system that would enable you to review crowdsourced, categorized and classified data. Consider that this system would apply to all types of data — structured and unstructured — and become more robust as more users analyze it

Test principles – Data Warehouse vs Data Lake vs Data Vault

Perficient Data & Analytics

Understand Data Warehouse, Data Lake and Data Vault and their specific test principles. This blog tries to throw light on the terminologies data warehouse, data lake and data vault. It will give insight on their advantages, differences and upon the testing principles involved in each of these data modeling methodologies. Let us begin with data warehouse. What is Data Warehouse? Let’s now move on to data lake.

Feed your data lake with change data capture for real-time integration and analytics

IBM Big Data Hub

Haruto Sakamoto, the Chief Information Officer at a Japanese multinational imaging company, had a few challenges to contend with. His business units had a presence in 180 countries worldwide with geographically-dispersed data warehouses and business intelligence applications in various locations

Five Modern Data Architecture Trends

David Menninger's Analyst Perspectives

I was recently asked to identify key modern data architecture trends. Data architectures have changed significantly to accommodate larger volumes of data as well as new types of data such as streaming and unstructured data. Here are some of the trends I see continuing to impact data architectures.

Cloudera announces support for Azure’s next-generation Data Lake Store


The Cloudera platform delivers a one-stop shop that allows you to store any kind of data, process and analyze it in many different ways in a single environment, and integrate with the rest of your data infrastructure. Before they can fully realize the benefits of the cloud, they have had to adjust to new data models and new processes. Eventual consistency and other pitfalls can be a nightmare for engineers trying to migrate complex big data infrastructure to the cloud.

Data Management on Display at Informatica World 2019

David Menninger's Analyst Perspectives

Under that focus, Informatica's conference emphasized capabilities across six areas (all strong areas for Informatica): data integration, data management, data quality & governance, Master Data Management (MDM), data cataloging, and data security.