Diving Deeper into the Data Lake

David Menninger's Analyst Perspectives

A data lake is a centralized repository designed to house big data in structured, semi-structured and unstructured form. Our data lake research has uncovered some points to consider in your efforts, and I’d like to offer a deeper dive into our findings.

Data Lakes Meet Data Warehouses

David Menninger's Analyst Perspectives

In this analyst perspective, Dave Menninger takes a look at data lakes. He explains the term “data lake,” describes common use cases and shares his views on some of the latest market trends.

What are the differences between Data Lake and Data Warehouse?

Analytics Vidhya

Overview: Understand the meaning of data lake and data warehouse. We will see what the key differences between a Data Warehouse and a Data Lake are.

Differentiating Between Data Lakes and Data Warehouses

Smart Data Collective

The market for data warehouses is booming. While there is a lot of discussion about the merits of data warehouses, not enough discussion centers around data lakes. We talked about enterprise data warehouses in the past, so let’s contrast them with data lakes.

Top Considerations for Building an Open Cloud Data Lake

In this paper, we explore the top considerations for building a cloud data lake including architectural principles, when to use cloud data lake engines and how to empower non-technical users.

Data Virtualization: The Key to a Successful Data Lake

Data Virtualization

If you’ve decided to implement a data lake, you might want to keep Gartner’s assessment in mind, which is that about 80% of all data lake projects will actually fail.

Schema Evolution in Data Lakes

KDnuggets

Whereas a data warehouse will need rigid data modeling and definitions, a data lake can store different types and shapes of data. In a data lake, the schema of the data can be inferred when it’s read, providing the aforementioned flexibility.
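
The schema-on-read idea in the excerpt above can be illustrated with a minimal sketch. It assumes raw records are stored as JSON lines with differing shapes; the sample records and the `infer_schema` helper are hypothetical and not taken from the KDnuggets article. The schema is derived only when the data is read, not enforced when it is written.

```python
import json

# Hypothetical raw records, stored as-is (one JSON document per line).
# Their shapes differ on purpose: no schema was enforced at write time.
RAW_LINES = [
    '{"id": 1, "name": "sensor-a", "temp": 21.5}',
    '{"id": 2, "name": "sensor-b", "humidity": 40}',
    '{"id": 3, "temp": "n/a"}',
]


def infer_schema(lines):
    """Return {field: sorted list of Python type names seen} across all records."""
    schema = {}
    for line in lines:
        record = json.loads(line)
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in schema.items()}


schema = infer_schema(RAW_LINES)
for field, types in schema.items():
    print(field, types)
```

A field that reports more than one type (here `temp`) is the kind of drift that a warehouse would reject at load time and that a lake leaves for the reader to resolve.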

Here’s Why Automation For Data Lakes Could Be Important

Smart Data Collective

Data Lakes are among the most complex and sophisticated data storage and processing facilities we have available to us today as human beings. Analytics Magazine notes that data lakes are among the most useful tools that an enterprise may have at its disposal when aiming to compete with competitors via innovation. There were a lot of promises made about Big Data that fell at the feet of data scientists to make happen. Big Data is, well…big.

Data Lakes vs. Data Warehouses

DataCamp

Understand the differences between the two most popular options for storing big data

7 Key Benefits of Proper Data Lake Ingestion

Smart Data Collective

It’s impossible to deny the importance of data in several industries, but that data can get overwhelming if it isn’t properly managed. The reality is businesses that are collecting data will likely be doing so on several levels. Covering Data Types.

Deriving Value from Data Lakes with AI

Sisense

Artificial Intelligence and machine learning are the future of every industry, especially data and analytics. Data is growing at a phenomenal rate and that’s not going to stop anytime soon. Once your data is prepared for analysis, the next question is: how else can AI help you?

Ultimate Guide to the Cloud Data Lake Engine

This guide describes how to evaluate cloud data lake engine offerings based on their ability to deliver on their promise of improving performance, data accessibility, and operational efficiency as compared with earlier methods of querying the data lake.

Unlocking the Potential of Machine Learning in a Data Lake

Data Virtualization

With data becoming the brain food to the intelligence of every organization, regardless of size or sector, it has become crucial to harness this data to achieve the best results, make the most informed decisions and improve productivity.

What's the difference between data lakes and data warehouses?

IBM Big Data Hub

If you’ve heard the debate among IT professionals about data lakes versus data warehouses, you might be wondering which is better for your organization. You might even be wondering how these two approaches are different at all

Data Lakes on Cloud & it’s Usage in Healthcare

BizAcuity

Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. Data can be stored as-is, without first structuring it, and different types of analytics can be run on it, from dashboards and visualizations to big data processing, real-time analytics, and machine learning to improve decision making. The power of the data lake lies in the fact that it often is a cost-effective way to store data.

Data Lake Consolidation – the Aggregator Analogy

Perficient Data & Analytics

In my last post I introduced the concept of the Data Lake as a Consolidator and the critical success factor of applying robust Information Governance to this environment. So, a Data Lake as Consolidator. In other words, de-coupling sources from targets so that the focus is on the actual information is a key characteristic of a powerful, and useful, Data Lake.

Navigating Data Entities, BYOD, and Data Lakes in Microsoft Dynamics

Jet Global

Consultants and developers familiar with the AX data model could query the database using any number of different tools, including a myriad of different report writers. Data Entities. Currently, over 1,700 data entities are available and counting. The Data Warehouse Approach.

Data Analytics in the Cloud for Developers and Founders

Speaker: Javier Ramírez, Senior AWS Developer Advocate, AWS

You have lots of data, and you are probably thinking of using the cloud to analyze it. But how will you move data into the cloud? In which format? How will you validate and prepare the data? What about streaming data? Can data scientists discover and use the data? Can business people create reports via drag and drop? Can operations monitor what’s going on? Will the data lake scale when you have twice as much data? Is your data secure? In this session, we address common pitfalls of building data lakes and show how AWS can help you manage data and analytics more efficiently.

Data Lake Participants – Roles and Responsibilities

Perficient Data & Analytics

As you may recall, last time I introduced the analogy of the Aggregator to describe utilizing a Data Lake as a Consolidator of information, and I mentioned the three key roles in this model: the Supplier, the Aggregator and the Consumer. In this post I will provide a little more detail on the responsibilities possessed by each of these roles that, when carried out diligently, provide an effective environment for obtaining significant value from the Lake.

Complexity Drives Costs: A Look Inside BYOD and Azure Data Lakes

Jet Global

The Data Security Problem: How We Got Here. You can extract data from relational databases, including Microsoft’s SQL Server using the SQL query language. In addition to reading data, however, you can also use the SQL language to insert, update, or delete records from a database.
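
The point in the excerpt above, that SQL can change data as well as read it, can be shown with a short sketch. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server, and the `customers` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; the article refers to SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# INSERT adds records.
conn.execute("INSERT INTO customers (name) VALUES (?)", ("Acme",))
conn.execute("INSERT INTO customers (name) VALUES (?)", ("Globex",))

# SELECT only reads data.
rows_before = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()

# UPDATE and DELETE modify or remove existing records.
conn.execute("UPDATE customers SET name = ? WHERE id = ?", ("Acme Corp", 1))
conn.execute("DELETE FROM customers WHERE id = ?", (2,))

rows_after = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
conn.close()

print(rows_before)
print(rows_after)
```

Because the same connection that reads can also write, direct query access to a production database is usually restricted, which is the security concern the excerpt introduces.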

Cloudera Consolidates Its Data Platform

David Menninger's Analyst Perspectives

Organizations are dealing with exponentially increasing data that ranges broadly from customer-generated information, financial transactions, edge-generated data and even operational IT server logs.

Driving Business Value and ROI from a Hybrid Cloud Data Lake

Alation

For many enterprises, a hybrid cloud data lake is no longer a trend, but becoming reality. With an on-premise deployment, enterprises have full control over data security, data access, and data governance. Data that needs to be tightly controlled (e.g.

The Next-Generation Cloud Data Lake: An Open, No-Copy Data Architecture

A next-gen cloud data lake architecture has emerged that brings together the best attributes of the data warehouse and the data lake. This new open data architecture is built to maximize data access with minimal data movement and no data copies.

Data Lake and Information Governance – The Key Takeaways

Perficient Data & Analytics

A Data Lake can be a highly valuable asset to any enterprise, and there is a myriad of technology solutions available for leveraging the processes to feed, maintain and retrieve information from the Lake. This is the primary Takeaway to keep in mind when a Data Lake solution is being considered – or is already in place but needing improvement – by any organization. So, this completes my journey into Data Lakes and the Information Governance needed.

Data Lakes: What Are They and Who Needs Them?

Jet Global

The sheer scale of data being captured by the modern enterprise has necessitated a monumental shift in how that data is stored. From the humble database through to data warehouses, data stores have grown both in scale and complexity to keep pace with the businesses they serve, and the data analysis now required to remain competitive. What’s in a Data Lake? Data warehouses do a great job of standardizing data from disparate sources for analysis.

Data Lake Vs. Big Data Warehouse: Why You Don’t Have To Choose

ScienceSoft

Learn about the difference between a data lake and a big data warehouse, and define how to structure your big data solution in accordance with your business needs

Reality and misconceptions about big data analytics, data lakes and the future of AI

IBM Big Data Hub

With the amount of choices surrounding big data analytics, data lakes and AI, it can sometimes be difficult to tell fact from fiction.

Working with the Data Lake Aggregator – Standards and Templates

Perficient Data & Analytics

In my previous blog, I described the concept of an “Information Catalog” and how it plays a vital role in ensuring communication between the Data Lake Aggregator and Suppliers and Consumers is efficient and effective due to the common language that it provides.

Data Lakes and the Information Governance Critical Success Factor

Perficient Data & Analytics

Since my last post I’ve been working for a client that is actively engaged in establishing a Data Lake for the purpose of supporting their analytics efforts, but also looking to “re-architect” the way their systems collaborate by using this Data Lake environment to control and consolidate all information-sharing interactions within their environment.

Microsoft Azure: Cloud Computing for Data and Analytics

David Menninger's Analyst Perspectives

Organizations are increasingly using data as a strategic asset, which makes data services critical. Huge volumes of data need to be stored, managed, discovered and analyzed.

Providing transactional data to your Hadoop and Kafka data lake

IBM Big Data Hub

The data lake may be all about Apache Hadoop, but integrating operational data can be a challenge. Learn how to deliver real-time feeds of transactional data from mainframes and distributed environments directly into Hadoop clusters and make constantly changing data more available

Data Lake as Aggregator – The Critical Role of the Catalog

Perficient Data & Analytics

My previous blog talked about a Data Lake using a Supplier-Aggregator-Consumer analogy and talking about the roles each of these parties play. Given this is information we are talking about, it is not anything you probably haven’t seen before – essentially it consists of a representation of the information housed in the Data Lake utilizing Information/Data Models and a Glossary of Terms.

Data Cataloging in the Data Lake: Alation + Kylo

Alation

We are living in a new era of data defined by two massively disruptive trends – one architectural and the other organizational. Architecturally the introduction of Hadoop, a file system designed to store massive amounts of data, radically affected the cost model of data.

Power BI + Azure Data Lake = Velocity & Scale to your Analytics

Perficient Data & Analytics

Context – Bring data together from various web, cloud and on-premise data sources and rapidly drive insights. The biggest challenge Business Analysts and BI developers have is the need to ingest and process medium to large data sets on a regular basis. They spend more time gathering the data than analyzing it. Power BI Dataflow, with Azure Data Lake Storage Gen 2, makes this a very intuitive and results-based exercise.

Alternative approaches to implementing your data lake

ScienceSoft

ScienceSoft answers burning questions about big data lake design and implementation. We look at different approaches to its architecture and contemplate if there exists a preferred technology among the available stack

Snowflake Builds on Its Success

David Menninger's Analyst Perspectives

Traditional on-premises data processing solutions have led to a hugely complex and expensive set of data silos where IT spends more time managing the infrastructure than extracting value from the data.

Data Lakes, Not Just For Analytics Anymore

Perficient Data & Analytics

Data Lakes have been around since the early part of this decade as most Fortune 500 companies have a Data Lake or are building a Data Lake. The drive to lake data has predominately been driven by analytical use cases where Data Scientists can wrangle and prepare data for their study or model building. In my next blog post, I will investigate these challenges that companies are facing as Big Data becomes operational.

Data in 2021: Ventana Research Market Agenda

David Menninger's Analyst Perspectives

Ventana Research recently announced its 2021 Market Agenda for data, continuing the guidance we have offered for nearly two decades to help organizations derive optimal value and improve business outcomes.

Informatica Continues to Evolve Data Management Platform

David Menninger's Analyst Perspectives

Organizations are accelerating their digital transformation and looking for innovative ways to engage with customers in this new digital era of data management. The challenge is to ensure that processes, applications and data can still be integrated across cloud and on-premises systems.

Introducing Precisely for Data Integrity

David Menninger's Analyst Perspectives

Data is becoming more valuable and more important to organizations. At the same time, organizations have become more disciplined about the data on which they rely to ensure it is robust, accurate and governed properly.

Feed your data lake with change data capture for real-time integration and analytics

IBM Big Data Hub

His business units had a presence in 180 countries worldwide with geographically-dispersed data warehouses and business intelligence applications in various locations

Data Management Requirements for the Enterprise Data Lake

In(tegrate) the Clouds

SnapLogic published Eight Data Management Requirements for the Enterprise Data Lake. They are: Storage and Data Formats. Ingest and Delivery. Discovery and Preparation. Transformation and Analytics. Streaming. Scheduling and Workflow. The company also recently hosted a webinar on Democratizing the Data Lake with Constellation Research and published 2 whitepapers from Mark Madsen.