Schema Evolution in Data Lakes


Whereas a data warehouse requires rigid data modeling and definitions, a data lake can store data of different types and shapes. In a data lake, the schema of the data can be inferred when it’s read, which provides that flexibility.
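
To make "inferring the schema when it’s read" concrete, here is a minimal Python sketch of schema-on-read over newline-delimited JSON records. It assumes the raw files hold one JSON object per line. The type names and the widening rules (integer + double becomes double, anything else incompatible becomes string) are illustrative assumptions, not the rules of any particular engine or table format.

```python
import json
from typing import Any


def infer_type(value: Any) -> str:
    """Map one decoded JSON value to an illustrative type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def widen(a: str, b: str) -> str:
    """Merge two observed types for the same field (assumed widening rules)."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if {a, b} == {"integer", "double"}:
        return "double"
    return "string"  # incompatible types fall back to string


def infer_schema(lines: list[str]) -> dict[str, str]:
    """Schema-on-read: derive the schema from the raw records at read time."""
    schema: dict[str, str] = {}
    for line in lines:
        for field, value in json.loads(line).items():
            observed = infer_type(value)
            schema[field] = widen(schema[field], observed) if field in schema else observed
    return schema


def read_with_schema(lines: list[str], schema: dict[str, str]) -> list[dict[str, Any]]:
    """Project every record onto the merged schema; fields a record lacks become None."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({field: record.get(field) for field in schema})
    return rows


if __name__ == "__main__":
    raw = [
        '{"id": 1, "amount": 10}',                       # older record
        '{"id": 2, "amount": 12.5, "currency": "EUR"}',  # newer record adds a field
    ]
    schema = infer_schema(raw)
    print(schema)
    print(read_with_schema(raw, schema))
```

Here `infer_schema(raw)` yields `{'id': 'integer', 'amount': 'double', 'currency': 'string'}`, and the older record is read back with `currency` set to `None` instead of being rewritten. That is the sense in which deferring the schema to read time lets a lake absorb schema evolution; production engines apply their own, more detailed resolution rules.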

The Data Lake is Dead; Long Live the Data Lake!


Martin Wilcox examines the failure of data lakes

Deriving Value from Data Lakes with AI


Artificial Intelligence and machine learning are the future of every industry, especially data and analytics. Data is growing at a phenomenal rate and that’s not going to stop anytime soon. Once your data is prepared for analysis, the next question is: how else can AI help you?

Here’s Why Automation For Data Lakes Could Be Important

Smart Data Collective

Data Lakes are among the most complex and sophisticated data storage and processing facilities available today. Analytics Magazine notes that data lakes are among the most useful tools an enterprise can have at its disposal when aiming to out-innovate its competitors. Many of the promises made about Big Data fell to data scientists to fulfill. Big Data is, well…big.

Data Lakes vs. Data Warehouses


Understand the differences between the two most popular options for storing big data

Data Lakes on Cloud & Their Usage in Healthcare


Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. The power of the data lake lies in the fact that it is often a cost-effective way to store data.

Data Lake Consolidation – the Aggregator Analogy

Perficient Data & Analytics

In my last post I introduced the concept of the Data Lake as a Consolidator and the critical success factor of applying robust Information Governance to this environment. So, a Data Lake as Consolidator.

Working with the Data Lake Aggregator – Standards and Templates

Perficient Data & Analytics

In my previous blog post, I described the concept of an “Information Catalog” and the vital role it plays in keeping communication between the Data Lake Aggregator, its Suppliers and its Consumers efficient and effective, thanks to the common language it provides.
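
A hypothetical, minimal Python sketch can make the idea of a catalog as a common language more concrete: Suppliers register datasets with an owner, a description and shared business terms, and Consumers find datasets by term rather than by physical location. The names `CatalogEntry`, `InformationCatalog`, `register` and `find_by_term` are illustrative assumptions, not the design described in the post.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata a Supplier records when publishing a dataset."""
    name: str
    owner: str
    description: str
    terms: set[str] = field(default_factory=set)  # shared business vocabulary


class InformationCatalog:
    """Hypothetical in-memory catalog shared by Suppliers, the Aggregator and Consumers."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Each dataset is registered once; duplicate names are rejected.
        if entry.name in self._entries:
            raise ValueError(f"dataset already registered: {entry.name}")
        self._entries[entry.name] = entry

    def find_by_term(self, term: str) -> list[str]:
        # Consumers search by business term (case-insensitive), not by storage path.
        wanted = term.lower()
        return sorted(
            entry.name
            for entry in self._entries.values()
            if wanted in {t.lower() for t in entry.terms}
        )


if __name__ == "__main__":
    catalog = InformationCatalog()
    catalog.register(CatalogEntry("claims_2019", "Claims Ops", "Adjudicated claims", {"Claim", "Member"}))
    catalog.register(CatalogEntry("members", "Enrollment", "Member master data", {"member"}))
    print(catalog.find_by_term("member"))
```

Because both datasets are tagged with the same term, a Consumer asking for "member" finds both without knowing who supplied them or where they live in the Lake; keeping those terms consistent is exactly the governance work the catalog depends on.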

Hadoop and data lakes require further examination


After the hype comes disillusionment and the growing realization that Hadoop and data lakes do not provide the answer for all analytic tasks.

Power BI + Azure Data Lake = Velocity & Scale to your Analytics

Perficient Data & Analytics

Context – Bring data together from various web, cloud and on-premises data sources and rapidly drive insights. The biggest challenge Business Analysts and BI developers face is the need to ingest and process medium-to-large data sets on a regular basis.

Data Lake Participants – Roles and Responsibilities

Perficient Data & Analytics

As you may recall, last time I introduced the analogy of the Aggregator to describe using a Data Lake as a Consolidator of information, and I mentioned the three key roles in this model: the Supplier, the Aggregator and the Consumer. In this post I will describe in a little more detail the responsibilities of each of these roles, which, when carried out diligently, create an environment in which significant value can be obtained from the Lake.

Data Lake as Aggregator – The Critical Role of the Catalog

Perficient Data & Analytics

My previous blog post described a Data Lake using a Supplier-Aggregator-Consumer analogy and discussed the role each of these parties plays.

Unlocking the Potential of Machine Learning in a Data Lake

Data Virtualization

With data becoming brain food for the intelligence of every organization, regardless of size or sector, it has become crucial to harness this data to achieve the best results, make the most informed decisions and improve productivity.

Data Lake and Information Governance – The Key Takeaways

Perficient Data & Analytics

A Data Lake can be a highly valuable asset to any enterprise, and there is a myriad of technology solutions available to support the processes that feed, maintain and retrieve information from the Lake. This is the primary Takeaway to keep in mind when any organization is considering a Data Lake solution, or already has one in place that needs improvement. So, this completes my journey into Data Lakes and the Information Governance they need.

What's the difference between data lakes and data warehouses?

IBM Big Data Hub

If you’ve heard the debate among IT professionals about data lakes versus data warehouses, you might be wondering which is better for your organization. You might even be wondering how these two approaches differ at all.

Five Modern Data Architecture Trends

David Menninger's Analyst Perspectives

I was recently asked to identify key modern data architecture trends. Data architectures have changed significantly to accommodate larger volumes of data as well as new types of data such as streaming and unstructured data.

Data Lakes and the Information Governance Critical Success Factor

Perficient Data & Analytics

Since my last post I’ve been working with a client that is actively establishing a Data Lake, not only to support its analytics efforts but also to “re-architect” the way its systems collaborate, using the Data Lake to control and consolidate all information-sharing interactions within its environment.

Data Lake Vs. Big Data Warehouse: Why You Don’t Have To Choose


Learn about the difference between a data lake and a big data warehouse, and define how to structure your big data solution in accordance with your business needs

Data Lakes, Not Just For Analytics Anymore

Perficient Data & Analytics

Data Lakes have been around since the early part of this decade, and most Fortune 500 companies either have a Data Lake or are building one. The push to put data into a lake has been driven predominantly by analytical use cases, in which Data Scientists wrangle and prepare data for their studies or model building. In my next blog post, I will investigate the challenges companies face as Big Data becomes operational.

Providing transactional data to your Hadoop and Kafka data lake

IBM Big Data Hub

The data lake may be all about Apache Hadoop, but integrating operational data can be a challenge.

Reality and misconceptions about big data analytics, data lakes and the future of AI

IBM Big Data Hub

With the number of choices surrounding big data analytics, data lakes and AI, it can sometimes be difficult to tell fact from fiction.

Test principles – Data Warehouse vs Data Lake vs Data Vault

Perficient Data & Analytics

Understand Data Warehouse, Data Lake and Data Vault and their specific test principles. This blog post tries to shed light on the terms data warehouse, data lake and data vault. Let us begin with the data warehouse.

Data Management on Display at Informatica World 2019

David Menninger's Analyst Perspectives

With that focus, Informatica's conference emphasized capabilities across six areas (all strong areas for Informatica): data integration, data management, data quality & governance, Master Data Management (MDM), data cataloging, and data security.

Data in 2020: Ventana Research Agenda

David Menninger's Analyst Perspectives

Ventana Research recently announced its 2020 research agenda for data, continuing the guidance we’ve offered for nearly two decades to help organizations derive optimal value and improve business outcomes. Data volumes continue to grow while data latency requirements continue to shrink.

Alternative approaches to implementing your data lake


ScienceSoft answers burning questions about big data lake design and implementation. We look at different approaches to its architecture and consider whether there is a preferred technology among the available stack.

Data Management Requirements for the Enterprise Data Lake

In(tegrate) the Clouds

SnapLogic published Eight Data Management Requirements for the Enterprise Data Lake. They include: Storage and Data Formats; Ingest and Delivery; Discovery and Preparation; Transformation and Analytics; Streaming; and Scheduling and Workflow. The company also recently hosted a webinar on Democratizing the Data Lake with Constellation Research and published two whitepapers from Mark Madsen.

Get out of the data swamp with a governed data lake

IBM Big Data Hub

Making your data lake a “governed data lake” is the game changer. Without governance, organizations put both the security and the protection of their data at risk. A governed data lake contains data that’s accessible, clean, trusted and protected.

Feed your data lake with change data capture for real-time integration and analytics

IBM Big Data Hub

His business units had a presence in 180 countries, with geographically dispersed data warehouses and business intelligence applications.

3 principles for climbing the AI ladder with IBM Governed Data Lake

IBM Big Data Hub

Recently, we capped off the first leg of the “Enabling digital business with an IBM governed data lake” road show in the Asia Pacific region with our customers and partners.

Cloudera announces support for Azure’s next-generation Data Lake Store


The Cloudera platform delivers a one-stop shop that allows you to store any kind of data, process and analyze it in many different ways in a single environment, and integrate with the rest of your data infrastructure. Before organizations can fully realize the benefits of the cloud, they have to adjust to new data models and new processes. Eventual consistency and other pitfalls can be a nightmare for engineers trying to migrate complex big data infrastructure to the cloud.

The business value of a governed data lake

IBM Big Data Hub

Imagine a searchable data management system that would enable you to review crowdsourced, categorized and classified data. Consider that this system would apply to all types of data, structured and unstructured, and become more robust as more users analyze it.

Big Data for Business: A Requirement for Today’s Business Analytics

David Menninger's Analyst Perspectives

Organizations now must store, process and use data of significantly greater volume and variety than in the past.

The Internet of Things: Real-Time Data and Analytics Enable Business Innovation

David Menninger's Analyst Perspectives

This innovation means that virtually any appropriately designed device can generate and transmit data about its operations, which can facilitate monitoring and a range of automatic functions.

AmFam's Data Journey From Legacy To Cloud: Teaching People To Fish In The Data Lake

Bruno Aziza

AmFam’s journey from a data-rich company to a data-driven company

Why is a data catalog essential to making your data lakes successful?

IBM Big Data Hub

However, all industries depend on data to be successful, and this affects the way enterprises plan and execute their operations. All industries, from healthcare to retail to banking, are digitally transforming themselves every day to become more agile and stay competitive.

IDG Contributor Network: How to overcome the bottlenecks between data lakes and analytics for customer engagement

CIO Business Intelligence

Many organizations in a variety of industries struggle to access the customer data they need to provide personalized and contextual experiences across all touchpoints. Recently, data lakes have been touted as the best way to manage the variety of collected customer data, with many big data and analytics solutions focused on a self-service approach to leveraging the value of the data lake.

News and Announcements from Tableau and TC18

David Menninger's Analyst Perspectives

Once again I attended Tableau's Users Conference, along with 17,000 other attendees who affectionately refer to themselves as "data nerds."

New Data Architectures are too Data-Store-Centric

Data Virtualization

Too often the design of new data architectures is based on old principles: they are still very data-store-centric. They consist of many physical data stores in which data is stored repeatedly and redundantly.

Data Governance Makes Data Security Less Scary


Do you know where your data is? Do you know what data you have? Add to the mix the potential for a data breach followed by non-compliance, reputational damage and financial penalties, and a real horror story could unfold. Don’t scream; you can protect your sensitive data.

Constructing A Digital Transformation Strategy: Putting the Data in Digital Transformation


Once you’ve determined which part(s) of your business you’ll be innovating, the next step in a digital transformation strategy is using data to get there. With automation, data quality is systematically assured.

More Structured or Less, Data Virtualization Delivers

Data Virtualization

Alice Well’s early successes as CIO at Advanced Banking Corporation (ABC) in solving the old problem of getting real-time data (Gaining Real Time Insight) to the call center and the newer opportunity presented by the data lake (A Warehouse in.

The Path to Artificial Intelligence in Healthcare

Perficient Data & Analytics

In an industry that has massive amounts of data and depends heavily on that data both to run efficiently and, more importantly, to deliver high-quality patient care, any technology that can lead to significant improvement is anticipated. First of all, while you can input data from many different sources, the best approach is to layer Artificial Intelligence applications on top of an established data infrastructure.