Data Lake and Document - Data Leaders Brief

Data Lake

Document

Multicloud data lake analytics with Amazon Athena

AWS Big Data

MARCH 18, 2024

Many organizations operate data lakes spanning multiple cloud data stores. In these cases, you may want an integrated query layer to seamlessly run analytical queries across these diverse cloud stores and streamline your data analytics processes. Refer to the respective documentation for details.

Data Lake

Data Lake Analytics Cost-Benefit Management

Build a real-time GDPR-aligned Apache Iceberg data lake

AWS Big Data

FEBRUARY 24, 2023

Data lakes are a popular choice for today’s organizations to store their data around their business activities. As a best practice of a data lake design, data should be immutable once stored. A data lake built on AWS uses Amazon Simple Storage Service (Amazon S3) as its primary storage environment.

Data Lake

Data Lake Metadata Testing Data Warehouse



Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

Data analytics on operational data at near-real time is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to have better scalability and performance. For more information, see Changing the default settings for your data lake.

Data Lake

Data Lake Visualization Dashboards Insurance


Data Modeling 301 for the cloud: data lake and NoSQL data modeling and design

erwin

AUGUST 15, 2022

For NoSQL, data lakes, and data lakehouses, data modeling of both structured and unstructured data is somewhat novel and thorny. This blog is an introduction to some advanced NoSQL and data lake database design techniques, and to avoiding common pitfalls.

Data Lake

Data Lake Modeling Unstructured Data Data Warehouse

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

JANUARY 24, 2024

AWS-powered data lakes, supported by the unmatched availability of Amazon Simple Storage Service (Amazon S3), can handle the scale, agility, and flexibility required to combine different data and analytics approaches.

Snapshot

Snapshot Data Lake Metadata Optimization

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

LA Public Defender CIO digitizes to divert people to programs, not prison

CIO Business Intelligence

APRIL 4, 2024

In total, it took the CIO’s team and agency a little over two years to convert 160 million documents into a transformed, revamped, and people-centric system, built on the Salesforce CRM, that tells their stories and focuses on people outcomes, not case outcomes.

Digital Transformation

Digital Transformation Data Lake ROI Modeling

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

A RAG-based generative AI application can only produce generic responses based on its training data and the relevant documents in the knowledge base. For example, Amazon DynamoDB provides a feature for streaming CDC data to Amazon DynamoDB Streams or Kinesis Data Streams.

Data Lake

Data Lake Unstructured Data Management Modeling

Gartner Market Guide to DataOps Software

DataKitchen

DECEMBER 6, 2022

The document they wrote is exceptionally close to what we see in the market and what our products do! This document is essential because buyers look to Gartner for advice on what to do and how to buy IT software. The two things we are most excited about are these: first, DataOps is distinct from all data analytics tools.

Software

Software Marketing Data Lake Testing

Configure cross-Region table access with the AWS Glue Catalog and AWS Lake Formation

AWS Big Data

AUGUST 3, 2023

Today’s modern data lakes span multiple accounts, AWS Regions, and lines of business in organizations. It’s important that their data solution gives them the ability to share and access data securely and safely across Regions. For example, we are using a data lake administrator role called LF-Admin.

Data Lake

Data Lake Metadata Management Data Processing

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Data governance is a critical building block across all these approaches, and we see two emerging areas of focus. First, many LLM use cases rely on enterprise knowledge that needs to be drawn from unstructured data such as documents, transcripts, and images, in addition to structured data from data warehouses.

Data Governance

Data Governance Unstructured Data Metadata Data Lake

4 ways generative AI addresses manufacturing challenges

IBM Big Data Hub

APRIL 15, 2024

Or we create a data lake, which quickly degenerates to a data swamp. Various initiatives to create a knowledge graph of these systems have been only partially successful due to the depth of legacy knowledge, incomplete documentation and technical debt incurred over decades.

Manufacturing

Manufacturing Contextual Data Knowledge Discovery Data Lake

Avoid generative AI malaise to innovate and build business value

CIO Business Intelligence

APRIL 1, 2024

Capturing the “as-is” state of your environment, you’ll develop topology diagrams and document information on your technical systems. Cleanse your data. GenAI requires high-quality data. Ensure that data is cleansed, consistent, and centrally stored, ideally in a data lake. Assess your readiness.

Data Lake

Data Lake Consulting Uncertainty Risk

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

AWS Big Data

OCTOBER 20, 2023

Today, we are pleased to announce new AWS Glue connectors for Azure Blob Storage and Azure Data Lake Storage that allow you to move data bi-directionally between Azure Blob Storage, Azure Data Lake Storage, and Amazon Simple Storage Service (Amazon S3). option("header","true").load("wasbs://yourblob@youraccountname.blob.core.windows.net/loadingtest-input/100mb")

Data Lake

Data Lake Big Data Consulting Data Warehouse

Migrate Hive data from CDH to CDP public cloud

Cloudera

JUNE 25, 2021

The Replication Manager support matrix is documented in our public docs. This blog post outlines detailed step-by-step instructions to perform Hive replication from an on-prem CDH cluster to a CDP Public Cloud Data Lake. CDP Data Lake cluster versions – CM 7.4.0.

Data Lake

Data Lake Metadata Unstructured Data Management

Hybrid Vs. Multi-Cloud: 5 Key Comparisons in Kafka Architectures

Smart Data Collective

AUGUST 17, 2022

You can safely use an Apache Kafka cluster for seamless data movement from the on-premise hardware solution to the data lake using various cloud services like Amazon’s S3 and others. It will enable you to quickly transform and load the data results into Amazon S3 data lakes or JDBC data stores.

Data Lake

Data Lake Insurance Data-driven Data Processing

Integrating Data Governance and Enterprise Architecture

erwin

SEPTEMBER 3, 2020

Data governance provides time-sensitive, current-state architecture information with a high level of quality. It documents your data assets from end to end for business understanding and clear data lineage with traceability. Automating Data Governance and Enterprise Architecture.

Data Governance

Data Governance Enterprise Risk Data Lake

Implement alerts in Amazon OpenSearch Service with PagerDuty

AWS Big Data

JUNE 8, 2023

You can use the plugin to set up different monitors, including cluster health, an individual document, a custom query, or aggregated data. These monitors can be used to send alerts to users. At AWS, he is focused on data lake implementations and on search and analytical workloads using Amazon OpenSearch Service.

Data Lake

Data Lake Dashboards Metrics Testing

Dairyland powers up for a generative AI edge

CIO Business Intelligence

APRIL 9, 2024

Beginning in 2021, the Minneapolis-based Microsoft partner helped Dairyland migrate from several custom legacy applications to a commercial implementation of Dynamics 365 and an Azure data lake, which set the stage for the power company’s early foray into AI, according to the systems integrator.

Digital Transformation

Digital Transformation Machine Learning Data Lake Software

Data Mesh 101: How Data Mesh Can Be Used in an Organization

Ontotext

FEBRUARY 12, 2024

This calls for additional planning, documentation, and testing. A data mesh will likely require more engineers to get started, so a critical mass is needed for successful adoption. Processes for data quality checks, data maintenance, and guidelines need to be established. For instance, JPMorgan Chase & Co.

Data Quality

Data Quality Data-driven Data Lake Data Governance

How Etihad taps data science to optimise airline operations

CIO Business Intelligence

MARCH 9, 2022

Despite the worldwide chaos, UAE national airline Etihad has managed to generate productivity gains and cost savings from insights using data science. Etihad began its data science journey with the Cloudera Data Platform and moved its data to the cloud to set up a data lake. Reem Alaya Lebhar.

Data Science

Data Science Data Lake Cost-Benefit Digital Transformation

Case study: Policy Enforcement Automation With Semantics

Ontotext

MAY 2, 2024

Storage-centric approach. In the storage-centric approach, people try to address data silos by throwing everything into a data lake or a data warehouse. Although this helps somewhat in terms of architecture, these data lakes soon become unwieldy.

Metadata

Metadata Data Lake Data-driven Enterprise

Doing Cloud Migration and Data Governance Right the First Time

erwin

OCTOBER 8, 2020

But even with the “need for speed” to market, new applications must be modeled and documented for compliance, transparency and stakeholder literacy. Model cloud data structures: erwin DM converts, modifies and models the new cloud data structures. Request an erwin Cloud Catalyst assessment.

Data Governance

Data Governance Metadata Testing Data Lake

Turning the page

Cloudera

JUNE 1, 2021

Future-proof, “no-code” connectors enable customers to extract data from a wide range of popular data sources, and multi-level transformations are automatically orchestrated using just SQL. Cazena powers the delivery of instant cloud data lakes, making it possible to accelerate time to analytics and AI/ML from months to minutes.

Uncertainty

Uncertainty Cost-Benefit Risk Strategy

Cloud Data Science News – Beta 6

Data Science 101

DECEMBER 16, 2019

It now also supports PDF documents. Azure Data Factory preserves metadata during file copy: when performing a file copy between Amazon S3, Azure Blob, and Azure Data Lake Gen 2, the metadata is copied as well. Not a huge update, but still a nice feature. Azure Database for MySQL now supports MySQL 8.0.

Data Science

Data Science Metadata Machine Learning Data Lake

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

OCTOBER 16, 2023

It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. How to scale AI and ML with built-in governance: a fit-for-purpose data store built on an open lakehouse architecture allows you to scale AI and ML while providing built-in governance tools.

Risk

Risk Modeling Management Metadata

How the Masters uses watsonx to manage its AI lifecycle

IBM Big Data Hub

APRIL 9, 2024

This allows the Masters to scale analytics and AI wherever their data resides, through open formats and integration with existing databases and tools. “Hole distances and pin positions vary from round to round and year to year; these factors are important as we stage the data.” Lastly, watsonx.data pulls from live feeds.

Management

Management IT Machine Learning Metrics

Advancing AI: The emergence of a modern information lifecycle

CIO Business Intelligence

DECEMBER 4, 2023

The ways modern data is used, processed, and analyzed are continuously evolving as machine learning technology becomes better at these tasks. With constant advances in intelligent document processing, compute power, DevOps workflows, and AI, the content, context, and value of unstructured data is rapidly increasing.

Unstructured Data

Unstructured Data Data Lake Metadata Business Objectives

Announcing the AWS Well-Architected Data Analytics Lens

AWS Big Data

MARCH 26, 2024

This version covers the following topics: a new lens for the Well-Architected Tool in the Lens Catalog; a new Data Mesh analytics user scenario; guidance on building ACID-compliant data lakes using Iceberg; guidance on adding business context to your data catalog to improve searchability and access; how best to leverage serverless to build (..)

Data Analytics

Data Analytics Analytics Big Data Data Lake

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architect Armando Vázquez identifies eight common types of data architects: Enterprise data architect: These data architects oversee an organization’s overall data architecture, defining data architecture strategy and designing and implementing architectures.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

DNS Zone Setup Best Practices on Azure

Cloudera

FEBRUARY 12, 2024

Please refer to the Microsoft documentation for details, including how to set up an Azure DNS Private Resolver. The DNS records of the managed services using service endpoints will be on the internet and managed by Microsoft. Not all managed services support service endpoints.

Data Warehouse

Data Warehouse Machine Learning Data Lake Management

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

This popular open-source tool for data warehouse transformations won out over other ETL tools for several reasons. The tool also offered desirable out-of-the-box features like data lineage, documentation, and unit testing. This process has been scheduled to run daily, ensuring a consistent batch of fresh data for analysis.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users. Data hubs and data lakes can coexist in an organization, complementing each other.

Analytics

Analytics Data Warehouse Data Lake Metadata

CIOs press ahead for gen AI edge — despite misgivings

CIO Business Intelligence

OCTOBER 18, 2023

“We’re seeing lots and lots of pilots,” says Gartner AI analyst Arun Chandrasekaran, who names content creation, document summarization, sentiment analysis, and enterprise search as chief among the initial use cases. A recent survey of nearly 1,000 IT decision-makers conducted by Foundry underscores this. “As for internal enterprise exploration.

Risk

Risk Manufacturing Enterprise Technology

Modernize Your ETL Processes, Discover Better Insights

Sisense

JULY 8, 2020

Every company wants every team within their business to make smarter, data-driven decisions. Customer support teams look at trends in support tickets or do text analysis on conversations to understand where they can provide better onboarding and documentation.

Data Warehouse

Data Warehouse Data Lake Data-driven Cost-Benefit

Generative AI: 5 enterprise predictions for AI and security — for 2023, 2024, and beyond

CIO Business Intelligence

OCTOBER 25, 2023

The release of intellectual property and non-public information: generative AI tools can make it easy for well-meaning users to leak sensitive and confidential data. Once shared, this data can be fed into the data lakes used to train large language models (LLMs) and can be discovered by other users.

Enterprise

Enterprise Risk Manufacturing Finance

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Optimization Statistics

The New Data Integration Requirements

In(tegrate) the Clouds

JUNE 23, 2016

Large-volume data integration is available to Hadoop-based data lakes or cloud-based data warehouses. Integration has to support the continuum of data velocities, from batch all the way to continuous streams. Integration is primarily document-centric.

Data Integration

Data Integration Data Lake Data Warehouse Data-driven

The year’s top 10 enterprise AI trends — so far

CIO Business Intelligence

SEPTEMBER 21, 2023

ML was used for sentiment analysis, and to scan documents, classify images, transcribe recordings, and other specific functions. One of the best immediate use cases is summarizing documents and extracting information from material, he says. But it’ll enable people to do higher-value work than they are currently able to do.”

Enterprise

Enterprise Consulting Modeling Cost-Benefit

Forrester Does the Math on the ROI of the Alation Data Catalog

Alation

FEBRUARY 13, 2020

While correlating ROI and data culture for increased collaboration can be difficult, Forrester was able to put numbers to many of the benefits enterprises see with Alation. Their research is documented in an extensive report published today.

ROI

ROI Cost-Benefit Unstructured Data Data Lake

Data Governance Makes Data Security Less Scary

erwin

OCTOBER 31, 2019

What data do we have and where is it? Data is a critical asset used to operate, manage and grow a business. While some of it sits at rest in databases, data lakes and data warehouses, a large percentage is federated and integrated across the enterprise, introducing governance, manageability and risk issues that must be managed.

Data Governance

Data Governance Metadata Risk Data Lake

DaVita’s technology strategy driven by the ‘power of purpose’

CIO Business Intelligence

DECEMBER 13, 2022

We’re looking at a variety of sources of data, putting it in data lakes, and then using that to drive predictive models that really help our doctors and our care teams to stratify our patients’ risk by taking actions at the right time.

Strategy

Strategy Technology Digital Transformation Data Lake

Successfully conduct a proof of concept in Amazon Redshift

AWS Big Data

MARCH 27, 2024

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data.

Testing

Testing Data Warehouse Metrics Cost-Benefit

My introduction and my love for DATA

Sanjeev Mohan

JANUARY 15, 2018

You see, I write research documents, help my clients through inquiries and present at various events. However, each document that I have written is really a love letter to that topic. Look forward to blogs spanning big data technologies, advanced analytics, data governance, IoT and blockchain.

Data Lake

Data Lake IoT Big Data Interactive
