Data Lake and Events - Data Leaders Brief

Monitor data pipelines in a serverless data lake

AWS Big Data

AUGUST 9, 2023

The combination of a data lake in a serverless paradigm brings significant cost and performance benefits. By monitoring application logs, you can gain insights into job execution, troubleshoot issues promptly to ensure the overall health and reliability of data pipelines.

Data Lake

Data Lake Metrics Testing Cost-Benefit

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes. Data engineers use Apache Iceberg because it’s fast, efficient, and reliable at any scale and keeps records of how datasets change over time.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

AWS Big Data

JUNE 23, 2023

Imperva Cloud WAF protects hundreds of thousands of websites and blocks billions of security events every day. Events and many other security data types are stored in Imperva’s Threat Research Multi-Region data lake. Imperva harnesses data to improve their business outcomes.

Data Lake

Data Lake Cost-Benefit Dashboards Data Warehouse

Webinars

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

Build a real-time GDPR-aligned Apache Iceberg data lake

AWS Big Data

FEBRUARY 24, 2023

Data lakes are a popular choice for today’s organizations to store their data around their business activities. As a best practice of a data lake design, data should be immutable once stored. A data lake built on AWS uses Amazon Simple Storage Service (Amazon S3) as its primary storage environment.

Data Lake

Data Lake Metadata Testing Data Warehouse

Build an ETL process for Amazon Redshift using Amazon S3 Event Notifications and AWS Step Functions

AWS Big Data

AUGUST 31, 2023

It also helps you to securely access your data in operational databases, data lakes or third-party datasets with minimal movement or copying. Amazon S3 Event Notifications is an Amazon S3 feature that you can enable in order to receive notifications when specific events occur in your S3 bucket.

Data Warehouse

Data Warehouse Data-driven Testing Business Intelligence

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

Data analytics on operational data at near-real time is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to have better scalability and performance. For more information, see Changing the default settings for your data lake.

Data Lake

Data Lake Visualization Dashboards Insurance

Deriving Value from Data Lakes with AI

Sisense

DECEMBER 23, 2019

AI and ML are the only ways to derive value from massive data lakes, cloud-native data warehouses, and other huge stores of information. The right data and analytics platform can help you bridge the gap between your current AI and analytics paradigm and where you want your company to be in the future. .

Data Lake

Data Lake Machine Learning Data Warehouse Digital Transformation

Data Lakes: What Are They and Who Needs Them?

Jet Global

JULY 2, 2019

To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a Data Lake? Data warehouses do a great job of standardizing data from disparate sources for analysis. Taking a Dip.

Data Lake

Data Lake Data Warehouse Big Data Machine Learning

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

AWS Big Data

OCTOBER 3, 2023

In our previous post Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes , we discussed how you can implement solutions to improve operational efficiencies of your Amazon Simple Storage Service (Amazon S3) data lake that is using the Apache Iceberg open table format and running on the Amazon EMR big data platform.

Optimization

Optimization Snapshot Data Lake Metadata

Modernize your data observability with Amazon OpenSearch Service zero-ETL integration with Amazon S3

AWS Big Data

JUNE 5, 2024

The integration is new way for customers to query operational logs in Amazon S3 and Amazon S3-based data lakes without needing to switch between tools to analyze operational data. Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance.

Data Lake

Data Lake Cost-Benefit Dashboards Visualization

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

For example, in a chatbot, data events could pertain to an inventory of flights and hotels or price changes that are constantly ingested to a streaming storage engine. Furthermore, data events are filtered, enriched, and transformed to a consumable format using a stream processor.

Data Lake

Data Lake Unstructured Data Management Modeling

Build an end-to-end serverless streaming pipeline with Apache Kafka on Amazon MSK using Python

AWS Big Data

MARCH 21, 2024

For your application’s low-latency and real-time data access, you can use Lambda and DynamoDB. For longer-term data storage, you can use managed serverless connector service Amazon Data Firehose to send data to your data lake. You can use the same data to train ML models. b64decode(x["value"]).decode('utf-8')

Data Lake

Data Lake Management Modeling Optimization

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

AWS Big Data

JANUARY 12, 2024

Ingestion: Data lake batch, micro-batch, and streaming Many organizations land their source data into their data lake in various ways, including batch, micro-batch, and streaming jobs. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.

Data Lake

Data Lake Cost-Benefit Visualization Structured Data

Salesforce debuts Zero Copy Partner Network to ease data integration

CIO Business Intelligence

APRIL 25, 2024

For instance, a Data Cloud-triggered flow could update an account manager in Slack when shipments in an external data lake are marked as delayed. Sharing Customer 360 insights back without data replication. With zero-copy support, the insurance company wouldn’t have to load that weather data into their platform.

Data Integration

Data Integration Data Lake Metadata Data Warehouse

Implementing a Pharma Data Mesh using DataOps

DataKitchen

AUGUST 19, 2021

Figure 3 shows an example processing architecture with data flowing in from internal and external sources. Each data source is updated on its own schedule, for example, daily, weekly or monthly. The data scientists and analysts have what they need to build analytics for the user. The new Recipes run, and BOOM! Conclusion.

Data Warehouse

Data Warehouse Data Lake Manufacturing Testing

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

Deploy and Optimize Your Snowflake Environment Faster With Accelerators

CDW Research Hub

JULY 18, 2022

One modern data platform solution that provides simplicity and flexibility to grow is Snowflake’s data cloud and platform. These Snowflake accelerators reduce the time to analytics for your users at all levels so you can make data-driven decisions faster. Security Data Lake. Overall data architecture and strategy.

Optimization

Optimization Data Lake Data Warehouse Manufacturing

Porsche Carrera Cup Brasil gets real-time data boost

CIO Business Intelligence

MAY 21, 2024

Unlike many other events, which consist of multiple racing teams and manufacturers, Porsche Carrera Cup Brasil provides and maintains all 75 cars used in the race. If I don’t do predictive maintenance, if I have to do corrective maintenance at events, a lot of money is wasted.”

Broadcasting

Broadcasting Recreation/Entertainment Manufacturing Data Lake

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

AWS Big Data

OCTOBER 10, 2023

Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.

Data Quality

Data Quality Data Governance Data Lake Testing

DataOps Observability: Taming the Chaos (Part 3)

DataKitchen

NOVEMBER 18, 2022

As he thinks through the various journeys that data take in his company, Jason sees that his dashboard idea would require extracting or testing for events along the way. So, the only way for a data journey to truly observe what’s happening is to get his tools and pipelines to auto-report events. Data and tool tests.

Testing

Testing Statistics Measurement Metrics

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

It covers how to use a conceptual, logical architecture for some of the most popular gaming industry use cases like event analysis, in-game purchase recommendations, measuring player satisfaction, telemetry data analysis, and more. A data hub contains data at multiple levels of granularity and is often not integrated.

Analytics

Analytics Data Warehouse Data Lake Metadata

Automate schema evolution at scale with Apache Hudi in AWS Glue

AWS Big Data

FEBRUARY 7, 2023

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning (ML), and application development. Apache Hudi supports ACID transactions and CRUD operations on a data lake. You don’t alter queries separately in the data lake.

Data Lake

Data Lake Testing Big Data Structured Data

Analyze Elastic IP usage history using Amazon Athena and AWS CloudTrail

AWS Big Data

MAY 15, 2024

The CloudFormation template creates Athena views and a table to search past AssociateAddress events in CloudTrail, an AWS Lambda function to collect snapshots of existing EIPs, and an S3 bucket to store the analysis results. Additionally, you can analyze activity logs with AWS CloudTrail Lake and Amazon Athena.

Snapshot

Snapshot Optimization Data Lake Reporting

Implement alerts in Amazon OpenSearch Service with PagerDuty

AWS Big Data

JUNE 8, 2023

With automated alerting with a third-party service like PagerDuty , an incident management platform, combined with the robust and powerful alerting plugin provided by OpenSearch Service, businesses can proactively manage and respond to critical events. Vivek Shrivastava is a Principal Data Architect, Data Lake in AWS Professional Services.

Data Lake

Data Lake Dashboards Metrics Testing

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Starting with Amazon EMR release 6.7.0,

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

APRIL 10, 2024

“Any enterprise CEO really ought to be able to ask a question that involves connecting data across the organization, be able to run a company effectively, and especially to be able to respond to unexpected events. Most organizations are missing this ability to connect all the data together.”

Metadata

Metadata Data Lake Data Warehouse Data Quality

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

JANUARY 24, 2024

AWS-powered data lakes, supported by the unmatched availability of Amazon Simple Storage Service (Amazon S3), can handle the scale, agility, and flexibility required to combine different data and analytics approaches. About the Authors Pathik Shah is a Sr. Analytics Architect on Amazon Athena.

Snapshot

Snapshot Data Lake Metadata Optimization

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.

Data Lake

Data Lake Data Processing Metadata Snapshot

Hybrid Vs. Multi-Cloud: 5 Key Comparisons in Kafka Architectures

Smart Data Collective

AUGUST 17, 2022

You can safely use an Apache Kafka cluster for seamless data movement from the on-premise hardware solution to the data lake using various cloud services like Amazon’s S3 and others. It will enable you to quickly transform and load the data results into Amazon S3 data lakes or JDBC data stores.

Data Lake

Data Lake Insurance Data-driven Data Processing

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

It aims to provide a framework to create low-latency streaming applications on the AWS Cloud using Amazon Kinesis Data Streams and AWS purpose-built data analytics services. In this post, we will review the common architectural patterns of two use cases: Time Series Data Analysis and Event Driven Microservices.

Analytics

Analytics IoT Data-driven Snapshot

Moving Enterprise Data From Anywhere to Any System Made Easy

Cloudera

JUNE 2, 2022

Over the last decade, we have often heard about the proliferation of data creating sources (mobile applications, laptops, sensors, enterprise apps) in heterogeneous environments (cloud, on-prem, edge) resulting in the exponential growth of data being created.

Enterprise

Enterprise Data Lake Data Collection Data-driven

Access Amazon Athena in your applications using the WebSocket API

AWS Big Data

MARCH 2, 2023

Many organizations are building data lakes to store and analyze large volumes of structured, semi-structured, and unstructured data. In addition, many teams are moving towards a data mesh architecture, which requires them to expose their data sets as easily consumable data products.

Data Lake

Data Lake Testing Interactive Unstructured Data

Using AWS AppSync and AWS Lake Formation to access a secure data lake through a GraphQL API

AWS Big Data

OCTOBER 9, 2023

Data lakes have been gaining popularity for storing vast amounts of data from diverse sources in a scalable and cost-effective way. As the number of data consumers grows, data lake administrators often need to implement fine-grained access controls for different user profiles.

Data Lake

Data Lake Testing Big Data Management

Real estate CIOs drive deals with data

CIO Business Intelligence

JULY 26, 2023

Finally, the IT team developed a digital market center that offers event management as well as training and education content. We’ve been on a journey for the last six years or so to build out our platforms,” says Cox, noting that Keller Williams uses MLS, demographic, product, insurance, and geospatial data globally to fill its data lake. “We

Data Lake

Data Lake Digital Transformation Machine Learning Data Architecture

Enhance query performance using AWS Glue Data Catalog column-level statistics

AWS Big Data

NOVEMBER 22, 2023

Data lakes are designed for storing vast amounts of raw, unstructured, or semi-structured data at a low cost, and organizations share those datasets across multiple departments and teams. The queries on these large datasets read vast amounts of data and can perform complex join operations on multiple datasets.

Statistics

Statistics Data Lake Optimization Data-driven

Building a vision for real-time artificial intelligence

CIO Business Intelligence

APRIL 12, 2023

Real-time AI brings together streaming data and machine learning algorithms to make fast and automated decisions; examples include recommendations, fraud detection, security monitoring, and chatbots. The underpinning architecture needs to include event-streaming technology, high-performing databases, and machine learning feature stores.

Machine Learning

Machine Learning Cost-Benefit Data-driven Strategy

Reference guide to build inventory management and forecasting solutions on AWS

AWS Big Data

APRIL 11, 2023

By collecting data from store sensors using AWS IoT Core , ingesting it using AWS Lambda to Amazon Aurora Serverless , and transforming it using AWS Glue from a database to an Amazon Simple Storage Service (Amazon S3) data lake, retailers can gain deep insights into their inventory and customer behavior.

Forecasting

Forecasting Management IoT Data-driven

Why the Data Journey Manifesto?

DataKitchen

JUNE 12, 2023

We had been talking about “Agile Analytic Operations,” “DevOps for Data Teams,” and “Lean Manufacturing For Data,” but the concept was hard to get across and communicate. I spent much time de-categorizing DataOps: we are not discussing ETL, Data Lake, or Data Science.

Testing

Testing Data Lake Dashboards Data Science

Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

AWS Big Data

DECEMBER 13, 2023

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. Debezium MySQL source Kafka Connector reads these change events and emits them to the Kafka topics in Amazon MSK.

Data Warehouse

Data Warehouse Snapshot Data Processing Management

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

AWS Big Data

MARCH 28, 2023

As organizations across the globe are modernizing their data platforms with data lakes on Amazon Simple Storage Service (Amazon S3), handling SCDs in data lakes can be challenging.

Data Lake

Data Lake Testing Snapshot Sales

Aaand the New NiFi Champion is…

Cloudera

JUNE 5, 2023

Cloudera VP of Engineering and NiFi founder Joe Witt, joined by principal committers Mark Payne and Matt Gillman, addressed the global community via a virtual event dubbed “ Meet the Committers.” As part of the event, Cloudera kicked off the “Best in Flow” contest. On the verge of the release of NiFi 2.0,

Testing

Testing Data Lake Data Processing IT

Moving Enterprise Data From Anywhere to Any System Made Easy

CIO Business Intelligence

JULY 13, 2022

Over the last decade, we have often heard about the proliferation of data creating sources (mobile applications, laptops, sensors, enterprise apps) in heterogeneous environments (cloud, on-prem, edge) resulting in the exponential growth of data being created.

Enterprise

Enterprise Data Lake Data Collection Data-driven

Living on the Edge: How to Accelerate Your Business with Real-time Analytics

Cloudera

SEPTEMBER 15, 2021

That’s where you find the ability to empower IoT devices to respond to events in real time by capturing and analyzing the relevant data. Edge computing relies on squeezing the power and functionality of a data center into a micro site as close to data sources as possible to enable real-time tasks.

IoT

IoT Analytics Internet of Things Data Lake

Monitor data pipelines in a serverless data lake

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Webinars

Trending Sources

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

Webinars

Build a real-time GDPR-aligned Apache Iceberg data lake

Build an ETL process for Amazon Redshift using Amazon S3 Event Notifications and AWS Step Functions

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

Deriving Value from Data Lakes with AI

Data Lakes: What Are They and Who Needs Them?

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

Modernize your data observability with Amazon OpenSearch Service zero-ETL integration with Amazon S3

Exploring real-time streaming for generative AI Applications

Build an end-to-end serverless streaming pipeline with Apache Kafka on Amazon MSK using Python

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

Salesforce debuts Zero Copy Partner Network to ease data integration

Implementing a Pharma Data Mesh using DataOps

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

Deploy and Optimize Your Snowflake Environment Faster With Accelerators

Porsche Carrera Cup Brasil gets real-time data boost

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

DataOps Observability: Taming the Chaos (Part 3)

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

Automate schema evolution at scale with Apache Hudi in AWS Glue

Analyze Elastic IP usage history using Amazon Athena and AWS CloudTrail

Implement alerts in Amazon OpenSearch Service with PagerDuty

AWS Lake Formation 2022 year in review

How Knowledge Graphs Power Data Mesh and Data Fabric

Use Amazon Athena with Spark SQL for your open-source transactional table formats

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Hybrid Vs. Multi-Cloud: 5 Key Comparisons in Kafka Architectures

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

Moving Enterprise Data From Anywhere to Any System Made Easy

Access Amazon Athena in your applications using the WebSocket API

Using AWS AppSync and AWS Lake Formation to access a secure data lake through a GraphQL API

Real estate CIOs drive deals with data

Enhance query performance using AWS Glue Data Catalog column-level statistics

Building a vision for real-time artificial intelligence

Reference guide to build inventory management and forecasting solutions on AWS

Why the Data Journey Manifesto?

Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

Aaand the New NiFi Champion is…

Moving Enterprise Data From Anywhere to Any System Made Easy

Living on the Edge: How to Accelerate Your Business with Real-time Analytics

Stay Connected