Data Lake, Modeling and Snapshot

Data Lake

Modeling

Snapshot

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Analyze Elastic IP usage history using Amazon Athena and AWS CloudTrail

AWS Big Data

MAY 15, 2024

By extracting detailed information from CloudTrail and querying it using Athena, this solution streamlines the process of data collection, analysis, and reporting of EIP usage within an AWS account. Additionally, you can analyze activity logs with AWS CloudTrail Lake and Amazon Athena.

Snapshot

Snapshot Optimization Data Lake Reporting

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

Foundation models (FMs) are large machine learning (ML) models trained on a broad spectrum of unlabeled and generalized datasets. This scale and general-purpose adaptability are what makes FMs different from traditional ML models. FMs are multimodal; they work with different data types such as text, video, audio, and images.

Data Lake

Data Lake Unstructured Data Management Modeling

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

AWS Big Data

NOVEMBER 10, 2023

As a result of utilizing the Amazon Redshift integration for Apache Spark, developer productivity increased by a factor of 10, feature generation pipelines were streamlined, and data duplication reduced to zero. These tables are then joined with tables from the Enterprise Data Lake (EDL) at runtime. options(**read_config).option("query",

Data Processing

Data Processing Data Lake Data Warehouse Optimization

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

AWS Big Data

MARCH 27, 2023

Developers, data scientists, and analysts can work across databases, data warehouses, and data lakes to build reporting and dashboarding applications, perform real-time analytics, share and collaborate on data, and even build and train machine learning (ML) models with Redshift Serverless.

Data Warehouse

Data Warehouse Management Snapshot Data Lake

Interact with Apache Iceberg tables using Amazon Athena and cross account fine-grained permissions using AWS Lake Formation

AWS Big Data

MARCH 23, 2023

Large organizations often have lines of businesses (LoBs) that operate with autonomy in managing their business data. It makes sharing data across LoBs non-trivial. These organizations have adopted a federated model, with each LoB having the autonomy to make decisions on their data.

Interactive

Interactive Snapshot Data Lake Software

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. Of those tables, some are larger (such as in terms of record volume) than others, and some are updated more frequently than others.

Data Lake

Data Lake Data Processing Metadata Snapshot

Snowflake and Domino: Better Together

Domino Data Lab

JANUARY 11, 2021

Arming data science teams with the access and capabilities needed to establish a two-way flow of information is one critical challenge many organizations face when it comes to unlocking value from their modeling efforts. Domino Data Lab and Snowflake: Better Together. Writing data from Domino into Snowflake.

Recreation/Entertainment

Recreation/Entertainment Data Science Data Warehouse Modeling

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Optimization Statistics

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

AWS Big Data

JULY 3, 2023

Terminology Let’s first discuss some of the terminology used in this post: Research data lake on Amazon S3 – A data lake is a large, centralized repository that allows you to manage all your structured and unstructured data at any scale. This is where the tagging feature in Apache Iceberg comes in handy.

Snapshot

Snapshot Data Lake Testing Strategy

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

Dimensional modeling in Amazon Redshift

AWS Big Data

JULY 19, 2023

You can structure your data, measure business processes, and get valuable insights quickly can be done by using a dimensional model. Amazon Redshift provides built-in features to accelerate the process of modeling, orchestrating, and reporting from a dimensional model. Declare the grain of your data.

Modeling

Modeling Sales Data Warehouse Snapshot

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. The response times for these data sources are critical to our key stakeholders.

Optimization

Optimization Forecasting Data Lake Metadata

Five actionable steps to GDPR compliance (Right to be forgotten) with Amazon Redshift

AWS Big Data

JULY 28, 2023

Tagging Consider tagging your Amazon Redshift resources to quickly identify which clusters and snapshots contain the PII data, the owners, the data retention policy, and so on. Redshift resources, such as namespaces, workgroups, snapshots, and clusters can be tagged. Tags provide metadata about resources at a glance.

Snapshot

Snapshot Metadata Measurement Data Warehouse

MLOps and DevOps: Why Data Makes It Different

O'Reilly on Data

OCTOBER 19, 2021

Let’s start by considering the job of a non-ML software engineer: writing traditional software deals with well-defined, narrowly-scoped inputs, which the engineer can exhaustively and cleanly model in the code. Not only is data larger, but models—deep learning models in particular—are much larger than before.

IT Testing Experimentation Software

Estimating Scope 1 Carbon Footprint with Amazon Athena

AWS Big Data

AUGUST 2, 2023

As the only user-provided input is the amount spent on an activity, EIO LCA methods aren’t useful for modeling improved efficiency. The data architecture diagram below shows an example of how you could use AWS services to calculate and visualize an organization’s estimated carbon footprint.

Data Lake

Data Lake Measurement Visualization Data Architecture

Enrich your customer data with geospatial insights using Amazon Redshift, AWS Data Exchange, and Amazon QuickSight

AWS Big Data

MARCH 18, 2024

With these new attributes, you are able to build a segmentation model to identify distinct groups of customers that you can target with personalized messaging. This data is available to subscribe to on AWS Data Exchange—and with data sharing, you don’t need to pay to store a copy of it in your account in order to query it.

Data Warehouse

Data Warehouse Visualization Snapshot Data-driven

Accelerating revenue growth with real-time analytics: Poshmark’s journey

AWS Big Data

MARCH 20, 2023

Personalized recommendations – User behavior based on clickstream events can be captured up to the last second before enriching it for personalization and sending it to the model to predict the recommendations. The data from the S3 data lake is used for batch processing and analytics through Amazon EMR and Amazon Redshift.

Analytics

Analytics Slice and Dice Data Processing Data Lake

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Cloudera

MARCH 23, 2022

For example, a Jupyter notebook in CML, can use Spark or Python framework to directly access an Iceberg table to build a forecast model, while new data is ingested via NiFi flows, and a SQL analyst monitors revenue targets using Data Visualization. Simplify data management . 2: Open formats. Reproducibility for ML Ops.

Metadata

Metadata Data Architecture Machine Learning Cost-Benefit

Interview with Dominic Sartorio, Senior Vice President for Products & Development, Protegrity

Corinium

APRIL 25, 2019

Then when there is a breach, it comes as a shock, “wow, I didn’t even know that application had access to so much sensitive data”. Step One in any data security program should first be to discover and classify datasets that are sensitive, and know where that data is, and understand who really needs it to do their jobs.

Insurance

Insurance Risk IoT Cost-Benefit

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

AWS Big Data

MARCH 3, 2023

Every event in the data source can be relevant, and our customers don’t tolerate data loss, poor data quality, or discrepancies between the source and Tricentis Analytics. While aggregating, summarizing, and aligning to a common information model, all transformations must not affect the integrity of data from its source.

Software

Software Data Lake Testing Cost-Benefit

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

We have seen a strong customer demand to expand its scope to cloud-based data lakes because data lakes are increasingly the enterprise solution for large-scale data initiatives due to their power and capabilities. The team uses dbt-glue to build a transformed gold model optimized for business intelligence (BI).

Data Lake

Data Lake Management Metrics Data Warehouse

Build an Amazon Redshift data warehouse using an Amazon DynamoDB single-table design

AWS Big Data

JUNE 21, 2023

In traditional databases, we would model such applications using a normalized data model (entity-relation diagram). In this post, we walk through the process of exporting data from a DynamoDB table to Amazon Redshift. We discuss data model design for both NoSQL databases and SQL data warehouses.

Data Warehouse

Data Warehouse Data Lake OLAP Cost-Benefit

Simplify external object access in Amazon Redshift using automatic mounting of the AWS Glue Data Catalog

AWS Big Data

JULY 28, 2023

Amazon Redshift now makes it easier for you to run queries in AWS data lakes by automatically mounting the AWS Glue Data Catalog. You no longer have to create an external schema in Amazon Redshift to use the data lake tables cataloged in the Data Catalog.

Data Lake

Data Lake Data Governance Data Warehouse Modeling

Unleashing the power of Presto: The Uber case study

IBM Big Data Hub

SEPTEMBER 25, 2023

Uber’s DNA as an analytics company At its core, Uber’s business model is deceptively simple: connect a customer at point A to their destination at point B. Uber understood that digital superiority required the capture of all their transactional data, not just a sampling. It lands as raw data in HDFS.

OLAP

OLAP Data Lake Data-driven Snapshot

Data Leaders Brief

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Analyze Elastic IP usage history using Amazon Athena and AWS CloudTrail

Webinars

Trending Sources

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

Webinars

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

Exploring real-time streaming for generative AI Applications

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

Interact with Apache Iceberg tables using Amazon Athena and cross account fine-grained permissions using AWS Lake Formation

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Snowflake and Domino: Better Together

Choosing an open table format for your transactional data lake on AWS

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Dimensional modeling in Amazon Redshift

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

Five actionable steps to GDPR compliance (Right to be forgotten) with Amazon Redshift

MLOps and DevOps: Why Data Makes It Different

Estimating Scope 1 Carbon Footprint with Amazon Athena

Enrich your customer data with geospatial insights using Amazon Redshift, AWS Data Exchange, and Amazon QuickSight

Accelerating revenue growth with real-time analytics: Poshmark’s journey

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Interview with Dominic Sartorio, Senior Vice President for Products & Development, Protegrity

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

Build an Amazon Redshift data warehouse using an Amazon DynamoDB single-table design

Simplify external object access in Amazon Redshift using automatic mounting of the AWS Glue Data Catalog

Unleashing the power of Presto: The Uber case study

Stay Connected