2023, Data Lake and Data Warehouse

2023

Data Lake

Data Warehouse

Accelerate your data warehouse migration to Amazon Redshift – Part 7

AWS Big Data

OCTOBER 17, 2023

With Amazon Redshift, you can use standard SQL to query data across your data warehouse, operational data stores, and data lake. Migrating a data warehouse can be complex. You have to migrate terabytes or petabytes of data from your legacy system while not disrupting your production workload.

Data Warehouse

Data Warehouse Data Processing Data Lake Management

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Manufacturing Sustainability Surge: Your Guide to Data-Driven Energy Optimization & Decarbonization

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

MORE WEBINARS

Trending Sources

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. and later supports the Apache Iceberg framework for data lakes. AWS Glue 3.0 The following diagram illustrates the solution architecture.

Data Lake

Data Lake Data Processing Metadata Snapshot

Webinars

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Manufacturing Sustainability Surge: Your Guide to Data-Driven Energy Optimization & Decarbonization

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

MORE WEBINARS

Key finding from Forrester’s latest BI research including The Forrester Wave™: Augmented Business Intelligence Platforms, Q2 2023

Boris Evelson

JUNE 14, 2023

No matter what technology foundation you’re using – a data lake, a data warehouse, data fabric, data mesh, etc. – BI applications are where business users consume data and turn it into actionable insights and decisions. The BI market has […]

Business Intelligence

Business Intelligence Data Lake Data Warehouse Data-driven

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

AWS Big Data

NOVEMBER 10, 2023

This integration expands the possibilities for AWS analytics and machine learning (ML) solutions, making the data warehouse accessible to a broader range of applications. Your applications can seamlessly read from and write to your Amazon Redshift data warehouse while maintaining optimal performance and transactional consistency.

Data Processing

Data Processing Data Lake Data Warehouse Optimization

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

OCTOBER 19, 2023

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure.

Data Lake

Data Lake Data Warehouse Visualization Snapshot

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

AWS Big Data

NOVEMBER 20, 2023

Use case A typical workload for AWS Glue for Apache Spark jobs is to load data from a relational database to a data lake with SQL-based transformations. The following is a visual representation of an example job where the number of workers is 10. When the example job ran, the workerUtilization metrics showed the following trend.

Metrics

Metrics Data Lake Cost-Benefit Dashboards

Educating ChatGPT on Data Lakehouse

Cloudera

MARCH 17, 2023

I took the free version of ChatGPT on a test drive (in March 2023) and asked some simple questions on data lakehouse and its components. Hopefully this blog will give ChatGPT an opportunity to learn and correct itself while counting towards my 2023 contribution to social good. I thought this was a fairly comprehensive list.

Unstructured Data

Unstructured Data Data Lake Data Warehouse Machine Learning

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

JANUARY 24, 2024

AWS-powered data lakes, supported by the unmatched availability of Amazon Simple Storage Service (Amazon S3), can handle the scale, agility, and flexibility required to combine different data and analytics approaches. The output will give a count of the number of data and metadata files deleted.

Snapshot

Snapshot Data Lake Metadata Optimization

Backcountry modernizes for the cloud era

CIO Business Intelligence

APRIL 26, 2022

Backcountry also lacked many core services critical for an online retailer — no CMS, no analytics, no data platform, and no data lake. In recent years, e-commerce platforms have evolved into a combination of cloud, analytics, CX UIs, and data lakes dubbed customer data platforms (CDPs). That’s right.

Data Lake

Data Lake Dashboards Recreation/Entertainment Sales

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

Looking at the Skewness Job per Job visualization, there was spike on November 1, 2023. You can choose Controls , and change filter conditions based on date time, Region, AWS account ID, AWS Glue job name, job run ID, and the source and sink of the data stores. Let’s drill down into details.

Metrics

Metrics Visualization Dashboards Interactive

Building a vision for real-time artificial intelligence

CIO Business Intelligence

APRIL 12, 2023

Most current data architectures were designed for batch processing with analytics and machine learning models running on data warehouses and data lakes. Wrapping up Moving real-time AI from the edge of innovation to the center of the business will be one of the biggest challenges for organizations in 2023.

Machine Learning

Machine Learning Cost-Benefit Data-driven Strategy

Cybersecurity e NIS2: come si muovono i CIO per dormire sonni (un po’) più tranquilli

CIO Business Intelligence

APRIL 22, 2024

L’ultimo Rapporto Clusit ha contato 2.779 incidenti gravi a livello globale nel 2023 (+12% rispetto al 2022), di cui 310 in Italia, ovvero l’11% del totale mondiale e un incremento addirittura del 65% in un anno. Questa attenzione massima rispecchia la consapevolezza che le cyber-minacce sono sempre più numerose e preoccupanti.

Data Lake

Data Lake Testing Management IoT

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

AWS Big Data

MARCH 28, 2023

In a data warehouse, a dimension is a structure that categorizes facts and measures in order to enable users to answer business questions. As organizations across the globe are modernizing their data platforms with data lakes on Amazon Simple Storage Service (Amazon S3), handling SCDs in data lakes can be challenging.

Data Lake

Data Lake Testing Snapshot Sales

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift , the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts. We recently announced the integration of Amazon Redshift data sharing with AWS Lake Formation.

Data Lake

Data Lake Data Warehouse Marketing Management

Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics

AWS Big Data

MARCH 27, 2024

You can send data from your streaming source to this resource for ingesting the data into a Redshift data warehouse. This will be your online transaction processing (OLTP) data store for transactional data. With continuous innovations added to Amazon Redshift, it is now more than just a data warehouse.

Data Analytics

Data Analytics Analytics Data Warehouse Data Lake

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you need to focus on operational use cases for your S3 data lake to optimize the production environment. availability. parquet") df.sortWithinPartitions("review_date").writeTo("dev.db.amazon_reviews_iceberg").append()

Data Lake

Data Lake Snapshot Metadata Optimization

The year’s top 10 enterprise AI trends — so far

CIO Business Intelligence

SEPTEMBER 21, 2023

To make all this possible, the data had to be collected, processed, and fed into the systems that needed it in a reliable, efficient, scalable, and secure way. Data warehouses then evolved into data lakes, and then data fabrics and other enterprise-wide data architectures.

Enterprise

Enterprise Consulting Modeling Cost-Benefit

Top Opportunities for SAP Partners in 2023

Timo Elliott

NOVEMBER 30, 2022

My role was to talk about the trends and opportunities for 2023, for customers, SAP, and our partners. Because of technology limitations, we have always had to start by ripping information from the business systems and moving it to a different platform—a data warehouse, data lake, data lakehouse, data cloud.

Recreation/Entertainment

Recreation/Entertainment Metadata Data Warehouse Cost-Benefit

Preparing the foundations for Generative AI

CIO Business Intelligence

FEBRUARY 20, 2024

Data also needs to be sorted, annotated and labelled in order to meet the requirements of generative AI. No wonder CIO’s 2023 AI Priorities study found that data integration was the number one concern for IT leaders around generative AI integration, above security and privacy and the user experience. “We

Cost-Benefit

Cost-Benefit Data Lake Data Warehouse Data Integration

Materialized Views in Hive for Iceberg Table Format

Cloudera

FEBRUARY 8, 2024

Cloudera Data Warehouse (CDW) running Hive has previously supported creating materialized views against Hive ACID source tables. release and the matching CDW Private Cloud Data Services release, Hive also supports creating, using, and rebuilding materialized views for Iceberg table format.

Snapshot

Snapshot Metadata Cost-Benefit Data Warehouse

Amazon Kinesis Data Streams: celebrating a decade of real-time data innovation

AWS Big Data

NOVEMBER 14, 2023

However, in many organizations, data is typically spread across a number of different systems such as software as a service (SaaS) applications, operational databases, and data warehouses. Such data silos make it difficult to get unified views of the data in an organization and act in real time to derive the most value.

IoT

IoT Data-driven Data Lake Data Strategy

Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog

AWS Big Data

JUNE 6, 2023

You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores. Hundreds of thousands of customers use data lakes for analytics and ML to make data-driven business decisions.

Data Quality

Data Quality Data Lake Data-driven Metrics

Introducing watsonx: The future of AI for business

IBM Big Data Hub

MAY 9, 2023

With watsonx.data , businesses can quickly connect to data, get trusted insights and reduce data warehouse costs. A data store built on open lakehouse architecture, it runs both on premises and across multi-cloud environments. Savings may vary depending on configurations, workloads and vendors.

Data Warehouse

Data Warehouse Cost-Benefit Machine Learning Modeling

Wonderla Holidays goes digital to enhance business and customer fun

CIO Business Intelligence

OCTOBER 18, 2022

One pulse sends 150 bytes of data. So, each band can send out 500KB to 750KB of data. To handle the huge volume of data thus generated, the company is in the process of deploying a data lake, data warehouse, and real-time analytical tools in a hybrid model.

Data Lake

Data Lake Cost-Benefit Digital Transformation Data Warehouse

Implement a serverless CDC process with Apache Iceberg using Amazon DynamoDB and Amazon Athena

AWS Big Data

AUGUST 16, 2023

Iceberg manages large collections of files as tables, and it supports modern analytical data lake operations such as record-level insert, update, delete, and time travel queries. Iceberg also helps guarantee data correctness under concurrent write scenarios. In our query, it corresponds to the time 2023-04-18 21:34:13.970.

Data Lake

Data Lake Metadata Testing Snapshot

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

It is supported by querying, governance, and open data formats to access and share data across the hybrid cloud. Through workload optimization across multiple query engines and storage tiers, organizations can reduce data warehouse costs by up to 50 percent.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Modeling, Modernization and Automation

BI-Survey

APRIL 27, 2023

Eckerson Group wrote this report in collaboration with BARC by studying the results of a global survey of 238 data & analytics practitioners and leaders. BARC conducted the survey in December 2022 and January 2023, drawing respondents from companies of various sizes and across various industries.

Modeling

Modeling Data Warehouse Data Quality Business Driver

Tackling AI’s data challenges with IBM databases on AWS

IBM Big Data Hub

MARCH 14, 2024

  Redefining cloud database innovation: IBM and AWS In late 2023, IBM and AWS jointly announced the general availability of Amazon relational database service (RDS) for Db2.  With Db2 Warehouse’s fully managed cloud deployment on AWS, enjoy no overhead, indexing, or tuning and automated maintenance.

Cost-Benefit

Cost-Benefit Metadata Optimization Management

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

2023 AWS Analytics Superheroes We are excited to introduce the 2023 AWS Analytics Superheroes at this year’s re:Invent conference! A shapeshifting guardian and protector of data like Data Lynx? 11:30 AM – 12:30 PM (PDT) Ceasars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture.

Analytics

Analytics Data Lake Data Warehouse Data-driven

AWS re:Invent 2023 Amazon Redshift Sessions Recap

AWS Big Data

DECEMBER 18, 2023

Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads.

Data Warehouse

Data Warehouse Machine Learning Data-driven Data Lake

What’s cooking with Amazon Redshift at AWS re:Invent 2023

AWS Big Data

NOVEMBER 15, 2023

Connect with experts, meet with book authors on data warehousing and analytics (at the Meet the Authors event on November 29 and 30, 3:00 PM – 4:00 PM), win prizes, and learn all about the latest innovations from our AWS Analytics services. A shapeshifting guardian and protector of data like Data Lynx?

Data Lake

Data Lake Data Warehouse B2B Deep Learning

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

AWS Big Data

NOVEMBER 8, 2023

Save the date: AWS re:Invent 2023 is happening from November 27 to December 1 in Las Vegas, and you cannot miss it. In today’s data-driven landscape, the quality of data is the foundation upon which the success of organizations and innovations stands. Reserve your seat now! Register now to secure your spot!

Data-driven

Data-driven Data Lake Machine Learning Cost-Benefit

Start to Monetize Your Data with Data Marketplaces and Data Value Scoring

erwin

JANUARY 10, 2024

The result is an improvement in quality and availability of data such that you can monetize it. Your goal should be enterprise data management and an analytics function that pays for itself, like a self-funding data warehouse, data lake or data mesh. What is data monetization?

Measurement

Measurement Cost-Benefit Data Governance Management

Use the Amazon Redshift Data API to interact with Amazon Redshift Serverless

AWS Big Data

APRIL 28, 2023

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing ETL (extract, transform, and load), business intelligence (BI), and reporting tools. You have to specify –-sql to specify your SQL commands.

Interactive

Interactive Metadata Data Warehouse Data-driven

5 Key Takeaways from Flink Forward 2023

Cloudera

NOVEMBER 27, 2023

Earlier this month (November 6 through 8, 2023) a few hundred Apache Flink enthusiasts descended upon a Hyatt Regency Lake near Seattle for the annual Flink Forward conference. Sign up for a free trial of Cloudera’s NiFi-based DataFlow and walk through use cases like stream filtering and cloud data warehouse ingest.

Data Lake

Data Lake Advertising ROI Data Warehouse

Unleashing the power of Presto: The Uber case study

IBM Big Data Hub

SEPTEMBER 25, 2023

Uber understood that digital superiority required the capture of all their transactional data, not just a sampling. They stood up a file-based data lake alongside their analytical database. Because much of the work done on their data lake is exploratory in nature, many users want to execute untested queries on petabytes of data.

OLAP

OLAP Data Lake Data-driven Snapshot

Do the Benefits of Cloud Outweigh the Costs?

Jet Global

SEPTEMBER 19, 2023

Data Access What insights can we derive from our cloud ERP? What are the best practices for analyzing cloud ERP data? Data Management How do we create a data warehouse or data lake in the cloud using our cloud ERP? How do I access the legacy data from my previous ERP?

Cost-Benefit

Cost-Benefit Data Warehouse Reporting Enterprise

Accelerate your data warehouse migration to Amazon Redshift – Part 7

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Webinars

Trending Sources

Use Apache Iceberg in a data lake to support incremental data processing

Webinars

Key finding from Forrester’s latest BI research including The Forrester Wave™: Augmented Business Intelligence Platforms, Q2 2023

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Load data incrementally from transactional data lakes to data warehouses

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

Educating ChatGPT on Data Lakehouse

Use Amazon Athena with Spark SQL for your open-source transactional table formats

Backcountry modernizes for the cloud era

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

Building a vision for real-time artificial intelligence

Cybersecurity e NIS2: come si muovono i CIO per dormire sonni (un po’) più tranquilli

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

The year’s top 10 enterprise AI trends — so far

Top Opportunities for SAP Partners in 2023

Preparing the foundations for Generative AI

Materialized Views in Hive for Iceberg Table Format

Amazon Kinesis Data Streams: celebrating a decade of real-time data innovation

Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog

Introducing watsonx: The future of AI for business

Wonderla Holidays goes digital to enhance business and customer fun

Implement a serverless CDC process with Apache Iceberg using Amazon DynamoDB and Amazon Athena

Exploring the AI and data capabilities of watsonx

Modeling, Modernization and Automation

Tackling AI’s data challenges with IBM databases on AWS

Your guide to AWS Analytics at AWS re:Invent 2023

AWS re:Invent 2023 Amazon Redshift Sessions Recap

What’s cooking with Amazon Redshift at AWS re:Invent 2023

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

Start to Monetize Your Data with Data Marketplaces and Data Value Scoring

Use the Amazon Redshift Data API to interact with Amazon Redshift Serverless

5 Key Takeaways from Flink Forward 2023

Unleashing the power of Presto: The Uber case study

Do the Benefits of Cloud Outweigh the Costs?

Stay Connected