
Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

In this post, we show you how to use Spark SQL in Amazon Athena notebooks and work with Iceberg, Hudi, and Delta Lake table formats. We demonstrate common operations such as creating databases and tables, inserting data, querying data, and inspecting the table snapshots stored in Amazon S3, all using Spark SQL in Athena.
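As a quick sketch of the kind of Spark SQL the post demonstrates for the Iceberg case (the demo_db database, orders table, and S3 location below are hypothetical placeholders, and an Iceberg-enabled Athena notebook session is assumed):

    CREATE DATABASE IF NOT EXISTS demo_db;

    -- Placeholder bucket and path; replace with your own location
    CREATE TABLE demo_db.orders (
        order_id   BIGINT,
        order_date DATE,
        amount     DECIMAL(10, 2)
    ) USING iceberg
    LOCATION 's3://amzn-s3-demo-bucket/demo_db/orders/';

    INSERT INTO demo_db.orders VALUES
        (1, DATE '2024-01-15', 120.50),
        (2, DATE '2024-01-16', 75.00);

    SELECT * FROM demo_db.orders;

    -- Iceberg exposes table snapshots as a queryable metadata table
    SELECT snapshot_id, committed_at, operation
    FROM demo_db.orders.snapshots;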


Accelerate your data warehouse migration to Amazon Redshift – Part 7

AWS Big Data

With Amazon Redshift, you can use standard SQL to query data across your data warehouse, operational data stores, and data lake. You could identify the changed rows, perhaps using a filter on update timestamps, or you could modify your extract, transform, and load (ETL) process to write to both the source and target databases.
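A minimal sketch of the changed-rows approach the excerpt mentions; the schema, table, and column names are hypothetical:

    -- Pull only rows modified since the last load, using an
    -- update-timestamp filter instead of re-extracting everything.
    INSERT INTO target.customer_stage
    SELECT *
    FROM   source.customer
    WHERE  updated_at > (SELECT COALESCE(MAX(updated_at), '1900-01-01')
                         FROM   target.customer_stage);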



Interact with Apache Iceberg tables using Amazon Athena and cross account fine-grained permissions using AWS Lake Formation

AWS Big Data

If you are interested in learning more about data mesh architecture, visit Design a data mesh architecture using AWS Lake Formation and AWS Glue. We’ll discuss row filtering to restrict access to certain rows, column filtering to restrict column-level access, schema evolution, and time travel. In the Lake Formation console, choose Create new filter, and after configuring the filter, choose Create filter.
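As an illustration of the time travel capability mentioned above, Athena SQL can query an Iceberg table as of a past point in time; the table name and timestamp below are hypothetical:

    -- Query the table's state as of a prior snapshot
    SELECT *
    FROM my_database.my_iceberg_table
    FOR TIMESTAMP AS OF TIMESTAMP '2024-01-01 00:00:00 UTC';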


Generic orchestration framework for data warehousing workloads using Amazon Redshift RSQL

AWS Big Data

With Amazon Redshift, you can query data across your data warehouse, operational data stores, and data lake using standard SQL. To learn about new options for database scripting, refer to Accelerate your data warehouse migration to Amazon Redshift – Part 4. The converted scripts run on Amazon Redshift with little to no changes.
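A minimal RSQL sketch of the kind of control flow such a framework relies on; the script and table names are hypothetical, and it assumes the \if/\exit meta commands and the :ERRORCODE variable described in the AWS RSQL documentation:

    -- Hypothetical load step
    INSERT INTO mart.daily_sales
    SELECT * FROM staging.daily_sales;

    \if :ERRORCODE <> 0
        \remark 'Load step failed - stopping the workload'
        \exit 1
    \endif

    \remark 'Load step succeeded'
    \exit 0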


Announcing zero-ETL integrations with AWS Databases and Amazon Redshift

AWS Big Data

Through customer feedback, we learned that a lot of undifferentiated time and resources go toward building and managing ETL pipelines between transactional databases and data warehouses. ETL is the process data engineers use to combine data from different sources.


Optimization Strategies for Iceberg Tables

Cloudera

Iceberg provides a rewrite position delete files procedure in Spark SQL. In Parquet, you typically want your files to be around 512 MB and row-group sizes to be around 128 MB. Unless partitions are aligned properly, a Spark task might touch multiple partitions. See [link] to learn more about Z-ordering.
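A brief sketch of both maintenance procedures in Spark SQL; the catalog and table names are hypothetical, and the syntax follows the Apache Iceberg procedure documentation:

    -- Compact position delete files produced by row-level deletes
    CALL my_catalog.system.rewrite_position_delete_files(
        table => 'db.sample'
    );

    -- Compact small data files and cluster them with Z-ordering
    CALL my_catalog.system.rewrite_data_files(
        table      => 'db.sample',
        strategy   => 'sort',
        sort_order => 'zorder(region, event_date)'
    );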


Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x better price-performance than other cloud data warehouses.