
Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

AWS Big Data

Data loses value over time. Traditionally, customers have used batch-based approaches to move data from operational systems to analytical systems, but batch movement introduces latency and reduces the value of data for analytics. With Amazon Redshift streaming ingestion, you can instead consume change data capture (CDC) events from Amazon MSK by creating materialized views over the stream using SQL statements.
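As a minimal sketch of that pattern (the IAM role ARN, cluster ARN, topic name, and column names below are illustrative placeholders, not values from the article), Redshift streaming ingestion maps an MSK cluster to an external schema and exposes a topic through a materialized view:

    -- Register the MSK cluster as an external schema (placeholder ARNs).
    CREATE EXTERNAL SCHEMA msk_cdc
    FROM MSK
    IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-streaming-role'
    AUTHENTICATION iam
    CLUSTER_ARN 'arn:aws:kafka:us-east-1:111122223333:cluster/demo-cluster/abcd1234';

    -- Materialized view over the CDC topic; Redshift refreshes it from the stream.
    CREATE MATERIALIZED VIEW orders_cdc_mv AUTO REFRESH YES AS
    SELECT
        kafka_partition,
        kafka_offset,
        kafka_timestamp,
        JSON_PARSE(kafka_value) AS change_event  -- CDC payload as a SUPER column
    FROM msk_cdc.orders_cdc;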


Orchestrate an end-to-end ETL pipeline using Amazon S3, AWS Glue, and Amazon Redshift Serverless with Amazon MWAA

AWS Big Data

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that you can use to set up and operate data pipelines in the cloud at scale. It also supports cost optimization by aligning resources with specific use cases, helping keep expenses under control.


Achieve near real time operational analytics using Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift

AWS Big Data

Data is at the center of every application, process, and business decision. Customers across industries are becoming more data-driven and looking to increase revenue, reduce cost, and optimize their business operations by implementing near real-time analytics on transactional data, thereby enhancing agility.
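For a sense of how the zero-ETL integration surfaces on the Redshift side, here is a hedged sketch assuming an Aurora PostgreSQL zero-ETL integration has already been created; the integration ID, database, schema, and table names are placeholders rather than values from the post:

    -- Create a local Redshift database from the existing zero-ETL integration.
    CREATE DATABASE aurora_analytics
    FROM INTEGRATION 'a1b2c3d4-5678-90ab-cdef-EXAMPLE11111'
    DATABASE 'sales';

    -- Replicated tables can then be queried for near real-time analytics.
    SELECT order_status, COUNT(*) AS orders
    FROM aurora_analytics.public.orders
    GROUP BY order_status;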


Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

Apache Iceberg adds tables to compute engines including Spark, Trino, PrestoDB, Flink, and Hive using a high-performance table format that works just like a SQL table. Iceberg has become popular for its support for ACID transactions in data lakes and features such as schema and partition evolution, time travel, and rollback. AWS Glue 3.0 and later supports the Apache Iceberg framework natively.
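To make the incremental-processing and time travel ideas concrete, here is an illustrative Spark SQL sketch of the kind you could run with Iceberg enabled (assuming a recent Spark/Iceberg version; the catalog, database, table, and timestamp values are placeholders, not taken from the post):

    -- Upsert a batch of incoming changes into an Iceberg table (ACID MERGE).
    MERGE INTO glue_catalog.analytics.orders AS t
    USING orders_updates AS s
      ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *;

    -- Time travel: read the table as of an earlier snapshot timestamp.
    SELECT *
    FROM glue_catalog.analytics.orders
    TIMESTAMP AS OF '2023-01-01 00:00:00';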


Implement data warehousing solution using dbt on Amazon Redshift

AWS Big Data

Amazon Redshift is a cloud data warehousing service that provides high-performance analytical processing based on a massively parallel processing (MPP) architecture. Building and maintaining data pipelines is a common challenge for all enterprises. For more information, refer to SQL models in the dbt documentation.
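For a flavor of what such a pipeline step looks like, here is an illustrative dbt SQL model (the model, source, and column names are hypothetical): a model is just a SELECT statement that dbt materializes as a table or view in Redshift.

    -- models/marts/daily_orders.sql (hypothetical model)
    {{ config(materialized='table') }}

    select
        order_date,
        count(*)         as order_count,
        sum(order_total) as revenue
    from {{ ref('stg_orders') }}  -- upstream staging model
    group by order_date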


Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

Building a data platform involves various approaches, each with its unique blend of complexities and solutions. In this post, we delve into a case study for a retail use case, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform.


Create, train, and deploy Amazon Redshift ML model integrating features from Amazon SageMaker Feature Store

AWS Big Data

Amazon Redshift is a fast, petabyte-scale, cloud data warehouse that tens of thousands of customers rely on to power their analytics workloads. Amazon Redshift ML makes it easy for SQL users to create, train, and deploy ML models using SQL commands that are familiar to roles such as executives, business analysts, and data analysts.
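As a hedged sketch of that SQL-only workflow (a generic Redshift ML example rather than the SageMaker Feature Store integration the post covers; table, column, function, role, and bucket names are placeholders):

    -- Train a model from a SQL query; Redshift ML drives SageMaker behind the scenes.
    CREATE MODEL customer_churn_model
    FROM (SELECT age, tenure_months, monthly_spend, churned FROM customer_history)
    TARGET churned
    FUNCTION predict_customer_churn
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftMLRole'
    SETTINGS (S3_BUCKET 'redshift-ml-artifacts-example');

    -- Use the generated function for in-database inference.
    SELECT customer_id,
           predict_customer_churn(age, tenure_months, monthly_spend) AS churn_prediction
    FROM customers;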