Analytics, Blog, Data Transformation and Data Warehouse

Unlock scalable analytics with AWS Glue and Google BigQuery

AWS Big Data

OCTOBER 27, 2023

Data integration is the foundation of robust data analytics. It encompasses the discovery, preparation, and composition of data from diverse sources. In the modern data landscape, accessing, integrating, and transforming data from diverse sources is a vital process for data-driven decision-making.

Analytics

Analytics Visualization Data Integration Cost-Benefit

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

IBM Big Data Hub

JUNE 15, 2023

The proliferation of data silos also inhibits the unification and enrichment of data, which is essential to unlocking new insights. Moreover, increased regulatory requirements make it harder for enterprises to democratize data access and scale the adoption of analytics and artificial intelligence (AI).

Data Warehouse

Data Warehouse Data Lake Optimization Data-driven

Webinars

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as those on Amazon Redshift) who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

How Chime Financial uses AWS to build a serverless stream analytics platform and defeat fraudsters

AWS Big Data

SEPTEMBER 19, 2023

However, our legacy data warehouse-based solution was not equipped for this challenge. With a minimum data freshness of 10 minutes, this architecture inherently didn’t align with the near real-time fraud detection use case.

Analytics

Analytics Risk Big Data Machine Learning

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

DataKitchen

JULY 27, 2023

Let’s go through the ten Azure data pipeline tools. Azure Data Factory: This cloud-based data integration service allows you to create data-driven workflows for orchestrating and automating data movement and transformation. You can use it for big data analytics and machine learning workloads.

Machine Learning

Machine Learning Cost-Benefit Data Transformation Testing

Cloudera’s Open Data Lakehouse Supercharged with dbt Core(tm)

Cloudera

OCTOBER 7, 2022

We’re excited to announce the general availability of the open source adapters for dbt for all the engines in CDP: Apache Hive, Apache Impala, and Apache Spark, with added support for Apache Livy and Cloudera Data Engineering. The Open Data Lakehouse. Cloudera builds dbt adapters for all engines in the open data lakehouse.

Data Warehouse

Data Warehouse Data Transformation Testing Data Lake

Simplify Metrics on Apache Druid With Rill Data and Cloudera

Cloudera

JULY 21, 2022

As creators and experts in Apache Druid, Rill understands the data store’s importance as the engine for real-time, highly interactive analytics. Cloudera Data Warehouse and Rill Data, built on Apache Hive and Druid respectively, can be connected using the Hive-Druid Integration.

Metrics

Metrics Slice and Dice Data Warehouse Dashboards

The Modern Data Stack Explained: What The Future Holds

Alation

JANUARY 17, 2023

The modern data stack is a combination of various software tools used to collect, process, and store data on a well-integrated cloud-based data platform. It is known to have benefits in handling data due to its robustness, speed, and scalability. A typical modern data stack consists of the following: A data warehouse.

Data Warehouse

Data Warehouse Cost-Benefit Data Transformation Data Science

Simplify data transfer: Google BigQuery to Amazon S3 using Amazon AppFlow

AWS Big Data

OCTOBER 5, 2023

Amazon AppFlow, a fully managed data integration service, has been at the forefront of streamlining data transfer between AWS services, software as a service (SaaS) applications, and now Google BigQuery. Architecture: Let’s review the architecture to transfer data from Google BigQuery to Amazon S3 using Amazon AppFlow.

Data Warehouse

Data Warehouse Machine Learning Data Integration Data-driven

Happy Birthday, CDP Public Cloud

Cloudera

OCTOBER 13, 2020

In the beginning, CDP ran only on AWS with a set of services that supported a handful of use cases and workload types: CDP Data Warehouse: a Kubernetes-based service that allows business analysts to deploy data warehouses with secure, self-service access to enterprise data. That Was Then. New Services.

Data Warehouse

Data Warehouse Machine Learning Visualization Data Lake

What is a DataOps Engineer?

DataKitchen

OCTOBER 5, 2021

A DataOps Engineer owns the assembly line that’s used to build a data and analytic product. We find it helpful to think of data operations as a factory. That’s the state of data analytics today. Figure 2: Data operations can be conceptualized as a series of automated factory assembly lines.

Testing

Testing Dashboards Measurement Experimentation

How to Use Apache Iceberg in CDP’s Open Lakehouse

Cloudera

AUGUST 8, 2022

The general availability covers Iceberg running within some of the key data services in CDP, including Cloudera Data Warehouse (CDW), Cloudera Data Engineering (CDE), and Cloudera Machine Learning (CML). Cloudera Data Engineering (Spark 3) with Airflow enabled. Cloudera Machine Learning.

Snapshot

Snapshot Data Warehouse Machine Learning Cost-Benefit

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.

Data Lake

Data Lake Data Architecture Metadata Data Warehouse

Sirius About Snowflake Demo: How to Create a Reporting Dashboard

CDW Research Hub

OCTOBER 20, 2020

In our latest demo, we highlight how we’re piloting a modern analytic solution using Snowflake’s scalable cloud data warehouse in combination with Matillion and ThoughtSpot, through Snowflake’s Partner Connect service offering. Manageability and use for non-technical users, democratizing data enterprisewide.

Dashboards

Dashboards Reporting Data Warehouse Structured Data

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

They defined it as: “A data lakehouse is a new, open data management architecture that combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses, enabling business intelligence (BI) and machine learning (ML) on all data.”

Data Lake

Data Lake Data Warehouse Data-driven Metadata

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world’s largest corporations. This was, without question, a significant departure from traditional analytic environments, which often meant vendor lock-in and the inability to work with data at scale.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

In this blog, I will cover: What is watsonx.ai? Capabilities within the Prompt Lab include: Summarize: Transform text with domain-specific content into personalized overviews and capture key points (e.g., It is supported by querying, governance, and open data formats to access and share data across the hybrid cloud.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Biggest Trends in Data Visualization Taking Shape in 2022

Smart Data Collective

OCTOBER 13, 2021

There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. This is something that you can learn more about in just about any technology blog. Prescriptive analytics.

Visualization

Visualization Cost-Benefit Big Data Prescriptive Analytics

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data. 10) Data Quality Solutions: Key Attributes. Industry-wide, the positive ROI on quality data is well understood. 4 – Data Reporting. With a shocking 2.5

Data Quality

Data Quality Metrics Data-driven Management

Connecting the Data Lifecycle

Cloudera

NOVEMBER 29, 2021

Data transforms businesses. That’s where the data lifecycle comes into play. Managing data and its flow, from the edge to the cloud, is one of the most important tasks in the process of gaining data intelligence. The firm also worked on creating a solid pipeline from the data warehouse to the data lake.

Data Lake

Data Lake Data Warehouse Data Architecture Reporting

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications. For InitialRunFlag, choose Setup. Choose Update.

Data Lake

Data Lake Data Transformation Cost-Benefit Data-driven

Prevent Rain Clouds Along Your Snowflake Migration

CDW Research Hub

OCTOBER 25, 2019

As we review data transformation and modernization strategies with our clients, we find many are investigating Snowflake as a data warehouse solution due to its ease of use, speed, and increased flexibility over a traditional data warehouse offering.

Data Warehouse

Data Warehouse Testing Strategy Data-driven

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

OCTOBER 16, 2023

How to scale AI and ML with built-in governance: A fit-for-purpose data store built on an open lakehouse architecture allows you to scale AI and ML while providing built-in governance tools. A data store lets a business connect existing data with new data and discover new insights with real-time analytics and business intelligence.

Risk

Risk Modeling Management Metadata

Tackling AI’s data challenges with IBM databases on AWS

IBM Big Data Hub

MARCH 14, 2024

The solution: IBM databases on AWS. To solve these challenges, IBM’s portfolio of SaaS database solutions on Amazon Web Services (AWS) enables enterprises to scale applications, analytics and AI across the hybrid cloud landscape. It enables secure data sharing for analytics and AI across your ecosystem.

Cost-Benefit

Cost-Benefit Metadata Optimization Management

Sisense Q1 2021 Release: Infuse Customized Intelligence at Scale

Sisense

MARCH 30, 2021

In Gartner’s Top 10 Data and Analytics Trends for 2021, trend No. The Sisense Fusion platform embodies this idea, empowering every user to embrace analytics via actionable intelligence delivered directly to them where they spend their time. The Sisense Q1 2021 release is focused on bringing customized analytics to each person.

Dashboards

Dashboards Machine Learning Data Warehouse Marketing

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

In legacy analytical systems such as enterprise data warehouses, the scalability challenges of a system were primarily associated with computational scalability, i.e., the ability of a data platform to handle larger volumes of data in an agile and cost-efficient way.

Data Processing

Data Processing Data Warehouse Enterprise Visualization

Alation and dbt Unlock Metadata and Increase Modern Data Stack Visibility

Alation

OCTOBER 18, 2022

Data analysts and engineers use dbt to transform, test, and document data in the cloud data warehouse. Yet every dbt transformation contains vital metadata that is not captured – until now. Data Transformation in the Modern Data Stack. How did the data transform exactly?

Metadata

Metadata Metrics Recreation/Entertainment Data Quality

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Alation

AUGUST 11, 2022

The data warehouse and analytical data stores moved to the cloud and disaggregated into the data mesh. Today, the brightest minds in our industry are targeting the massive proliferation of data volumes and the accompanying but hard-to-find value locked within all that data. Why are they so popular?

Metadata

Metadata Data Warehouse Data Quality Data Lake

Transforming Big Data into Actionable Intelligence

Sisense

MARCH 14, 2021

Attempting to learn more about the role of big data (here taken to mean datasets of high volume, velocity, and variety) within business intelligence today can sometimes create more confusion than it alleviates, as vital terms are used interchangeably instead of distinctly.

Big Data

Big Data IoT Data Warehouse Data-driven

How Alation’s Data Team Uses the Modern Data Stack to Power Insights

Alation

OCTOBER 27, 2022

Few actors in the modern data stack have inspired as much enthusiasm and fervent support as dbt. This data transformation tool enables data analysts and engineers to transform, test and document data in the cloud data warehouse. Bindu Chandramohan, Lead, Data Analytics, Alation: Thanks, Jason!

Metrics

Metrics Dashboards Sales Reporting

Migrate your existing SQL-based ETL workload to an AWS serverless ETL infrastructure using AWS Glue

AWS Big Data

JULY 31, 2023

This solution decouples the ETL and analytics workloads from our transactional data source Amazon Aurora, and uses Amazon Redshift as the data warehouse solution to build a data mart. We use Amazon Redshift as the data warehouse to implement the data mart solution. Choose Confirm.

Sales

Sales Data Warehouse Visualization Testing

Building Better Data Models to Unlock Next-Level Intelligence

Sisense

MAY 11, 2021

You can’t talk about data analytics without talking about data modeling. These two functions are nearly inseparable as we move further into a world of analytics that blends sources of varying volume, variety, veracity, and velocity. But this was only the tip of the analytics iceberg.

Modeling

Modeling Big Data IoT Data Warehouse

Data Mesh 101: How Data Mesh Helps Organizations Be Data-Driven and Achieve Velocity

Ontotext

FEBRUARY 12, 2024

Transferring ownership of data/datasets to domain-specific units that possess a deeper understanding of rules around the data empowers teams, improves data quality and trust, and greatly accelerates the building of data models and analytics.

Data-driven

Data-driven Data Lake Data Quality Business Objectives

Use fuzzy string matching to approximate duplicate records in Amazon Redshift

AWS Big Data

FEBRUARY 8, 2023

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. Amazon Redshift enables you to run complex SQL analytics at scale and performance on terabytes to petabytes of structured and unstructured data, and make the insights widely available through popular business intelligence (BI) and analytics tools.

Data Quality

Data Quality Testing Data Warehouse Unstructured Data

Automate alerting and reporting for AWS Glue job resource usage

AWS Big Data

MAY 25, 2023

Data transformation plays a pivotal role in providing the necessary data insights for businesses in any organization, small and large. To gain these insights, customers often perform ETL (extract, transform, and load) jobs from their source systems and output an enriched dataset.

Reporting

Reporting Metrics Optimization Data Lake

Best Web Analytics 2.0 Tools: Quantitative, Qualitative, Life Saving!

Occam's Razor

OCTOBER 19, 2010

What is the first thing you want when you think about web analytics? I was reflecting on that recently and thought it was incredible that in all my years of writing this blog I have never written a blog post, not one single one (!!), recommending tools for the complete web analytics 2.0 Web Analytics 2.0.

Analytics

Analytics Testing Measurement Optimization

The Key to Unlocking IT Modernization’s Power? Enterprise level Transformation

Cloudera

APRIL 12, 2021

As Cussatt put it, “data transformation isn’t about the IT, but about enabling the mission to be able to serve the veterans.” Enterprise cloud offerings such as Cloudera Data Warehouse (CDW), a solution to evolving beyond shadow IT, deliver a hybrid cloud, multifunction data platform that centrally integrates information.

Enterprise

Enterprise IT Digital Transformation Data Warehouse

Save Time and Stress with Dynamics Data Merging from Atlas

Jet Global

MARCH 13, 2024

Between complex data structures, data security questions, and error-prone manual processes, merging data from disparate sources into a single system can quickly turn your routine reporting processes into a stressful and time-consuming ordeal.

Reporting

Reporting Finance Data Quality Sales

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

Apache Hudi is an open source transactional data lake framework that greatly simplifies incremental data processing and the development of data pipelines. Besides demonstrating with Hudi here, we will follow up with other OTF tables with other blogs. For Stack name, enter a stack name (for example, rsv2-emr-hudi-blog).

Data Lake

Data Lake Snapshot Big Data Data-driven

Watch the Fifth Video in Our Snowflake Tutorial Series

CDW Research Hub

JULY 23, 2021

Get hands-on experience with the data cloud. Gain experience and understanding of how to drive better business decisions with your data. Our fifth video will demonstrate data transformation and orchestration with Matillion into Snowflake. Learn about current trends.

Structured Data

Structured Data Consulting Strategy Data Strategy

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. Amazon EMR Serverless is a serverless option in Amazon EMR that makes it easy for data analysts and engineers to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers.

Data Lake

Data Lake Dashboards Metrics Metadata
