Data Leaders Brief

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. It adds tables to compute engines including Spark, Trino, PrestoDB, Flink, and Hive using a high-performance table format that works just like a SQL table.

Data Lake, Data Processing, Metadata, Snapshot

Harmonize data using AWS Glue and AWS Lake Formation FindMatches ML to build a customer 360 view

AWS Big Data

JUNE 26, 2023

In today’s digital world, data is generated by a large number of disparate sources and is growing at an exponential rate. Companies face the daunting task of ingesting all this data, cleansing it, and using it to provide an outstanding customer experience. This is commonly referred to as a data harmonization or deduplication problem.

Insurance, Visualization, Data Lake, Metrics

Webinars

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

The Key to Sustainable Energy Optimization: A Data-Driven Approach for Manufacturing

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

MORE WEBINARS

The Many Faces of Data Relationships

Sisense

AUGUST 4, 2019

Especially when it comes to the data in your tables. That is unless you understand the different scenarios, their resolutions, and how to build a good relationship with your data. As you know, this data is organized into rows, columns and tables, and it’s also indexed so that you can find what you need quickly and easily.

Testing, Visualization, Dashboards, Modeling

Splitting Comma-Separated Values In MySQL

Sisense

JANUARY 25, 2020

SQL is one of the analyst’s most powerful tools. In SQL Superstar, we give you actionable advice to help you get the most out of this versatile language and create beautiful, effective queries. In this post, we’ll show how to split our comma-separated string into a table of values for easier analysis in MySQL.
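The excerpt does not include the post's MySQL query, so the following is only a minimal Python sketch of the same idea: turning a comma-separated string into one row per value. The function name `split_to_rows` and the sample data are hypothetical and used purely for illustration.

```python
def split_to_rows(records):
    """Expand (id, 'a,b,c') pairs into one (id, value) row per item."""
    rows = []
    for record_id, csv_text in records:
        for item in csv_text.split(","):
            item = item.strip()  # tolerate spaces around the commas
            if item:  # skip empty items, e.g. from a trailing comma
                rows.append((record_id, item))
    return rows


# Hypothetical sample data: an id paired with a comma-separated string.
sample = [(1, "red,green"), (2, "blue, yellow,"), (3, "")]
rows = split_to_rows(sample)
for row in rows:
    print(row)
```

Each output row pairs the original id with a single value, which is the shape that makes grouping and counting straightforward once the data is back in a table.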

Dashboards, IT

Visualize Confluent data in Amazon QuickSight using Amazon Athena

AWS Big Data

MARCH 27, 2023

Businesses are using real-time data streams to gain insights into their company’s performance and make informed, data-driven decisions faster. As real-time data has become essential for businesses, a growing number of companies are adapting their data strategy to focus on data in motion.

Visualization, Data Lake, Interactive, Data-driven

Snow Everything About Your Warehouse

Sisense

FEBRUARY 17, 2020

We live in a world of data: there’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways Data Teams are tackling the challenges of this new world to help their companies and their customers thrive. Step 0: Granting access to the tables.

Dashboards, Data Warehouse, Metadata, Business Intelligence

Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

Data is the lifeblood of modern businesses. In today’s data-driven world, companies rely on data to make informed decisions, gain a competitive edge, and provide exceptional customer experiences. However, not all data is created equal. AWS Glue Data Quality measures and monitors the quality of your dataset.

Data Quality, Data Lake, Visualization, Data-driven

Build and share a business capability model with Amazon QuickSight

AWS Big Data

JULY 14, 2023

In addition, this tool enhances the discovery and reuse of existing business capabilities, avoids duplication of services, and shortens time-to-market. Use case overview: Bob is a Senior Enterprise Architect. His first assignment is to assess the bank’s capabilities to offer new financial products to its high-value retail clients.

Modeling, Visualization, Reporting, Measurement

Alation Connected Sheets Brings Trust to Spreadsheets

Alation

NOVEMBER 28, 2022

Alation is excited to unveil Alation Connected Sheets, a new product that brings trusted, fresh data directly to spreadsheet users. Now, “spreadsheet jockeys” can pull the most current, compliant data directly from a range of cloud sources, without having to know SQL or depend on a data team to deliver it.

Descriptive Analytics, Risk, Sales, Data-driven

Exploring new ETL and ELT capabilities for Amazon Redshift from the AWS Glue Studio visual editor

AWS Big Data

APRIL 20, 2023

In a modern data architecture, unified analytics enable you to access the data you need, whether it’s stored in a data lake or a data warehouse. One of the most common use cases for data preparation on Amazon Redshift is to ingest and transform data from different data stores into an Amazon Redshift data warehouse.

Visualization, Data Warehouse, Big Data, Data Lake

Use fuzzy string matching to approximate duplicate records in Amazon Redshift

AWS Big Data

FEBRUARY 8, 2023

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. Amazon Redshift enables you to run complex SQL analytics at scale and performance on terabytes to petabytes of structured and unstructured data, and make the insights widely available through popular business intelligence (BI) and analytics tools.
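The excerpt does not show how the article performs fuzzy matching in Amazon Redshift. As a rough illustration of the general idea named in the title, the sketch below scores string similarity with Python's standard-library `difflib.SequenceMatcher` and flags pairs above a threshold as likely duplicates. The function `similar_pairs`, the 0.85 threshold, and the customer names are assumptions for illustration, not the article's method.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(names, threshold=0.85):
    """Return (a, b, score) for every pair whose similarity meets the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        # ratio() is a similarity score between 0.0 and 1.0; compare case-insensitively.
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical customer names; the first two differ by a single letter.
customers = ["Jon Smith", "John Smith", "Maria Garcia"]
matches = similar_pairs(customers)
for a, b, score in matches:
    print(f"possible duplicate: {a!r} ~ {b!r} (score {score})")
```

Comparing every pair grows quadratically with the number of records, so a sketch like this suits small samples; the threshold would need tuning against real data.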

Data Quality, Testing, Data Warehouse, Unstructured Data

Memory Optimizations for Analytic Queries in Cloudera Data Warehouse

Cloudera

MARCH 2, 2022

Similarly, with better usage of available memory more users can query the data at any given time, so more people can use the warehouse at the same time. This post explains the novel technique for how Impala, offered within the Cloudera Data Platform (CDP), is now able to get much more mileage out of the memory at its disposal.

Data Warehouse, Optimization, Analytics, Sales

Cloudera’s Open Data Lakehouse Supercharged with dbt Core(tm)

Cloudera

OCTOBER 7, 2022

dbt allows data teams to produce trusted data sets for reporting, ML modeling, and operational workflows using SQL, with a simple workflow that follows software engineering best practices like modularity, portability, and continuous integration/continuous development (CI/CD). The Open Data Lakehouse. Introduction.

Data Warehouse, Data Transformation, Testing, Data Lake

The Data Scientist’s Guide to the Data Catalog

Alation

JULY 19, 2022

These days, data scientists are in high demand. Across the country, data scientists have an unemployment rate of 2% and command an average salary of nearly $100,000. Obstacles such as user roles, permissions, and approval requests prevent speedy data access. Is this data trustworthy? How do I know it can be trusted?

Metadata, Data Quality, Statistics, Data Science

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

Do you need faster time to value? While cloud-native, point-solution data warehouse services may serve your immediate business needs, there are dangers to the corporation as a whole when you do your own IT this way. Cloudera Data Warehouse (CDW) is here to save the day! Is there a better option than the obvious ones?

Data Lake, Data Warehouse, IT, Analytics

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

IBM software products are embedding watsonx capabilities across digital labor, IT automation, security, sustainability, and application modernization to help unlock new levels of business value for clients. How can you get started today? In this blog, I will cover: What is watsonx.ai? What capabilities are included in watsonx.ai?

Machine Learning, Data Warehouse, Modeling, Cost-Benefit

How a Discovery Data Warehouse, the next evolution of augmented analytics, accelerates treatments and delivers medicines safely to patients in need

Cloudera

NOVEMBER 25, 2020

However, as they continue finding treatments and understanding the progression of these cancers, they now also need to serve a much higher expectation on delivery to market. I remember Matthew’s face showing mixed feelings when he explained how the pressure grew exponentially overnight. Challenges Ahead.

Data Warehouse, Unstructured Data, Analytics, Visualization

Business Intelligence vs. Reporting: Finding Your Bread and Butter

Jet Global

DECEMBER 2, 2019

Not only will this cost you mountains of wasted time, but you’re also in extreme danger of having the wrong data in front of you or giving it to someone else. Wrong data has a domino effect of consequences, from bad business decisions to unaligned operations and auditing implications. How to Compare Reporting & BI Solutions.

Business Intelligence, Reporting, OLAP, Data Warehouse

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

The quick and dirty definition of data mapping is the process of connecting different types of data from various data sources. Data mapping is a crucial step in data modeling and can help organizations achieve their business goals by enabling data integration, migration, transformation, and quality.

Data Warehouse, Reporting, Data Transformation, Sales
