Data Integration, Data Lake and Optimization

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

AWS Big Data

NOVEMBER 10, 2023

Your applications can seamlessly read from and write to your Amazon Redshift data warehouse while maintaining optimal performance and transactional consistency. Additionally, you’ll benefit from performance improvements through pushdown optimizations, further enhancing the efficiency of your operations. options(**read_config).option("query",

Data Processing

Data Processing Data Lake Data Warehouse Optimization

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

AWS Big Data

JANUARY 12, 2024

Leadership and development teams can spend more time optimizing current solutions and even experimenting with new use cases, rather than maintaining the current infrastructure. With the ability to move fast on AWS, you also need to be responsible with the data you’re receiving and processing as you continue to scale.

Data Lake

Data Lake Cost-Benefit Visualization Structured Data

Data replication holds the key to hybrid cloud effectiveness

CIO Business Intelligence

MARCH 18, 2024

A hybrid cloud approach offers a huge swath of benefits for organizations, from a boost in agility and resiliency to eliminating data siloes and optimizing workloads. There’s more to data than just adopting hybrid cloud. That data also only needs to be replicated once and can then subsequently be applied to multiple targets.

Cost-Benefit

Cost-Benefit Data Lake Machine Learning Data Integration

Webinars

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

Avoid generative AI malaise to innovate and build business value

CIO Business Intelligence

APRIL 1, 2024

GenAI requires high-quality data. Ensure that data is cleansed, consistent, and centrally stored, ideally in a data lake. Data preparation, including anonymizing, labeling, and normalizing data across sources, is key.

Data Lake

Data Lake Consulting Uncertainty Risk

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

AWS Big Data

OCTOBER 20, 2023

Today, we are pleased to announce new AWS Glue connectors for Azure Blob Storage and Azure Data Lake Storage that allow you to move data bi-directionally between Azure Blob Storage, Azure Data Lake Storage, and Amazon Simple Storage Service (Amazon S3). option("header","true").load("wasbs://yourblob@youraccountname.blob.core.windows.net/loadingtest-input/100mb")

Data Lake

Data Lake Big Data Consulting Data Warehouse

Snowflake: Data Ingestion Using Snowpipe and AWS Glue

BizAcuity

NOVEMBER 22, 2022

This typically requires a data warehouse for analytics needs that is able to ingest and handle real time data of huge volumes. Snowflake is a cloud-native platform that eliminates the need for separate data warehouses, data lakes, and data marts allowing secure data sharing across the organization.

Data Warehouse

Data Warehouse Cost-Benefit Data Lake Internet of Things

Databricks’ new data lakehouse aims at media, entertainment sector

CIO Business Intelligence

APRIL 25, 2022

The data lakehouse is a relatively new data architecture concept, first championed by Cloudera, which offers both storage and analytics capabilities as part of the same solution, in contrast to the concepts for data lake and data warehouse which, respectively, store data in native format, and structured data, often in SQL format.

Recreation/Entertainment

Recreation/Entertainment Data Lake Data Warehouse Unstructured Data

Moving Enterprise Data From Anywhere to Any System Made Easy

Cloudera

JUNE 2, 2022

Over the last decade, we have often heard about the proliferation of data creating sources (mobile applications, laptops, sensors, enterprise apps) in heterogeneous environments (cloud, on-prem, edge) resulting in the exponential growth of data being created.

Enterprise

Enterprise Data Lake Data Collection Data-driven

Snowflake: Data Ingestion Using Snowpipe and AWS Glue

BizAcuity

APRIL 1, 2023

This typically requires a data warehouse for analytics needs that is able to ingest and handle real time data of huge volumes. Snowflake is a cloud-native platform that eliminates the need for separate data warehouses, data lakes, and data marts allowing secure data sharing across the organization.

Data Warehouse

Data Warehouse Cost-Benefit Data Lake Internet of Things

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

AWS Big Data

NOVEMBER 20, 2023

For any modern data-driven company, having smooth data integration pipelines is crucial. These pipelines pull data from various sources, transform it, and load it into destination systems for analytics and reporting. The end benefit for you is more effective and optimized AWS Glue for Apache Spark workloads.

Metrics

Metrics Data Lake Cost-Benefit Dashboards

A Closer Look at The Next Phase of Cloudera’s Hybrid Data Lakehouse

Cloudera

MARCH 5, 2024

With built-in features like time travel, schema evolution, and streamlined data discovery, Iceberg empowers data teams to enhance data lake management while upholding data integrity.

Snapshot

Snapshot Data Lake Enterprise Data Governance

Moving Enterprise Data From Anywhere to Any System Made Easy

CIO Business Intelligence

JULY 13, 2022

Over the last decade, we have often heard about the proliferation of data creating sources (mobile applications, laptops, sensors, enterprise apps) in heterogeneous environments (cloud, on-prem, edge) resulting in the exponential growth of data being created.

Enterprise

Enterprise Data Lake Data Collection Data-driven

An AI Chat Bot Wrote This Blog Post …

DataKitchen

DECEMBER 9, 2022

Observability in DataOps refers to the ability to monitor and understand the performance and behavior of data-related systems and processes, and to use that information to improve the quality and speed of data-driven decision making. Overall, DataOps observability is an essential component of modern data-driven organizations.

Machine Learning

Machine Learning Data-driven Optimization Modeling

P&G turns to AI to create digital manufacturing of the future

CIO Business Intelligence

OCTOBER 1, 2022

The digital transformation of P&G’s manufacturing platform will enable the company to check product quality in real-time directly on the production line, maximize the resiliency of equipment while avoiding waste, and optimize the use of energy and water in manufacturing plants.

Manufacturing

Manufacturing Digital Transformation IoT Internet of Things

Dive deep into AWS Glue 4.0 for Apache Spark

AWS Big Data

MAY 18, 2023

It’s even harder when your organization is dealing with silos that impede data access across different data stores. Seamless data integration is a key requirement in a modern data architecture to break down data silos. The upgrade also offers support for Bloom filters and skew optimization.

Testing

Testing Data Lake Cost-Benefit Data Integration

Straumann Group is transforming dentistry with data, AI

CIO Business Intelligence

FEBRUARY 16, 2023

Selling the value of data transformation Iyengar and his team are 18 months into a three- to five-year journey that started by building out the data layer — corralling data sources such as ERP, CRM, and legacy databases into data warehouses for structured data and data lakes for unstructured data.

Unstructured Data

Unstructured Data Data Lake Prescriptive Analytics Digital Transformation

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift enables you to directly access data stored in Amazon Simple Storage Service (Amazon S3) using SQL queries and join data across your data warehouse and data lake. With Amazon Redshift, you can query the data in your S3 data lake using a central AWS Glue metastore from your Redshift data warehouse.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

AWS Glue Data Quality is Generally Available

AWS Big Data

JUNE 6, 2023

We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning. DeeQu is optimized to run data quality rules in minimal passes that makes it efficient.

Data Quality

Data Quality Statistics Data Lake Visualization

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

Analyzing historical patterns allows you to optimize performance, identify issues proactively, and improve planning. You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization.

Metrics

Metrics Visualization Dashboards Interactive

Breaking down Business Intelligence

BizAcuity

MAY 16, 2022

When data is stored in silos and the back-end systems are not able to process the massive amounts of data seamlessly, critical information may be lost. We get critical business insights based on how well we leverage our business data. The more effectively a company uses data, the better it performs. Data Integration.

Business Intelligence

Business Intelligence Data mining Visualization Data Lake

With a zero-ETL approach, AWS is helping builders realize near-real-time analytics

AWS Big Data

JUNE 28, 2023

Another example of AWS’s investment in zero-ETL is providing the ability to query a variety of data sources without having to worry about data movement. Data analysts and data engineers can use familiar SQL commands to join data across several data sources for quick analysis, and store the results in Amazon S3 for subsequent use.

Analytics

Analytics Data Warehouse Data Lake Data-driven

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

CIO Business Intelligence

APRIL 29, 2022

This would be straightforward task were it not for the fact that, during the digital-era, there has been an explosion of data – collected and stored everywhere – much of it poorly governed, ill-understood, and irrelevant. Further, data management activities don’t end once the AI model has been developed. Addressing the Challenge.

Data Governance

Data Governance IT Risk Data Lake

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Next generation of big data platforms and long running batch jobs operated by a central team of data engineers have often led to data lake swamps. Decide on the changes that will optimize your data for achieving your business objectives. Determining what changes must be made to the existing architecture.

Data Quality

Data Quality Data Architecture Strategy Data Lake

Doing Cloud Migration and Data Governance Right the First Time

erwin

OCTOBER 8, 2020

The desire to modernize technology, over time, leads to acquiring many different systems with various data entry points and transformation rules for data as it moves into and across the organization. The metadata-driven suite automatically finds, models, ingests, catalogs and governs cloud data assets.

Data Governance

Data Governance Metadata Testing Data Lake

CIO Ryan Snyder on the benefits of interpreting data as a layer cake

CIO Business Intelligence

AUGUST 2, 2023

Historically, we drove productivity through Lean Six Sigma and optimizing workflows. we’re putting sensors across our manufacturing processes, which give us vast sums of data our leaders use to rethink those processes. But with the advent of Industry 4.0, These aren’t typically business discussions; it’s largely IT.

Manufacturing

Manufacturing Data Architecture Strategy Data Strategy

Get started with AWS Glue Data Quality dynamic rules for ETL pipelines

AWS Big Data

MAY 23, 2024

Hundreds of thousands of organizations build data integration pipelines to extract and transform data. They establish data quality rules to ensure the extracted data is of high quality for accurate business decisions. These rules assess the data based on fixed criteria reflecting current business states.

Data Quality

Data Quality Metrics Data Lake Sales

Extract data from SAP ERP using AWS Glue and the SAP SDK

AWS Big Data

FEBRUARY 8, 2023

Vyaire developed a custom data integration platform, iDataHub, powered by AWS services such as AWS Glue , AWS Lambda , and Amazon API Gateway. In this post, we share how we extracted data from SAP ERP using AWS Glue and the SAP SDK. Prahalathan M is the Data Integration Architect at Vyaire Medical Inc.

Testing

Testing Data Integration Data Lake Enterprise

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. This introduces the need for both polling and pushing the data to access and analyze in near-real time.

Optimization

Optimization Forecasting Data Lake Metadata

Week in the Life of an Analyst at Gartner US IT Symposium (virtual) 2021

Andrew White

OCTOBER 22, 2021

Lakehouse (data warehouse and data lake working together) 8. Data Literacy, training, coordination, collaboration 8. Portfolio Planning/Optimization 5. Data Management Infrastructure/Data Fabric 5. Data Integration tactics 4. Digital Business connections to D&A/decision modeling 10.

IT

IT Data Lake Strategy Data Science

Migrate data from Google Cloud Storage to Amazon S3 using AWS Glue

AWS Big Data

JULY 19, 2023

AWS Glue is a serverless data integration service that makes it simple to discover, prepare, and combine data for analytics, machine learning, and application development. Before joining AWS, Qiushuang worked in IT companies such as IBM and Oracle, and accumulated rich practical experience in development and analytics.

Big Data

Big Data Software Consulting Unstructured Data

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance. It uses metadata and data management tools to organize all data assets within your organization. Comprehensive search and access to relevant data.

Metadata

Metadata Data Quality Data-driven Data Governance

Introducing Apache Hudi support with AWS Glue crawlers

AWS Big Data

NOVEMBER 22, 2023

Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers manage complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance. For CoW tables, queries see the latest data committed.

Data Lake

Data Lake Snapshot Metadata Optimization

Understanding Data Entities in Microsoft Dynamics 365

Jet Global

OCTOBER 7, 2020

In the future, customers will be able to deploy Data Entities and replicate transactional tables in an Azure Data Lake. This includes the ability to drill down through live D365 F&SCM data through balances, journal entries, and into subledger transactions to find and fix data integrity and reconciliation issues fast.

Data Warehouse

Data Warehouse OLAP Reporting Finance

Turning the page

Cloudera

JUNE 1, 2021

Our customers must also have secure access to their data from anywhere – from on-premises to hybrid clouds and multiple public clouds. We must integrate and optimize the end-to-end data lifecycle for our customers, empowering them to focus on what really matters – extracting value from their data.

Uncertainty

Uncertainty Cost-Benefit Risk Strategy

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Amazon Redshift enables you to use SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning (ML) to deliver the best price-performance at scale. These upstream data sources constitute the data producer components.

Data Warehouse

Data Warehouse Data Lake Analytics Data Science

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

Modernizing analytics for scale, performance, and reliability “Our migration from legacy on-premises platform to Amazon Redshift allows us to ingest data 88% faster, query data 3x faster, and load daily data to the cloud 6x faster.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

AWS Big Data

FEBRUARY 22, 2023

In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue , Apache Hudi , and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.

Data Lake

Data Lake Dashboards Cost-Benefit Metadata

How AWS helped Altron Group accelerate their vision for optimized customer engagement

AWS Big Data

JULY 13, 2023

Generating business outcomes In 4 days, the Altron SI team left the Immersion Day workshop with the following: A data pipeline ingesting data from 21 sources (SQL tables and files) and combining them into three mastered and harmonized views that are cataloged for Altron’s B2B accounts.

Optimization

Optimization B2B Data Quality Sales

Compose your ETL jobs for MongoDB Atlas with AWS Glue

AWS Big Data

MAY 3, 2023

In today’s data-driven business environment, organizations face the challenge of efficiently preparing and transforming large amounts of data for analytics and data science purposes. Businesses need to build data warehouses and data lakes based on operational data.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Ontotext

MARCH 8, 2023

Here, I will draw upon our own experience from client projects and lessons learned to provide a selection of optimal use cases for knowledge graphs and semantic solutions along with real world examples of their applications. For many organizations, however, the question remains, “Is it the right solution for us?”

Enterprise

Enterprise Knowledge Discovery Risk Data-driven

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

AWS Big Data

MAY 30, 2023

Customers have been using data warehousing solutions to perform their traditional analytics tasks. Recently, data lakes have gained lot of traction to become the foundation for analytical solutions, because they come with benefits such as scalability, fault tolerance, and support for structured, semi-structured, and unstructured datasets.

Data Lake

Data Lake Data Analytics Analytics Data Processing

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

AWS Big Data

MAY 9, 2023

Hundreds of thousands of customers use AWS Glue , a serverless data integration service, to discover, prepare, and combine data for analytics, machine learning (ML), and application development. AWS Glue for Apache Spark jobs work with your code and configuration of the number of data processing units (DPU).

Data Lake

Data Lake Cost-Benefit Data Integration Data Transformation

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics: Part 2

AWS Big Data

FEBRUARY 13, 2024

Monitoring data pipelines in real time is critical for catching issues early and minimizing disruptions. AWS Glue has made this more straightforward with the launch of AWS Glue job observability metrics , which provide valuable insights into your data integration pipelines built on AWS Glue.

Metrics

Metrics Dashboards Visualization Key Performance Indicator

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

Webinars

Trending Sources

Data replication holds the key to hybrid cloud effectiveness

Webinars

Avoid generative AI malaise to innovate and build business value

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

Snowflake: Data Ingestion Using Snowpipe and AWS Glue

Databricks’ new data lakehouse aims at media, entertainment sector

Moving Enterprise Data From Anywhere to Any System Made Easy

Snowflake: Data Ingestion Using Snowpipe and AWS Glue

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

A Closer Look at The Next Phase of Cloudera’s Hybrid Data Lakehouse

Moving Enterprise Data From Anywhere to Any System Made Easy

An AI Chat Bot Wrote This Blog Post …

P&G turns to AI to create digital manufacturing of the future

Dive deep into AWS Glue 4.0 for Apache Spark

Straumann Group is transforming dentistry with data, AI

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Glue Data Quality is Generally Available

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

Breaking down Business Intelligence

With a zero-ETL approach, AWS is helping builders realize near-real-time analytics

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

Data architecture strategy for data quality

Doing Cloud Migration and Data Governance Right the First Time

CIO Ryan Snyder on the benefits of interpreting data as a layer cake

Get started with AWS Glue Data Quality dynamic rules for ETL pipelines

Extract data from SAP ERP using AWS Glue and the SAP SDK

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

Week in the Life of an Analyst at Gartner US IT Symposium (virtual) 2021

Migrate data from Google Cloud Storage to Amazon S3 using AWS Glue

Five benefits of a data catalog

Introducing Apache Hudi support with AWS Glue crawlers

Understanding Data Entities in Microsoft Dynamics 365

Turning the page

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

How AWS helped Altron Group accelerate their vision for optimized customer engagement

Compose your ETL jobs for MongoDB Atlas with AWS Glue

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics: Part 2

Stay Connected