Big Data, Data Integration, Data Lake and Enterprise

Big Data

Data Integration

Data Lake

Enterprise

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

Businesses are constantly evolving, and data leaders are challenged every day to meet new requirements. For many enterprises and large organizations, it is not feasible to have one processing engine or tool to deal with the various business requirements. This post is co-written with Andries Engelbrecht and Scott Teal from Snowflake.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

Trending Sources

Analytics Vidhya

Five steps to jumpstart your data integration journey

IBM Big Data Hub

JUNE 26, 2020

Organizations need to collect, organize, and analyze their data across multi-cloud, hybrid cloud, and data lakes. In turn, enterprises are increasingly looking for machine-learning-powered integration tools to synchronize data for analytics, improve employee productivity, and prepare data for analytics.

Data Integration

Data Integration Data Lake Machine Learning Enterprise

Webinars

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

AWS Big Data

OCTOBER 20, 2023

Today, we are pleased to announce new AWS Glue connectors for Azure Blob Storage and Azure Data Lake Storage that allow you to move data bi-directionally between Azure Blob Storage, Azure Data Lake Storage, and Amazon Simple Storage Service (Amazon S3). option("header","true").load("wasbs://yourblob@youraccountname.blob.core.windows.net/loadingtest-input/100mb")

Data Lake

Data Lake Big Data Consulting Data Warehouse

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

Q: Is data modeling cool again? In today’s fast-paced digital landscape, data reigns supreme. The data-driven enterprise relies on accurate, accessible, and actionable information to make strategic decisions and drive innovation. Relational databases adapt to handle web-scale data.

Data-driven

Data-driven Modeling Enterprise Structured Data

Automatically detect Personally Identifiable Information in Amazon Redshift using AWS Glue

AWS Big Data

DECEMBER 15, 2023

Even after identification, it’s cumbersome to implement redaction, masking, or encryption of sensitive data at scale. In this post, we provide an automated solution to detect PII data in Amazon Redshift using AWS Glue. For our solution, we use Amazon Redshift to store the data.

Data Lake

Data Lake Data Warehouse Big Data Structured Data

Data Management Requirements for the Enterprise Data Lake

In(tegrate) the Clouds

MAY 1, 2016

SnapLogic published Eight Data Management Requirements for the Enterprise Data Lake. They are: Storage and Data Formats. The company also recently hosted a webinar on Democratizing the Data Lake with Constellation Research and published 2 whitepapers from Mark Madsen. Ingest and Delivery.

Data Lake

Data Lake Enterprise Management Metadata

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. Data governance is a critical building block across all these approaches, and we see two emerging areas of focus.

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics: Part 2

AWS Big Data

FEBRUARY 13, 2024

Monitoring data pipelines in real time is critical for catching issues early and minimizing disruptions. AWS Glue has made this more straightforward with the launch of AWS Glue job observability metrics , which provide valuable insights into your data integration pipelines built on AWS Glue.

Metrics

Metrics Dashboards Visualization Key Performance Indicator

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor

AWS Big Data

MARCH 20, 2023

In the first post of this series , we described how AWS Glue for Apache Spark works with Apache Hudi, Linux Foundation Delta Lake, and Apache Iceberg datasets tables using the native support of those data lake formats. Even without prior experience using Hudi, Delta Lake or Iceberg, you can easily achieve typical use cases.

Visualization

Visualization Data Lake Snapshot Big Data

What is Data Pipeline? A Detailed Explanation

Smart Data Collective

OCTOBER 17, 2022

Big data is shaping our world in countless ways. Data powers everything we do. Exactly why, the systems have to ensure adequate, accurate and most importantly, consistent data flow between different systems. A point of data entry in a given pipeline. Data Pipeline: Use Cases. The post What is Data Pipeline?

Data Warehouse

Data Warehouse Data Lake Visualization Big Data

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

AWS Big Data

JANUARY 12, 2024

Ingestion: Data lake batch, micro-batch, and streaming Many organizations land their source data into their data lake in various ways, including batch, micro-batch, and streaming jobs. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.

Data Lake

Data Lake Cost-Benefit Visualization Structured Data

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Transformation

Compose your ETL jobs for MongoDB Atlas with AWS Glue

AWS Big Data

MAY 3, 2023

In today’s data-driven business environment, organizations face the challenge of efficiently preparing and transforming large amounts of data for analytics and data science purposes. Businesses need to build data warehouses and data lakes based on operational data.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

What is a Data Pipeline?

Jet Global

MAY 9, 2024

The key components of a data pipeline are typically: Data Sources : The origin of the data, such as a relational database , data warehouse, data lake , file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.

Data Lake

Data Lake Data Warehouse Business Intelligence Machine Learning

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

AWS Big Data

NOVEMBER 10, 2023

The data sourcing problem To ensure the reliability of PySpark data pipelines, it’s essential to have consistent record-level data from both dimensional and fact tables stored in the Enterprise Data Warehouse (EDW). These tables are then joined with tables from the Enterprise Data Lake (EDL) at runtime.

Data Processing

Data Processing Data Lake Data Warehouse Optimization

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

KEY003 | Swami Sivasubramanian (Vice President, Data and AI at AWS) | Nov. 29 | 8:30 AM – 10:30 AM (PDT) A powerful relationship between humans, data, and AI is unfolding right before us. 11:30 AM – 12:30 PM (PDT) Ceasars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture.

Analytics

Analytics Data Lake Data Warehouse Data-driven

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

AWS Big Data

NOVEMBER 20, 2023

For any modern data-driven company, having smooth data integration pipelines is crucial. These pipelines pull data from various sources, transform it, and load it into destination systems for analytics and reporting. About the Authors Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team.

Metrics

Metrics Data Lake Cost-Benefit Dashboards

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

We have seen a strong customer demand to expand its scope to cloud-based data lakes because data lakes are increasingly the enterprise solution for large-scale data initiatives due to their power and capabilities. To report issues or request a feature enhancement, feel free to open an issue on GitHub.

Data Lake

Data Lake Management Metrics Data Warehouse

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

DataKitchen

JULY 27, 2023

Let’s go through the ten Azure data pipeline tools Azure Data Factory : This cloud-based data integration service allows you to create data-driven workflows for orchestrating and automating data movement and transformation. You can use it for big data analytics and machine learning workloads.

Machine Learning

Machine Learning Cost-Benefit Data Transformation Testing

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift , the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

AWS Big Data

MARCH 3, 2023

Tricentis is the global leader in continuous testing for DevOps, cloud, and enterprise applications. Finally, data integrity is of paramount importance. Every event in the data source can be relevant, and our customers don’t tolerate data loss, poor data quality, or discrepancies between the source and Tricentis Analytics.

Software

Software Data Lake Testing Cost-Benefit

Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics

AWS Big Data

MARCH 27, 2024

AWS has invested in a zero-ETL (extract, transform, and load) future so that builders can focus more on creating value from data, instead of having to spend time preparing data for analysis. This means you no longer have to create an external schema in Amazon Redshift to use the data lake tables cataloged in the Data Catalog.

Data Analytics

Data Analytics Analytics Data Warehouse Data Lake

Prepare and load Amazon S3 data into Teradata using AWS Glue through its native connector for Teradata Vantage

AWS Big Data

NOVEMBER 30, 2023

In this post, we explore how to use the AWS Glue native connector for Teradata Vantage to streamline data integrations and unlock the full potential of your data. Businesses often rely on Amazon Simple Storage Service (Amazon S3) for storing large amounts of data from various data sources in a cost-effective and secure manner.

IT Visualization Machine Learning Data Integration

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. With these insights, teams have the visibility to make data integration pipelines more efficient. About the Authors Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team.

Metrics

Metrics Visualization Dashboards Interactive

Unlocking the value of data as your differentiator

AWS Big Data

NOVEMBER 29, 2023

Knowledge Bases works with popular vector databases and engines including Amazon OpenSearch Serverless , Redis Enterprise Cloud, and Pinecone, with support for Amazon Aurora and MongoDB coming soon. You also need services to store data for analysis and machine learning (ML) like Amazon Simple Storage Service (Amazon S3).

Data Warehouse

Data Warehouse Data Lake Data Integration Dashboards

Connect your data for faster decisions with AWS

AWS Big Data

NOVEMBER 7, 2023

Third, AWS continues adding support for more data sources including connections to software as a service (SaaS) applications, on-premises applications, and other clouds so organizations can act on their data. Visit Data integration with AWS to learn more.

Dashboards

Dashboards Data-driven Data Integration Data Lake

AWS Glue Data Quality is Generally Available

AWS Big Data

JUNE 6, 2023

We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning. DeeQu is optimized to run data quality rules in minimal passes that makes it efficient.

Data Quality

Data Quality Statistics Data Lake Visualization

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

Data ingestion You have to build ingestion pipelines based on factors like types of data sources (on-premises data stores, files, SaaS applications, third-party data), and flow of data (unbounded streams or batch data). Data exploration Data exploration helps unearth inconsistencies, outliers, or errors.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Ill-timed business decisions and misinformed business processes, missed revenue opportunities, failed business initiatives and complex data systems can all stem from data quality issues. Several factors determine the quality of your enterprise data like accuracy, completeness, consistency, to name a few.

Data Quality

Data Quality Data Architecture Strategy Data Lake

TIBCO JasperSoft for BI and Reporting

BizAcuity

AUGUST 1, 2022

TIBCO Jaspersoft offers a complete BI suite that includes reporting, online analytical processing (OLAP), visual analytics , and data integration. The web-scale platform enables users to share interactive dashboards and data from a single page with individuals across the enterprise. Data Security. Source: [link] ].

Reporting

Reporting OLAP Online Analytical Processing Dashboards

Data Management Challenges for the Modern Enterprise

Data Virtualization

MARCH 3, 2021

To remain competitive, organizations must have a data management strategy in place to effectively ingest, store, organize, and analyze data while ensuring that it is. The post Data Management Challenges for the Modern Enterprise appeared first on Data Virtualization blog.

Enterprise

Enterprise Management Strategy Data Lake

Migrate data from Google Cloud Storage to Amazon S3 using AWS Glue

AWS Big Data

JULY 19, 2023

AWS Glue is a serverless data integration service that makes it simple to discover, prepare, and combine data for analytics, machine learning, and application development. Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team.

Big Data

Big Data Software Consulting Unstructured Data

It’s not your data. It’s how you use it. Unlock the power of data & build foundations of a data driven organisation

CIO Business Intelligence

MAY 24, 2022

However, this concept has evolved in line with the increasing demands of mature and sophisticated data-driven organisations, and with the increased use and sophistication of cloud computing services. store and process the data, typically in a data warehouse, where the data is modelled and schema applied. Oil and Gas.

Data-driven

Data-driven Data Lake Data Warehouse Cost-Benefit

A hybrid approach in healthcare data warehousing with Amazon Redshift

AWS Big Data

FEBRUARY 21, 2023

Loading complex multi-point datasets into a dimensional model, identifying issues, and validating data integrity of the aggregated and merged data points are the biggest challenges that clinical quality management systems face. This is one of the biggest hurdles with the data vault approach. What is a data vault?

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Modeling

Extract data from SAP ERP using AWS Glue and the SAP SDK

AWS Big Data

FEBRUARY 8, 2023

Vyaire developed a custom data integration platform, iDataHub, powered by AWS services such as AWS Glue , AWS Lambda , and Amazon API Gateway. In this post, we share how we extracted data from SAP ERP using AWS Glue and the SAP SDK. Prahalathan M is the Data Integration Architect at Vyaire Medical Inc.

Testing

Testing Data Integration Data Lake Enterprise

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complimentary.

Management

Management Metadata Data Architecture Data Lake

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

An enterprise data catalog does all that a library inventory system does – namely streamlining data discovery and access across data sources – and a lot more. For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance.

Metadata

Metadata Data Quality Data-driven Data Governance

Handle UPSERT data operations using open-source Delta Lake and AWS Glue

AWS Big Data

JANUARY 30, 2023

Many customers need an ACID transaction (atomic, consistent, isolated, durable) data lake that can log change data capture (CDC) from operational data sources. There is also demand for merging real-time data into batch data. Delta Lake framework provides these two capabilities. option("header",True).schema(schema).load("s3://"+

Insurance

Insurance Data Lake Data-driven Management

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

In legacy analytical systems such as enterprise data warehouses, the scalability challenges of a system were primarily associated with computational scalability, i.e., the ability of a data platform to handle larger volumes of data in an agile and cost-efficient way. Introduction. CRM platforms).

Data Processing

Data Processing Data Warehouse Enterprise Visualization

Breaking down Business Intelligence

BizAcuity

MAY 16, 2022

As a data analytics company, we have been observing a trend among certain large enterprises who are looking for real-time data streaming for analytics. So, make sure you have a data strategy in place. Data Integration. Data mining. When information is at your fingertips, the possibilities are endless.

Business Intelligence

Business Intelligence Data mining Visualization Data Lake

How data stores and governance impact your AI initiatives

IBM Big Data Hub

OCTOBER 12, 2023

They’re built on machine learning algorithms that create outputs based on an organization’s data or other third-party big data sources. Sometimes, these outputs are biased because the data used to train the model was incomplete or inaccurate in some way. Learn more about IBM watsonx 1.

Cost-Benefit

Cost-Benefit Metadata Data Governance Modeling

Nexthink scales to trillions of events per day with Amazon MSK

AWS Big Data

MARCH 29, 2024

Amazon MSK enables us to tailor the data retention duration to our specific requirements, ranging from seconds to unlimited duration. This flexibility grants uninterrupted data availability to our application, which wasn’t possible with our previous architecture.

Cost-Benefit

Cost-Benefit Data-driven Metrics Management

Data Management Challenges for the Modern Enterprise

Data Virtualization

MARCH 3, 2021

Data is the fuel of the digital economy, so data-centric organizations have a distinct advantage. To remain competitive, organizations must have a data management strategy in place to effectively ingest, store, organize, and analyze data while ensuring that it is.

Management

Management Enterprise Strategy IT

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Webinars

Trending Sources

Five steps to jumpstart your data integration journey

Webinars

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

Automatically detect Personally Identifiable Information in Amazon Redshift using AWS Glue

Data Management Requirements for the Enterprise Data Lake

Data governance in the age of generative AI

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics: Part 2

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor

What is Data Pipeline? A Detailed Explanation

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Compose your ETL jobs for MongoDB Atlas with AWS Glue

What is a Data Pipeline?

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

Your guide to AWS Analytics at AWS re:Invent 2023

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics

Prepare and load Amazon S3 data into Teradata using AWS Glue through its native connector for Teradata Vantage

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

Unlocking the value of data as your differentiator

Connect your data for faster decisions with AWS

AWS Glue Data Quality is Generally Available

Create an end-to-end data strategy for Customer 360 on AWS

Data architecture strategy for data quality

TIBCO JasperSoft for BI and Reporting

Data Management Challenges for the Modern Enterprise

Migrate data from Google Cloud Storage to Amazon S3 using AWS Glue

It’s not your data. It’s how you use it. Unlock the power of data & build foundations of a data driven organisation

A hybrid approach in healthcare data warehousing with Amazon Redshift

Extract data from SAP ERP using AWS Glue and the SAP SDK

Augmented data management: Data fabric versus data mesh

Five benefits of a data catalog

Handle UPSERT data operations using open-source Delta Lake and AWS Glue

Addressing the Three Scalability Challenges in Modern Data Platforms

Breaking down Business Intelligence

How data stores and governance impact your AI initiatives

Nexthink scales to trillions of events per day with Amazon MSK

Data Management Challenges for the Modern Enterprise

Stay Connected