Big Data, Blog, Data Lake and Data Processing

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without first having to structure it, and then run different types of analytics for better business insights. Choose Next to create your stack.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

Data analytics on operational data at near-real time is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to have better scalability and performance. For more information, see Changing the default settings for your data lake.

Data Lake

Data Lake Visualization Dashboards Insurance

Webinars

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

TDC Digital leverages IBM Cloud for transparent billing and improved customer satisfaction

IBM Big Data Hub

MAY 19, 2023

Furthermore, TDC Digital had not used any cloud storage solution and experienced latency and downtime while hosting the application in its data center. TDC Digital is excited about its plans to host its IT infrastructure in IBM data centers, offering better scalability, performance and security.

Unstructured Data

Unstructured Data Data Processing Manufacturing Data Lake

Enhance query performance using AWS Glue Data Catalog column-level statistics

AWS Big Data

NOVEMBER 22, 2023

Data lakes are designed for storing vast amounts of raw, unstructured, or semi-structured data at a low cost, and organizations share those datasets across multiple departments and teams. The queries on these large datasets read vast amounts of data and can perform complex join operations on multiple datasets.

Statistics

Statistics Data Lake Optimization Data-driven

10 Things AWS Can Do for Your SaaS Company

Smart Data Collective

FEBRUARY 20, 2022

Data storage databases. Your SaaS company can store and protect any amount of data using Amazon Simple Storage Service (S3), which is ideal for data lakes, cloud-native applications, and mobile apps. This blog post has demonstrated how AWS can greatly benefit your SaaS company, on multiple levels. Easy to use.

Cost-Benefit

Cost-Benefit Data Lake Software Machine Learning

Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines. Data quality at rest focuses on validating the data stored in data lakes, databases, or data warehouses. It ensures that the data meets specific quality standards before it is consumed.

Data Quality

Data Quality Data Lake Visualization Data-driven

Introducing the technology behind watsonx.ai, IBM’s AI and data platform for enterprise

IBM Big Data Hub

MAY 9, 2023

Over the past decade, deep learning arose from a seismic collision of data availability and sheer compute power, enabling a host of impressive AI capabilities. All watsonx.ai models are trained on IBM’s curated, enterprise-focused data lake, on our custom-designed cloud-native AI supercomputer, Vela.

Enterprise

Enterprise Technology Modeling Cost-Benefit

How Data Governance Protects Sensitive Data

erwin

APRIL 2, 2021

And knowing the business purpose translates into actively governing personal data against potential privacy and security violations. Do You Know Where Your Sensitive Data Is? Data is a valuable asset used to operate, manage and grow a business.

Data Governance

Data Governance Cost-Benefit Risk Metadata

5 misconceptions about cloud data warehouses

IBM Big Data Hub

FEBRUARY 2, 2023

Misconception 5: Cloud data warehouses reduce control over your deployment Some DBAs believe that cloud data warehouses lack the control and flexibility of on-prem data warehouses, making it harder to respond to security threats, performance issues or disasters.

Data Warehouse

Data Warehouse Cost-Benefit Unstructured Data Data Architecture

Quantitative and Qualitative Data: A Vital Combination

Sisense

OCTOBER 6, 2020

Digging into quantitative data; Why is quantitative data important; What are the problems with quantitative data; Exploring qualitative data; Qualitative data benefits; Getting the most from qualitative data; Better together. First, data isn’t created in a uniform, consistent format.

Statistics

Statistics Unstructured Data Data-driven Visualization

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

This is a guest blog post co-written with Sumesh M R from Cargotec and Tero Karttunen from Knowit Finland. Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. The source code for the application is hosted on the AWS Glue GitHub.

Metadata

Metadata Data Lake Machine Learning Big Data

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

It hosts over 150 big data analytics sandboxes across the region with over 200 users utilizing the sandbox for data discovery. With this functionality, business units can now leverage big data analytics to develop better and faster insights to help achieve better revenues, higher productivity, and decreased risk.

Internet Publishing and Broadcasting

Internet Publishing and Broadcasting Data-driven Broadcasting Digital Transformation

Design a data mesh on AWS that reflects the envisioned organization

AWS Big Data

JANUARY 22, 2024

Users can also raise requests to producers to improve the way the data is presented or to enrich the data with new data points for generating a higher business value. At the same time, each team can also map other catalogs to their own account and use their own data, which they produce along with the data from other accounts.

Data-driven

Data-driven Advertising Metadata Data Architecture

How The Cloud Made ‘Data-Driven Culture’ Possible | Part 1

BizAcuity

MAY 10, 2022

And how this transformation will impact businesses in the short and long run is the main discussion in this blog. 2007: Amazon launches SimpleDB, a non-relational (NoSQL) database that allows businesses to cheaply process vast amounts of data with minimal effort. The platform is built on S3 and EC2 using a hosted Hadoop framework.

Data-driven

Data-driven IoT Unstructured Data Data Lake

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Building data lakes from continuously changing transactional data of databases and keeping data lakes up to date is a complex task and can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. For Type, choose Spark.

Data Lake

Data Lake Dashboards Metrics Metadata

Modern Data Architecture for Telecommunications

Cloudera

SEPTEMBER 6, 2022

Previously, there were three types of data structures in telco: Entity data sets (i.e., marketing data lakes). The Cloudera Data Platform (CDP) enables modern data architectures with data anywhere at the telco scale. Application-based datasets (i.e., billing or contact center support systems).

Data Architecture

Data Architecture Cost-Benefit Digital Transformation Business Driver

Dancing with Elephants in 5 Easy Steps

Cloudera

AUGUST 21, 2020

And next to those legacy ERP, HCM, SCM and CRM systems, that mysterious elephant in the room – that “Big Data” platform running in the data center that is driving much of the company’s analytics and BI – looks like a great potential candidate. Big Data is an ecosystem as well as a philosophy.

Cost-Benefit

Cost-Benefit Big Data ROI Risk

Improving Multi-tenancy with Virtual Private Clusters

Cloudera

JUNE 6, 2019

The typical Cloudera Enterprise Data Hub Cluster starts with a few dozen nodes in the customer’s datacenter hosting a variety of distributed services. Over time, workloads start processing more data, tenants start onboarding more workloads, and administrators (admins) start onboarding more tenants. Cloudera Manager (CM) 6.2

Metadata

Metadata Data Lake Optimization Strategy

Stitch Fix seamless migration: Transitioning from self-managed Kafka to Amazon MSK

AWS Big Data

SEPTEMBER 22, 2023

At Stitch Fix, we have been powered by data science since its foundation and rely on many modern data lake and data processing technologies. In our infrastructure, Apache Kafka has emerged as a powerful tool for managing event streams and facilitating real-time data processing.

Management

Management Metrics Cost-Benefit Data Lake

Ingest, transform, and deliver events published by Amazon Security Lake to Amazon OpenSearch Service

AWS Big Data

JUNE 19, 2023

Security Lake automatically centralizes security data from cloud, on-premises, and custom sources into a purpose-built data lake stored in your account. With Security Lake, you can get a more complete understanding of your security data across your entire organization. Choose Next.

Publishing

Publishing Dashboards Visualization Management

Periscope Data Expands to Israel, Empowering Data Teams with Powerful Tools

Sisense

DECEMBER 11, 2019

With data growing at a staggering rate, managing and structuring it is vital to your survival. In this piece, we detail the Israeli debut of Periscope Data. Driving startup growth with the power of data. It’s why Sisense, having merged with Periscope Data in May 2019, chose to host this event in Tel Aviv.

Data Lake

Data Lake Big Data Sales Data-driven

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

In this blog, I will cover: What is watsonx.ai? sales conversation summaries, insurance coverage, meeting transcripts, contract information) Generate: Generate text content for a specific purpose, such as marketing campaigns, job descriptions, blogs or articles, and email drafting support. What capabilities are included in watsonx.ai?

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

IBM Big Data Hub

JUNE 15, 2023

It comprises commodity cloud object storage, open data and open table formats, and high-performance open-source query engines. To help organizations scale AI workloads, we recently announced IBM watsonx.data, a data store built on an open data lakehouse architecture and part of the watsonx AI and data platform.

Data Warehouse

Data Warehouse Data Lake Optimization Data-driven

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

In addition, data pipelines include more and more stages, thus making it difficult for data engineers to compile, manage, and troubleshoot those analytical workloads. Those incremental costs derive from a variety of reasons: Increased data processing costs associated with legacy deployment types (e.g., CRM platforms).

Data Processing

Data Processing Data Warehouse Enterprise Visualization
