Analytics, Data Lake, Data Processing and IT

Analytics

Data Lake

Data Processing

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Important Considerations When Migrating to a Data Lake

Smart Data Collective

MARCH 30, 2022

Azure Data Lake Storage Gen2 is based on Azure Blob storage and offers a suite of big data analytics features. If you don’t understand the concept, you might want to check out our previous article on the difference between data lakes and data warehouses. Determine your preparedness. Conclusion.

Data Lake

Data Lake Cost-Benefit Data Warehouse Big Data

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

With the rapid growth of technology, more and more data volume is coming in many different formats—structured, semi-structured, and unstructured. Data analytics on operational data at near-real time is becoming a common need. Then we can query the data with Amazon Athena visualize it in Amazon QuickSight.

Data Lake

Data Lake Visualization Dashboards Insurance

Secure cloud fabric: Enhancing data management and AI development for the federal government

CIO Business Intelligence

DECEMBER 19, 2023

To address these challenges, agencies are turning to a secure cloud fabric that can ensure the confidentiality, integrity, and availability of their data in the cloud. Current challenges with cloud data The US government generates and collects a massive amount of data each year – everything from census information to intelligence gathering.

Data Lake

Data Lake Management Cost-Benefit Data Processing

Query your Apache Hive metastore with AWS Lake Formation permissions

AWS Big Data

JULY 20, 2023

The Hive metastore is a repository of metadata about the SQL tables, such as database names, table names, schema, serialization and deserialization information, data location, and partition details of each table. Therefore, organizations have come to host huge volumes of metadata of their structured datasets in the Hive metastore.

Data Lake

Data Lake Metadata Data Processing Big Data

Configure cross-Region table access with the AWS Glue Catalog and AWS Lake Formation

AWS Big Data

AUGUST 3, 2023

Today’s modern data lakes span multiple accounts, AWS Regions, and lines of business in organizations. It’s important that their data solution gives them the ability to share and access data securely and safely across Regions. A resource link is a Data Catalog object that is a link to a database or table.

Data Lake

Data Lake Metadata Management Data Processing

Introducing AWS Glue crawler and create table support for Apache Iceberg format

AWS Big Data

AUGUST 16, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. AWS Glue crawlers will extract schema information and update the location of Iceberg metadata and schema updates in the Data Catalog.

Data Lake

Data Lake Metadata Snapshot Management

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users.

Metadata

Metadata Data Lake Visualization Data Transformation

Implement alerts in Amazon OpenSearch Service with PagerDuty

AWS Big Data

JUNE 8, 2023

This data is often stored and analyzed using various tools, such as Amazon OpenSearch Service , a powerful search and analytics service offered by AWS. OpenSearch Service provides real-time insights into your data to support use cases like interactive log analytics, real-time application monitoring, website search, and more.

Data Lake

Data Lake Dashboards Metrics Testing

Enhance query performance using AWS Glue Data Catalog column-level statistics

AWS Big Data

NOVEMBER 22, 2023

Data lakes are designed for storing vast amounts of raw, unstructured, or semi-structured data at a low cost, and organizations share those datasets across multiple departments and teams. The queries on these large datasets read vast amounts of data and can perform complex join operations on multiple datasets.

Statistics

Statistics Data Lake Optimization Data-driven

TDC Digital leverages IBM Cloud for transparent billing and improved customer satisfaction

IBM Big Data Hub

MAY 19, 2023

Furthermore, TDC Digital had not used any cloud storage solution and experienced latency and downtime while hosting the application in its data center. TDC Digital is excited about its plans to host its IT infrastructure in IBM data centers, offering better scalability, performance and security.

Unstructured Data

Unstructured Data Data Processing Manufacturing Data Lake

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports. Typically, you have multiple accounts to manage and run resources for your data pipeline. Aggregating metrics and slicing data by different dimensions such as job name can provide deeper insights. Choose SHARE.

Metrics

Metrics Visualization Dashboards Interactive

Migrate Hive data from CDH to CDP public cloud

Cloudera

JUNE 25, 2021

Many Cloudera customers are making the transition from being completely on-prem to cloud by either backing up their data in the cloud, or running multi-functional analytics on CDP Public cloud in AWS or Azure. The Replication Manager service facilitates both disaster recovery and data migration across different environments.

Data Lake

Data Lake Metadata Unstructured Data Management

Access Amazon Athena in your applications using the WebSocket API

AWS Big Data

MARCH 2, 2023

Many organizations are building data lakes to store and analyze large volumes of structured, semi-structured, and unstructured data. In addition, many teams are moving towards a data mesh architecture, which requires them to expose their data sets as easily consumable data products.

Data Lake

Data Lake Testing Interactive Unstructured Data

Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

AWS Glue Data Quality measures and monitors the quality of your dataset. It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines. Data quality at rest focuses on validating the data stored in data lakes, databases, or data warehouses.

Data Quality

Data Quality Data Lake Visualization Data-driven

10 Things AWS Can Do for Your SaaS Company

Smart Data Collective

FEBRUARY 20, 2022

Whether it’s data management, analytics, or scalability, AWS can be the top-notch solution for any SaaS company. It has brought a lot of data to the cloud in recent years. Data storage databases. AWS instead offers services that help keep your data, accounts, and workloads secure from unauthorized access.

Cost-Benefit

Cost-Benefit Data Lake Software Machine Learning

Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

AWS Big Data

DECEMBER 13, 2023

Traditionally, customers used batch-based approaches for data movement from operational systems to analytical systems. A batch-based approach can introduce latency in data movement and reduce the value of data for analytics. usually a data warehouse) needs to reflect those changes in near real-time.

Data Warehouse

Data Warehouse Snapshot Data Processing Management

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Build a pseudonymization service on AWS to protect sensitive data: Part 2

AWS Big Data

MARCH 6, 2024

Consequently, an organization can achieve a standard process to handle sensitive data across all platforms. Additionally, this takes away any complexity and expertise needed to understand and implement various compliance requirements from development teams and analytical users, allowing them to focus on their business outcomes.

Metrics

Metrics Statistics Testing Data Lake

5 ways to maximize your cloud investment

CIO Business Intelligence

JANUARY 10, 2024

McMasters takes full advantage of the analytical tools Microsoft provides with Azure Cloud Computing Services to assist with cost optimization analysis, but he doesn’t stop there. “We Migrating infrastructure and applications to the cloud is never straightforward, and managing ongoing costs can be equally complicated.

Cost-Benefit

Cost-Benefit Measurement Optimization Metrics

5 misconceptions about cloud data warehouses

IBM Big Data Hub

FEBRUARY 2, 2023

In today’s world, data warehouses are a critical component of any organization’s technology ecosystem. They provide the backbone for a range of use cases such as business intelligence (BI) reporting, dashboarding, and machine-learning (ML)-based predictive analytics, that enable faster decision making and insights.

Data Warehouse

Data Warehouse Cost-Benefit Unstructured Data Data Architecture

How Data Governance Protects Sensitive Data

erwin

APRIL 2, 2021

An assessment of the data breaches that crop up like weeds each year supports the conclusion that companies, absent data governance, wind up building security architectures strictly from a technical perspective. So they can’t answer these fundamental questions: What data do we have and where is it now?

Data Governance

Data Governance Cost-Benefit Risk Metadata

Automate deployment of an Amazon QuickSight analysis connecting to an Amazon Redshift data warehouse with an AWS CloudFormation template

AWS Big Data

FEBRUARY 16, 2023

Amazon Redshift is the most widely used data warehouse in the cloud, best suited for analyzing exabytes of data and running complex analytical queries. Amazon QuickSight is a fast business analytics service to build visualizations, perform ad hoc analysis, and quickly get business insights from your data.

Data Warehouse

Data Warehouse Sales Visualization Data Processing

CDP Private Cloud is a Game-changer for Partners

Cloudera

SEPTEMBER 2, 2020

Additionally, lines of business (LOBs) are able to gain access to a shared data lake that is secured and governed by the use of Cloudera Shared Data Experience (SDX). According to 451 Research’s Voice of the Enterprise: Cloud, Hosting & Managed Services study, 58% of Enterprises are moving towards a hybrid IT environment.

Cost-Benefit

Cost-Benefit Data Warehouse Data Lake Machine Learning

Announcing the 2021 Data Impact Awards

Cloudera

MAY 12, 2021

2020 saw us hosting our first ever fully digital Data Impact Awards ceremony, and it certainly was one of the highlights of our year. We saw a record number of entries and incredible examples of how customers were using Cloudera’s platform and services to unlock the power of data. DATA FOR GOOD. INDUSTRY TRANSFORMATION.

Digital Transformation

Digital Transformation Machine Learning Optimization Data Lake

Unlocking Data Storage: The Traditional Data Warehouse vs. Cloud Data Warehouse

Sisense

NOVEMBER 12, 2020

Data warehouse vs. databases Traditional vs. Cloud Explained Cloud data warehouses in your data stack A data-driven future powered by the cloud. We live in a world of data: There’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Traditional data warehouses.

Data Warehouse

Data Warehouse Data Lake OLAP Data-driven

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

Quantitative and Qualitative Data: A Vital Combination

Sisense

OCTOBER 6, 2020

Digging into quantitative data Why is quantitative data important What are the problems with quantitative data Exploring qualitative data Qualitative data benefits Getting the most from qualitative data Better together. But are you paying attention to all of your data?

Statistics

Statistics Unstructured Data Data-driven Visualization

Introducing Amazon EMR on EKS job submission with Spark Operator and spark-submit

AWS Big Data

JUNE 6, 2023

Verify the job by running the following command: kubectl get pods -n data-team-a Enable access to the Spark UI The Spark UI is an important tool for data engineers because it allows you to track the progress of tasks, view detailed job and stage information, and analyze resource utilization to identify bottlenecks and optimize your code.

Optimization

Optimization Data Lake Cost-Benefit Management

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue data catalog.

Metadata

Metadata Data Lake Data Processing Data-driven

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

AWS Big Data

MAY 30, 2023

Customers have been using data warehousing solutions to perform their traditional analytics tasks. Traditional batch ingestion and processing pipelines that involve operations such as data cleaning and joining with reference data are straightforward to create and cost-efficient to maintain.

Data Lake

Data Lake Data Analytics Analytics Data Processing

Accomplish Agile Business Intelligence & Analytics For Your Business

datapine

APRIL 15, 2020

When encouraging these BI best practices what we are really doing is advocating for agile business intelligence and analytics. When encouraging these BI best practices what we are really doing is advocating for agile business intelligence and analytics. What Is Agile Analytics And BI? Without further ado, let’s begin.

Business Intelligence

Business Intelligence Analytics Testing Dashboards

Accelerate your data warehouse migration to Amazon Redshift – Part 7

AWS Big Data

OCTOBER 17, 2023

With Amazon Redshift, you can use standard SQL to query data across your data warehouse, operational data stores, and data lake. Migrating a data warehouse can be complex. You have to migrate terabytes or petabytes of data from your legacy system while not disrupting your production workload.

Data Warehouse

Data Warehouse Data Processing Data Lake Management

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

While cloud-native, point-solution data warehouse services may serve your immediate business needs, there are dangers to the corporation as a whole when you do your own IT this way. And you also already know siloed data is costly, as that means it will be much tougher to derive novel insights from all of your data by joining data sets.

Data Lake

Data Lake Data Warehouse IT Analytics

Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center

AWS Big Data

APRIL 26, 2024

To enable your workforce users for analytics with fine-grained data access controls and audit data access, you might have to create multiple AWS Identity and Access Management (IAM) roles with different data permissions and map the workforce users to one of those roles. You will not see any applications on this page yet.

Analytics

Analytics Data Lake Management Enterprise

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

Many in the data industry recognize the serious impact of AI bias and seek to take active steps to mitigate it. The data industry realizes that AI bias is simply a quality problem, and AI systems should be subject to this same level of process control as an automobile rolling off an assembly line. Data Gets Meshier.

Testing

Testing Data Lake Data Architecture Manufacturing

UAB IT helps fuel genomic breakthroughs

CIO Business Intelligence

MARCH 10, 2022

When Dr. Curtis Carver, Ph.D., took over as CIO at University of Alabama-Birmingham in 2015, he confronted a “computer science museum,” as he calls it — instances of every operating system, storage device, and application on the market for the past 30 years. It is the speed of light that becomes a limiting factor.”.

IT Data Lake Digital Transformation Data Governance

Extreme data center pressure? Burst to the cloud with CDP!

Cloudera

NOVEMBER 12, 2020

Cloud has given us hope, with public clouds at our disposal we now have virtually infinite resources, but they come at a different cost – using the cloud means we may be creating yet another series of silos, which also creates unmeasurable new risks in security and traceability of our data. We see it everywhere.

Data Warehouse

Data Warehouse Reporting Risk Cost-Benefit

Accelerating revenue growth with real-time analytics: Poshmark’s journey

AWS Big Data

MARCH 20, 2023

In this post, we share how Poshmark improved CX and accelerated revenue growth by using a real-time analytics solution. High-level challenge: The need for real-time analytics Previous efforts at Poshmark for improving CX through analytics were based on batch processing of analytics data and using it on a daily basis to improve CX.

Analytics

Analytics Slice and Dice Data Processing Data Lake

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

This month’s article features updates from one of the early data conferences of the year, Strata Data Conference – which was held just last week in San Francisco. In particular, here’s my Strata SF talk “Overview of Data Governance” presented in article form. for DG adoption in the enterprise. a second priority?at

Data Governance

Data Governance Machine Learning Metadata Big Data

Preparing the foundations for Generative AI

CIO Business Intelligence

FEBRUARY 20, 2024

Generative AI tools are only as accurate and effective as the data they can access, and while government and public service agencies will have invested in their data infrastructure, it still may not be optimised for the demands of generative AI. This seems a formidable task. Microsoft Fabric could be one way to deal with it.

Cost-Benefit

Cost-Benefit Data Lake Data Warehouse Data Integration

Habib Bank manages data at scale with Cloudera Data Platform

Cloudera

NOVEMBER 17, 2022

We needed a solution to manage our data at scale, to provide greater experiences to our customers. The Solution: CDP Private Cloud brings a next-generation hybrid architecture with cloud-native benefits to HBL’s data platform. HBL was the first Pakistani commercial bank to be established in Pakistan in 1947.

Management

Management Data Lake Consulting Unstructured Data

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Important Considerations When Migrating to a Data Lake

Webinars

Trending Sources

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Webinars

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

Secure cloud fabric: Enhancing data management and AI development for the federal government

Query your Apache Hive metastore with AWS Lake Formation permissions

Configure cross-Region table access with the AWS Glue Catalog and AWS Lake Formation

Introducing AWS Glue crawler and create table support for Apache Iceberg format

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Implement alerts in Amazon OpenSearch Service with PagerDuty

Enhance query performance using AWS Glue Data Catalog column-level statistics

TDC Digital leverages IBM Cloud for transparent billing and improved customer satisfaction

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

Migrate Hive data from CDH to CDP public cloud

Access Amazon Athena in your applications using the WebSocket API

Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

10 Things AWS Can Do for Your SaaS Company

Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

Build a pseudonymization service on AWS to protect sensitive data: Part 2

5 ways to maximize your cloud investment

5 misconceptions about cloud data warehouses

How Data Governance Protects Sensitive Data

Automate deployment of an Amazon QuickSight analysis connecting to an Amazon Redshift data warehouse with an AWS CloudFormation template

CDP Private Cloud is a Game-changer for Partners

Announcing the 2021 Data Impact Awards

Unlocking Data Storage: The Traditional Data Warehouse vs. Cloud Data Warehouse

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Quantitative and Qualitative Data: A Vital Combination

Introducing Amazon EMR on EKS job submission with Spark Operator and spark-submit

Governing data in relational databases using Amazon DataZone

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

Accomplish Agile Business Intelligence & Analytics For Your Business

Accelerate your data warehouse migration to Amazon Redshift – Part 7

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center

Eight Top DataOps Trends for 2022

UAB IT helps fuel genomic breakthroughs

Extreme data center pressure? Burst to the cloud with CDP!

Accelerating revenue growth with real-time analytics: Poshmark’s journey

Themes and Conferences per Pacoid, Episode 8

Preparing the foundations for Generative AI

Habib Bank manages data at scale with Cloudera Data Platform

Stay Connected