Data Lake, Data Processing and Technology

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

Secure cloud fabric: Enhancing data management and AI development for the federal government

CIO Business Intelligence

DECEMBER 19, 2023

In recent years, government agencies have increasingly turned to cloud computing to manage vast amounts of data and streamline operations. While cloud technology has many benefits, it also poses security risks, especially when it comes to protecting sensitive information.

Data Lake

Data Lake Management Cost-Benefit Data Processing

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

With the rapid growth of technology, more and more data is arriving in many different formats: structured, semi-structured, and unstructured. Data analytics on operational data in near-real time is becoming a common need. Then we can query the data with Amazon Athena and visualize it in Amazon QuickSight.

Data Lake

Data Lake Visualization Dashboards Insurance

Introducing the technology behind watsonx.ai, IBM’s AI and data platform for enterprise

IBM Big Data Hub

MAY 9, 2023

Over the past decade, deep learning arose from a seismic collision of data availability and sheer compute power, enabling a host of impressive AI capabilities. But these powerful technologies also introduce new risks and challenges for enterprises. We stand on the frontier of an AI revolution. All watsonx.ai

Enterprise

Enterprise Technology Modeling Cost-Benefit

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

Verify all table metadata is stored in the AWS Glue Data Catalog. Consume data with Athena or Amazon EMR Trino for business analysis. Update and delete source records in Amazon RDS for MySQL and validate that the changes are reflected in the data lake tables. The Flink Table API/SQL can integrate with the AWS Glue Data Catalog.

Data Lake

Data Lake Metadata Business Analysis Data-driven

DS Smith sets a single-cloud agenda for sustainability

CIO Business Intelligence

DECEMBER 6, 2023

Much of our digital agenda is around data. Before we were quite fragmented across different technologies. Its digital transformation began with an application modernization phase, in which Dickson and her IT teams determined which applications should be hosted in the public cloud and which should remain on a private cloud.

Manufacturing

Manufacturing Data Lake Digital Transformation Machine Learning

High Availability (Multi-AZ) for Cloudera Operational Database

Cloudera

FEBRUARY 13, 2024

We will not repeat ourselves, so it’s assumed that technologies and concepts like HA, Multi-AZ, and operational databases are already known to the reader through the previous blog post. Below is the Azure CLI command: Cloudera allows FreeIPA servers, enterprise data lake, and data hub to be configured as Multi-AZ deployment.

Data Lake

Data Lake Testing Data Processing Enterprise

Query your Apache Hive metastore with AWS Lake Formation permissions

AWS Big Data

JULY 20, 2023

The Hive metastore is a repository of metadata about the SQL tables, such as database names, table names, schema, serialization and deserialization information, data location, and partition details of each table. Therefore, organizations have come to host huge volumes of metadata of their structured datasets in the Hive metastore.

Data Lake

Data Lake Metadata Data Processing Big Data

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

DataOps adoption continues to expand as a perfect storm of social, economic, and technological factors drive enterprises to invest in process-driven innovation. As a result, enterprises will examine their end-to-end data operations and analytics creation workflows. The Hub-Spoke architecture is part of a data enablement trend in IT.

Testing

Testing Data Lake Data Architecture Manufacturing

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

AWS Big Data

FEBRUARY 27, 2024

Each data producer within the organization has its own data lake in Apache Hudi format, ensuring data sovereignty and autonomy. This enables data-driven decision-making across the organization. Ben Vengerovsky is a Data Platform Product Manager at Bluestone.

Data-driven

Data-driven Data Lake Data Quality Data Governance

CIOs weigh where to place AI bets — and how to de-risk them

CIO Business Intelligence

MARCH 18, 2024

Amid the turbulence of AI, technologies are emerging rapidly, startups are clamoring for attention, and hyperscalers are scrambling to corral market share. Brian Hopkins, vice president for emerging technology at Forrester Research, agrees. It’s an environment that taxes the decision-making skills of even the most savvy CIOs.

Risk

Risk Cost-Benefit Data Processing Testing

Configure cross-Region table access with the AWS Glue Catalog and AWS Lake Formation

AWS Big Data

AUGUST 3, 2023

Today’s modern data lakes span multiple accounts, AWS Regions, and lines of business in organizations. It’s important that their data solution gives them the ability to share and access data securely and safely across Regions. For example, we are using a data lake administrator role called LF-Admin.

Data Lake

Data Lake Metadata Management Data Processing

Dairyland powers up for a generative AI edge

CIO Business Intelligence

APRIL 9, 2024

Now, thanks to the cooperative’s tight partnership with Microsoft systems integrator Stoneridge Software, as well as Melby’s extensive technology experience, Dairyland — which was formed during the New Deal in the 1930s — has been able to experiment with and put into production some of the earliest Microsoft Azure-based LLMs, Melby says.

Digital Transformation

Digital Transformation Machine Learning Data Lake Software

Habib Bank manages data at scale with Cloudera Data Platform

Cloudera

NOVEMBER 17, 2022

HBL has re-envisioned itself as a ‘Technology company with a banking license’, as it transforms into the bank of tomorrow – one which empowers its customers through digital enablement. The Solution: CDP Private Cloud brings a next-generation hybrid architecture with cloud-native benefits to HBL’s data platform.

Management

Management Data Lake Consulting Unstructured Data

What a quarter century of digital transformation at PayPal looks like

CIO Business Intelligence

OCTOBER 4, 2023

Since 1998, the brand has evolved and grown in step with technology, and today, the size of its network and consumer use has made it a household name in digital payment systems. Initially, the company emerged from x.com and Confinity as a crypto company, developing P2P payments and using PalmPilot’s Beam technology. trillion last year.

Digital Transformation

Digital Transformation Deep Learning Data Lake Risk

Your New Cloud for AI May Be Inside a Colo

CIO Business Intelligence

MAY 23, 2022

Many companies whose AI model training infrastructure is not proximal to their data lake incur steeper costs as the data sets grow larger and AI models become more complex. Companies such as Cyxtera, Digital Realty and Equinix, among others, offer hosting, managing and operations services for AI infrastructure.

Experimentation

Experimentation Cost-Benefit Data Lake Data Science

Introducing AWS Glue crawler and create table support for Apache Iceberg format

AWS Big Data

AUGUST 16, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. Solution overview: For our example use case, a customer uses Amazon EMR for data processing and Iceberg format for the transactional data. Choose Create.

Data Lake

Data Lake Metadata Snapshot Management

The essential check list for effective data democratization

CIO Business Intelligence

JANUARY 20, 2023

Exercising tactful platform selection: In many cases, only IT has access to data and data intelligence tools in organizations that don’t practice data democratization. So in order to make data accessible to all, new tools and technologies are required. Most organizations don’t end up with data lakes, says Orlandini.

Data Lake

Data Lake Data-driven Finance Data Architecture

AWS Glue crawlers support cross-account crawling to support data mesh architecture

AWS Big Data

MARCH 27, 2023

Data lakes have come a long way, and there’s been tremendous innovation in this space. Today’s modern data lakes are cloud native, work with multiple data types, and make this data easily available to diverse stakeholders across the business. In the navigation pane, under Data catalog, choose Settings.

Data Lake

Data Lake Data-driven Management Data Architecture

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. BMS’s EDLS platform hosts over 5,000 jobs and is growing at 15% YoY (year over year). About the authors: Sivaprasad Mahamkali is a Senior Streaming Data Engineer at AWS Professional Services.

Metadata

Metadata Data Lake Visualization Data Transformation

TDC Digital leverages IBM Cloud for transparent billing and improved customer satisfaction

IBM Big Data Hub

MAY 19, 2023

Furthermore, TDC Digital had not used any cloud storage solution and experienced latency and downtime while hosting the application in its data center. TDC Digital is excited about its plans to host its IT infrastructure in IBM data centers, offering better scalability, performance and security.

Unstructured Data

Unstructured Data Data Processing Manufacturing Data Lake

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

AWS Big Data

MAY 30, 2023

Customers have been using data warehousing solutions to perform their traditional analytics tasks. Recently, data lakes have gained a lot of traction as the foundation for analytical solutions, because they come with benefits such as scalability, fault tolerance, and support for structured, semi-structured, and unstructured datasets.

Data Lake

Data Lake Data Analytics Analytics Data Processing

Top 15 data management platforms available today

CIO Business Intelligence

SEPTEMBER 22, 2023

All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. DMPs excel at negotiating with a wide array of databases, data lakes, or data warehouses, ingesting their streams of data and then cleaning, sorting, and unifying the information therein.

Management

Management Advertising Data Lake Sales

Top 15 data management platforms

CIO Business Intelligence

JUNE 9, 2022

All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. Marketing-focused or not, DMPs excel at negotiating with a wide array of databases, data lakes, or data warehouses, ingesting their streams of data and then cleaning, sorting, and unifying the information therein.

Management

Management Advertising Data Lake Sales

UAB IT helps fuel genomic breakthroughs

CIO Business Intelligence

MARCH 10, 2022

Carver’s first priority then was to clean up the outdated mess and rationalize the entire technology infrastructure onto a single, comprehensive platform that would empower the university’s thousands of scientists to innovate and make breakthrough research discoveries. Next up: AI and data lake decisions.

IT Data Lake Digital Transformation Data Governance

10 Things AWS Can Do for Your SaaS Company

Smart Data Collective

FEBRUARY 20, 2022

Data storage databases. Your SaaS company can store and protect any amount of data using Amazon Simple Storage Service (S3), which is ideal for data lakes, cloud-native applications, and mobile apps. With the advancement of technology and more people accessing the internet, data security has become increasingly important.

Cost-Benefit

Cost-Benefit Data Lake Software Machine Learning

How Agencies Can Gain the Cyber Edge with Smart Data Solutions

Cloudera

DECEMBER 13, 2022

The attack targeted a host of public and private sector organizations (18,000 customers) including NASA, the Justice Department, and Homeland Security, and it is believed the attackers persisted on SolarWinds systems for 14 months prior to discovery. All with the integrated security and governance technologies required for compliance.

Machine Learning

Machine Learning Experimentation Data Lake Data Processing

10 Keys to a Secure Cloud Data Lakehouse

Cloudera

OCTOBER 25, 2022

The data lakehouse is gaining in popularity because it enables a single platform for all your enterprise data with the flexibility to run any analytic and machine learning (ML) use case. Cloud data lakehouses provide significant scaling, agility, and cost advantages compared to cloud data lakes and cloud data warehouses.

Data Processing

Data Processing Data Lake Cost-Benefit Risk

FINRA CIO Steve Randich pushes the public cloud forward

CIO Business Intelligence

FEBRUARY 10, 2023

While managing unstructured data remains a challenge for 36% of organizations, according to the 2022 Foundry Data and Analytics Research survey, many IT leaders are actively seeking ways of harnessing all types of data stored in data lakes.

Unstructured Data

Unstructured Data Data Lake Machine Learning Enterprise

CIOs rise to the ESG reporting challenge

CIO Business Intelligence

JANUARY 30, 2024

“Always the gatekeepers of much of the data necessary for ESG reporting, CIOs are finding that companies are even more dependent on them,” says Nancy Mentesana, ESG executive director at Labrador US, a global communications firm focused on corporate disclosure documents. What companies need more than anything is good data for ESG reporting.

Reporting

Reporting Data Quality Strategy Data-driven

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

Open source frameworks such as Apache Impala, Apache Hive and Apache Spark offer a highly scalable programming model that is capable of processing massive volumes of structured and unstructured data by means of parallel execution on a large number of commodity computing nodes. As a result, alternative data integration technologies (e.g.,

Data Processing

Data Processing Data Warehouse Enterprise Visualization

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

AWS Big Data

NOVEMBER 8, 2023

Putting your data to work with generative AI – Innovation Talk. Thursday, November 30 | 12:30 – 1:30 PM PST | The Venetian. Join Mai-Lan Tomsen Bukovec, Vice President, Technology at AWS to learn how you can turn your data lake into a business advantage with generative AI. Reserve your seat now!

Data-driven

Data-driven Data Lake Machine Learning Cost-Benefit

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region with over 200 users utilizing the sandbox for data discovery. Data for Good. Winner: Rush University Medical Center.

Internet Publishing and Broadcasting

Internet Publishing and Broadcasting Data-driven Broadcasting Digital Transformation

5 ways to maximize your cloud investment

CIO Business Intelligence

JANUARY 10, 2024

Be strategic with reserve pricing Reserve pricing for cloud services can reduce resource usage costs by as much as 70%, says David McKee, who in his role as fractional CTO, tech founder, and digital twins thought leader at Counterpoint Technologies acts as a part-time CTO for nine companies in the US and Europe. That helps me plan.”

Cost-Benefit

Cost-Benefit Measurement Optimization Metrics

Capital Group invests big in talent development

CIO Business Intelligence

JULY 29, 2022

For example, if a data team member wants to increase their skills or move to a data engineer position, they can embark on a curriculum for up to two years to gain the right skills and experience. The bootcamp broadened my understanding of key concepts in data engineering.

Data Lake

Data Lake Software Data Processing Structured Data

Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center

AWS Big Data

APRIL 26, 2024

Set up EMR Studio In this step, we demonstrate the actions needed from the data lake administrator to set up EMR Studio enabled for trusted identity propagation and with IAM Identity Center integration. On the Lake Formation console, choose Data lake permissions under Permissions in the navigation pane.

Analytics

Analytics Data Lake Management Enterprise

Build efficient ETL pipelines with AWS Step Functions distributed map and redrive feature

AWS Big Data

DECEMBER 18, 2023

AWS Step Functions is a fully managed visual workflow service that enables you to build complex data processing pipelines involving a diverse set of extract, transform, and load (ETL) technologies such as AWS Glue, Amazon EMR, and Amazon Redshift. There are multiple tables related to customers and order data in the RDS database.

Metadata

Metadata Visualization Data Lake Data-driven

Modern Data Architecture for Telecommunications

Cloudera

SEPTEMBER 6, 2022

Previously, there were three types of data structures in telco. Entity data sets, i.e. marketing data lakes. The Cloudera Data Platform (CDP) enables modern data architectures with data anywhere at the telco scale. Application-based datasets, i.e. billing or contact center support systems.

Data Architecture

Data Architecture Cost-Benefit Digital Transformation Business Driver

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Build a pseudonymization service on AWS to protect sensitive data: Part 2

AWS Big Data

MARCH 6, 2024

Following a decoupled service-based approach means that, as an organization, you are unbiased towards the use of any specific technologies to solve your business problems. No matter which technology is preferred by individual teams, they are able to call the pseudonymization service to pseudonymize sensitive data.

Metrics

Metrics Statistics Testing Data Lake

How Data Governance Protects Sensitive Data

erwin

APRIL 2, 2021

And knowing the business purpose translates into actively governing personal data against potential privacy and security violations. Do You Know Where Your Sensitive Data Is? Data is a valuable asset used to operate, manage and grow a business. Minimizing Risk Exposure with Data Intelligence.

Data Governance

Data Governance Cost-Benefit Risk Metadata

Unlocking Data Storage: The Traditional Data Warehouse vs. Cloud Data Warehouse

Sisense

NOVEMBER 12, 2020

While the organization of these layers has been refined over the years, the interoperability of the technologies, the myriad software, and orchestration of the systems make the management of these systems a challenge. Cloud data warehouses. with a cloud data warehouse is simple.

Data Warehouse

Data Warehouse Data Lake OLAP Data-driven

Running both IT and digital at Alorica

CIO Business Intelligence

JUNE 1, 2022

Finally, make sure you understand your data, because no machine learning solution will work for you if you aren’t working with the right data. Data lakes have a new consumer in AI. Many of our service-based offerings include hosting and executing our customers’ omnichannel platforms.

IT Interactive Marketing Consulting
