2023, Data Lake and Optimization

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

AWS Big Data

OCTOBER 3, 2023

In our previous post, Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes, we discussed how you can implement solutions to improve operational efficiencies of your Amazon Simple Storage Service (Amazon S3) data lake that is using the Apache Iceberg open table format and running on the Amazon EMR big data platform.

Optimization

Optimization Snapshot Data Lake Metadata

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. AWS Glue 3.0 and later supports the Apache Iceberg framework for data lakes. The following diagram illustrates the solution architecture.

Data Lake

Data Lake Data Processing Metadata Snapshot

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

APRIL 25, 2024

In the era of data, organizations are increasingly using data lakes to store and analyze vast amounts of structured and unstructured data. Data lakes provide a centralized repository for data from various sources, enabling organizations to unlock valuable insights and drive data-driven decision-making.

Optimization

Optimization Data Lake Cost-Benefit Reporting

Generative AI: 5 enterprise predictions for AI and security — for 2023, 2024, and beyond

CIO Business Intelligence

OCTOBER 25, 2023

Enterprise use of AI tools will only grow, with industries like manufacturing leading the charge. Our research shows that, mirroring the broader AI trend, enterprises across industry verticals sharply increased their use of AI from May 2023 to June 2023, with sustained growth through August 2023.

Enterprise

Enterprise Risk Manufacturing Finance

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.

Data Lake

Data Lake Data Governance Cost-Benefit Machine Learning

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

JANUARY 24, 2024

AWS-powered data lakes, supported by the unmatched availability of Amazon Simple Storage Service (Amazon S3), can handle the scale, agility, and flexibility required to combine different data and analytics approaches. The output will give a count of the number of data and metadata files deleted.

Snapshot

Snapshot Data Lake Metadata Optimization

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

AWS Big Data

NOVEMBER 10, 2023

Your applications can seamlessly read from and write to your Amazon Redshift data warehouse while maintaining optimal performance and transactional consistency. Additionally, you’ll benefit from performance improvements through pushdown optimizations, further enhancing the efficiency of your operations. options(**read_config).option("query",

Data Processing

Data Processing Data Lake Data Warehouse Optimization

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

AWS Big Data

NOVEMBER 20, 2023

Use case: A typical workload for AWS Glue for Apache Spark jobs is to load data from a relational database to a data lake with SQL-based transformations. The end benefit for you is more effective and optimized AWS Glue for Apache Spark workloads. The metrics are available in all AWS Glue supported Regions. Check it out!

Metrics

Metrics Data Lake Cost-Benefit Dashboards

Steps Gerresheimer takes to transform its IT

CIO Business Intelligence

NOVEMBER 29, 2023

By mid-2023, Walldorf-based Gerresheimer had its IT strategy revised, and a central component of this was its cloud journey, for which CIO Zafer Nalbant and his team built a hybrid environment consisting of a public cloud part based on Microsoft Azure, and a private cloud part that runs in a data center completely managed by T-Systems.

IT

IT Data Lake Strategy IoT

CIOs press ahead for gen AI edge — despite misgivings

CIO Business Intelligence

OCTOBER 18, 2023

If anything, 2023 has proved to be a year of reckoning for businesses, and IT leaders in particular, as they attempt to come to grips with the disruptive potential of this technology — just as debates over the best path forward for AI have accelerated and regulatory uncertainty has cast a longer shadow over its outlook in the wake of these events.

Risk

Risk Manufacturing Enterprise Technology

How Cloudera Supports Zero Trust for Data

Cloudera

JUNE 7, 2023

Subsequent to the ZTMM release, CISA issued a request for comment, which has led to the revised version 2 of the ZTMM in April 2023, as “commenters requested additional guidance and space to evolve along the maturity model,” according to CISA. How does Cloudera support the evolution to optimal?

Metadata

Metadata Data Lake Optimization Modeling

Accelerate your data warehouse migration to Amazon Redshift – Part 7

AWS Big Data

OCTOBER 17, 2023

Tens of thousands of customers use Amazon Redshift to gain business insights from their data. With Amazon Redshift, you can use standard SQL to query data across your data warehouse, operational data stores, and data lake. _cdc_unit" t2 WHERE t2.deletexid_

Data Warehouse

Data Warehouse Data Processing Data Lake Management

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

Analyzing historical patterns allows you to optimize performance, identify issues proactively, and improve planning. Looking at the Skewness Job per Job visualization, there was a spike on November 1, 2023. We walk through ingesting CloudWatch metrics into QuickSight using a CloudWatch metric stream and QuickSight SPICE.

Metrics

Metrics Visualization Dashboards Interactive

DaVita’s technology strategy driven by the ‘power of purpose’

CIO Business Intelligence

DECEMBER 13, 2022

We’re looking at a variety of sources of data, putting it in data lakes, and then using that to drive predictive models that really help our doctors and our care teams to stratify our patient’s risk by taking actions at the right time.

Strategy

Strategy Technology Digital Transformation Data Lake

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

These announcements drive forward the AWS Zero-ETL vision to unify all your data, enabling you to better maximize the value of your data with comprehensive analytics and ML capabilities, and innovate faster with secure data collaboration within and across organizations.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

You’ve Got the Data. Why Can’t Your Developers Build with it?

CIO Business Intelligence

MAY 27, 2022

Developers who build the real-time experiences that customers love need a tech stack that lets them access data quickly and easily. IDC predicts that by 2023 there will be more than 500 million new cloud native digital apps and services – more than the total created over the past 40 years. Converge data at rest and data in motion.

IT

IT Data-driven Data Lake Enterprise

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you need to focus on operational use cases for your S3 data lake to optimize the production environment. This property is set to true by default.

Data Lake

Data Lake Snapshot Metadata Optimization

Top Opportunities for SAP Partners in 2023

Timo Elliott

NOVEMBER 30, 2022

My role was to talk about the trends and opportunities for 2023, for customers, SAP, and our partners. And it’s not just a technology vision — it’s also about how organizations have to rethink how they optimize business processes, business capabilities, and the business ecosystem. Business Process Optimization.

Recreation/Entertainment

Recreation/Entertainment Metadata Data Warehouse Cost-Benefit

Wolverine hits pause for cloud success

CIO Business Intelligence

JULY 8, 2022

To optimize business on its re-opening, Wolverine IT built supply chain data models using Microsoft Power BI to prioritize which brands it should manufacture first once factories resumed operation. Wolverine relies on seven data centers, two of which are run by third-party partners. We are not currently doing that.”.

Data Lake

Data Lake Manufacturing Digital Transformation Machine Learning

Introducing AWS Glue serverless Spark UI for better monitoring and troubleshooting

AWS Big Data

NOVEMBER 20, 2023

Customers often use Apache Spark Web UI, a popular debugging tool that is part of open source Apache Spark, to help fix problems and optimize job performance. When logs are parsed, you can use the built-in Spark UI to debug, troubleshoot, and optimize your jobs. Now it’s time to run the job!

Visualization

Visualization Optimization Data Lake Management

Materialized Views in Hive for Iceberg Table Format

Cloudera

FEBRUARY 8, 2024

Queries containing joins, filters, projections, group-by, or aggregations without group-by can be transparently rewritten by the Hive optimizer to use one or more eligible materialized views. Materialized views can be partitioned on one or more columns. This can potentially lead to orders of magnitude improvement in performance.

Snapshot

Snapshot Metadata Cost-Benefit Data Warehouse

Implement a serverless CDC process with Apache Iceberg using Amazon DynamoDB and Amazon Athena

AWS Big Data

AUGUST 16, 2023

Iceberg manages large collections of files as tables, and it supports modern analytical data lake operations such as record-level insert, update, delete, and time travel queries. Iceberg also helps guarantee data correctness under concurrent write scenarios. We use a sample JSON file as input to Amazon DynamoDB.

Data Lake

Data Lake Metadata Testing Snapshot

Amazon QuickSight helps TalentReef empower its customers to make more informed hiring decisions

AWS Big Data

MARCH 17, 2023

The response has been overwhelmingly positive, leading to the development of two additional analytics dashboards, Job Postings and Onboarding, both set to be released in the first half of 2023. They want to see how their job postings are performing, if there is a drop in any posting, and opportunities to optimize their process.

Dashboards

Dashboards IT Data Lake Visualization

Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog

AWS Big Data

JUNE 6, 2023

You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores. Hundreds of thousands of customers use data lakes for analytics and ML to make data-driven business decisions.

Data Quality

Data Quality Data Lake Data-driven Metrics

Tackling AI’s data challenges with IBM databases on AWS

IBM Big Data Hub

MARCH 14, 2024

This involves unifying and sharing a single copy of data and metadata across IBM® watsonx.data™, IBM® Db2®, IBM® Db2® Warehouse and IBM® Netezza®, using native integrations and supporting open formats, all without the need for migration or recataloging. With Netezza support for 1.2

Cost-Benefit

Cost-Benefit Metadata Optimization Management

Amazon Kinesis Data Streams: celebrating a decade of real-time data innovation

AWS Big Data

NOVEMBER 14, 2023

Ten years ago, we launched Amazon Kinesis Data Streams, the first cloud-native serverless streaming data service, to serve as the backbone for companies, to move data across system boundaries, breaking data silos. Another integration launched in 2023 is with Amazon Monitron to power predictive maintenance management.

IoT

IoT Data-driven Data Lake Data Strategy

Wonderla Holidays goes digital to enhance business and customer fun

CIO Business Intelligence

OCTOBER 18, 2022

One pulse sends 150 bytes of data. So, each band can send out 500KB to 750KB of data. To handle the huge volume of data thus generated, the company is in the process of deploying a data lake, data warehouse, and real-time analytical tools in a hybrid model. One band delivers multiple business benefits.

Data Lake

Data Lake Cost-Benefit Digital Transformation Data Warehouse

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

.” Sean Im, CEO, Samsung SDS America “In the field of generative AI and foundation models, watsonx is a platform that will enable us to meet our customers’ requirements in terms of optimization and security, while allowing them to benefit from the dynamism and innovations of the open-source community.”

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

AWS Lake Formation 2023 year in review

AWS Big Data

JANUARY 18, 2024

AWS Lake Formation and the AWS Glue Data Catalog form an integral part of a data governance solution for data lakes built on Amazon Simple Storage Service (Amazon S3) with multiple AWS analytics services integrating with them. In 2023, we released several updates to AWS Glue crawlers. Crawlers, salut!

Data Lake

Data Lake Metadata Data Governance Statistics

Introducing watsonx: The future of AI for business

IBM Big Data Hub

MAY 9, 2023

A data store built on open lakehouse architecture, it runs both on premises and across multi-cloud environments. Optimized for all data, analytics and AI workloads, watsonx.data combines the flexibility of a data lake with the performance of a data warehouse, helping businesses to scale data analytics and AI anywhere their data resides.

Data Warehouse

Data Warehouse Cost-Benefit Machine Learning Modeling

6 ways to drive Wi-Fi operational efficiencies

CIO Business Intelligence

APRIL 18, 2023

Cloud-based network management increases agility and allows resource-constrained IT departments to focus on optimizing the network, not deploying, managing, or upgrading the network management system. To help take control in these uncertain times, this blog outlines six strategies to modernize your Wi-Fi.

IoT

IoT Internet of Things Data Lake Optimization

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

2023 AWS Analytics Superheroes: We are excited to introduce the 2023 AWS Analytics Superheroes at this year’s re:Invent conference! A shapeshifting guardian and protector of data like Data Lynx? 11:30 AM – 12:30 PM (PDT) Caesars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture.

Analytics

Analytics Data Lake Data Warehouse Data-driven

AWS re:Invent 2023 Amazon Redshift Sessions Recap

AWS Big Data

DECEMBER 18, 2023

Get a closer look at how scaling for data warehousing works in AWS with the latest introduction of AI driven scaling and optimizations in Amazon Redshift Serverless to enable better price-performance for your workloads.

Data Warehouse

Data Warehouse Machine Learning Data-driven Data Lake

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

AWS Big Data

NOVEMBER 8, 2023

Save the date: AWS re:Invent 2023 is happening from November 27 to December 1 in Las Vegas, and you cannot miss it. In today’s data-driven landscape, the quality of data is the foundation upon which the success of organizations and innovations stands. High-quality data is not just about accuracy; it’s also about timeliness.

Data-driven

Data-driven Data Lake Machine Learning Cost-Benefit

How Fujitsu implemented a global data mesh architecture and democratized data

AWS Big Data

MAY 1, 2024

Currently, we have approximately 120,000 employees worldwide (as of March 2023), including group companies. To provide a variety of products, services, and solutions that are better suited to customers and society in each region, we have built business processes and systems that are optimized for each region and its market.

Dashboards

Dashboards Data-driven Publishing Cost-Benefit

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

Improved Decision Making: Well-modeled data provides insights that drive informed decision-making across various business domains, resulting in enhanced strategic planning. Reduced Data Redundancy: By eliminating data duplication, it optimizes storage and enhances data quality, reducing errors and discrepancies.

Data-driven

Data-driven Modeling Enterprise Structured Data

5 Key Takeaways from Flink Forward 2023

Cloudera

NOVEMBER 27, 2023

Earlier this month (November 6 through 8, 2023) a few hundred Apache Flink enthusiasts descended upon a Hyatt Regency Lake near Seattle for the annual Flink Forward conference. This will help accelerate deployment across environments and to optimize performance and resource utilization on an ongoing basis. Takeaway No.

Data Lake

Data Lake Advertising ROI Data Warehouse

Unleashing the power of Presto: The Uber case study

IBM Big Data Hub

SEPTEMBER 25, 2023

With a few taps on a mobile device, riders request a ride; then, Uber’s algorithms work to match them with the nearest available driver and calculate the optimal price. Uber’s prowess as a transportation, logistics and analytics company hinges on their ability to leverage data effectively. But the simplicity ends there.

OLAP

OLAP Data Lake Data-driven Snapshot

Process price transparency data using AWS Glue

AWS Big Data

MAY 4, 2023

Prerequisites: To implement the solution in your own AWS account, you need to create or configure the following AWS resources in advance: An S3 bucket to persist the source and processed data. getvalue(),encoding='utf-8') s3_client.put_object(Body=data, Bucket=bucket, Key=upload_path) s3_client = boto3.client('s3')

Insurance

Insurance Publishing Cost-Benefit Data Lake

Showpad accelerates data maturity to unlock innovation using Amazon QuickSight

AWS Big Data

APRIL 5, 2023

Showpad also struggled with data quality issues in terms of consistency, ownership, and insufficient data access across its targeted user base due to a complex BI access process, licensing challenges, and insufficient education. As of January 2023, Showpad’s QuickSight instance includes over 2,433 datasets and 199 dashboards.

Dashboards

Dashboards Reporting Cost-Benefit Visualization

Do the Benefits of Cloud Outweigh the Costs?

Jet Global

SEPTEMBER 19, 2023

What are the best practices for analyzing cloud ERP data? Data Management: How do we create a data warehouse or data lake in the cloud using our cloud ERP? How do I access the legacy data from my previous ERP? Self-service BI: How can we rapidly build BI reports on cloud ERP data without any help from IT?

Cost-Benefit

Cost-Benefit Data Warehouse Reporting Enterprise
