Analytics, Data Lake, Data Transformation and IT

Analytics

Data Lake

Data Transformation

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

Monitor data pipelines in a serverless data lake

AWS Big Data

AUGUST 9, 2023

The combination of a data lake in a serverless paradigm brings significant cost and performance benefits. By monitoring application logs, you can gain insights into job execution, troubleshoot issues promptly to ensure the overall health and reliability of data pipelines.

Data Lake

Data Lake Metrics Testing Cost-Benefit

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications. Search for the Jira Cloud connector.

Data Lake

Data Lake Data Transformation Cost-Benefit Data-driven

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

AWS Big Data

APRIL 27, 2023

Amazon Athena supports the MERGE command on Apache Iceberg tables, which allows you to perform inserts, updates, and deletes in your data lake at scale using familiar SQL statements that are compliant with ACID (Atomic, Consistent, Isolated, Durable). Most databases use a transaction log to record changes made to the database.

Data Lake

Data Lake Snapshot Optimization Data Transformation

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

AWS Big Data

DECEMBER 21, 2023

As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyse data. OpenSearch Service is used for multiple purposes, such as observability, search analytics, consolidation, cost savings, compliance, and integration.

Analytics

Analytics IT Data Lake Visualization

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. These upstream data sources constitute the data producer components.

Data Warehouse

Data Warehouse Data Lake Analytics Data Science

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

To build a data-driven business, it is important to democratize enterprise data assets in a data catalog. With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. The Amazon EMR Flink CDC connector reads the binlog data and processes the data.

Data Lake

Data Lake Metadata Business Analysis Data-driven

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Transformation

Introducing Amazon Q data integration in AWS Glue

AWS Big Data

APRIL 30, 2024

This reduces the time and effort you need to learn, build, and run data integration jobs using AWS Glue data integration engines. For example, you can ask Amazon Q Developer to generate a complete extract, transform, and load (ETL) script or code snippet for individual ETL operations.

Data Integration

Data Integration Data Lake Data Warehouse Software

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouses (such as Amazon Redshift ) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

Unlock scalable analytics with AWS Glue and Google BigQuery

AWS Big Data

OCTOBER 27, 2023

Data integration is the foundation of robust data analytics. It encompasses the discovery, preparation, and composition of data from diverse sources. In the modern data landscape, accessing, integrating, and transforming data from diverse sources is a vital process for data-driven decision-making.

Analytics

Analytics Visualization Data Integration Cost-Benefit

Migrate from Amazon Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics Studio

AWS Big Data

JUNE 29, 2023

Amazon Kinesis Data Analytics makes it easy to transform and analyze streaming data in real time. In this post, we discuss why AWS recommends moving from Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics for Apache Flink to take advantage of Apache Flink’s advanced streaming capabilities.

Data Analytics

Data Analytics Analytics IoT Data Lake

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

The data volume is in double-digit TBs with steady growth as business and data sources evolve. smava’s Data Platform team faced the challenge to deliver data to stakeholders with different SLAs, while maintaining the flexibility to scale up and down while staying cost-efficient.

Data Lake

Data Lake Data Warehouse Data-driven B2B

How HR&A uses Amazon Redshift spatial analytics on Amazon Redshift Serverless to measure digital equity in states across the US

AWS Big Data

DECEMBER 5, 2023

This frees up our local computer space, greatly automates the survey cleaning and analysis step, and allows our clients to easily access the data results. This cut down significantly on analytical turnaround times. The first step in this process is mapping the digital divide.

Measurement

Measurement Dashboards Data Warehouse Analytics

Straumann Group is transforming dentistry with data, AI

CIO Business Intelligence

FEBRUARY 16, 2023

The company’s orthodontics business, for instance, makes heavy use of image processing to the point that unstructured data is growing at a pace of roughly 20% to 25% per month. For example, imaging data can be used to show patients how an aligner will change their appearance over time. “It The offensive side?

Unstructured Data

Unstructured Data Data Lake Prescriptive Analytics Digital Transformation

How the BMW Group analyses semiconductor demand with AWS Glue

AWS Big Data

APRIL 26, 2023

In 2019, the BMW Group decided to re-architect and move its on-premises data lake to the AWS Cloud to enable data-driven innovation while scaling with the dynamic needs of the organization. To learn more about the Cloud Data Hub, refer to BMW Group Uses AWS-Based Data Lake to Unlock the Power of Data.

Forecasting

Forecasting Manufacturing Data Lake Big Data

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

AWS Big Data

MAY 9, 2023

Hundreds of thousands of customers use AWS Glue , a serverless data integration service, to discover, prepare, and combine data for analytics, machine learning (ML), and application development. AWS Glue for Apache Spark jobs work with your code and configuration of the number of data processing units (DPU). Generally, G.2X

Data Lake

Data Lake Cost-Benefit Data Integration Data Transformation

Data Mesh 101: How Data Mesh Helps Organizations Be Data-Driven and Achieve Velocity

Ontotext

FEBRUARY 12, 2024

For large enterprises, data mesh distributes data ownership and reduces dependencies between services. This promotes data autonomy and enables decision-making about data domains without centralized gatekeepers. What does data mesh do that other approaches can’t? What does data mesh do that other approaches can’t?

Data-driven

Data-driven Data Lake Data Quality Business Objectives

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

Building a data platform involves various approaches, each with its unique blend of complexities and solutions. In this post, we delve into a case study for a retail use case, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

Self-Serve Data Prep CAN Be Easy AND Sophisticated!

Smarten

AUGUST 4, 2023

The Right Self-Serve Data Preparation Solution is Sophisticated, Easy-to-Use and Ensures User Adoption! When your enterprise decides to roll out analytics for business users, it is important to implement the right solution. Sophisticated Functionality – Don’t sacrifice functionality to get ease-of-use.

Data Lake

Data Lake Machine Learning Data Integration Optimization

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Data Lake Optimization

Reference guide to build inventory management and forecasting solutions on AWS

AWS Big Data

APRIL 11, 2023

In this post, we discuss how to streamline inventory management forecasting systems with AWS managed analytics, AI/ML, and database services. With the proliferation of IoT devices and the abundance of data generated by them, it has become possible to collect real-time data on inventory levels, customer behavior, and other key metrics.

Forecasting

Forecasting Management IoT Data-driven

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

APRIL 25, 2024

In the era of data, organizations are increasingly using data lakes to store and analyze vast amounts of structured and unstructured data. Data lakes provide a centralized repository for data from various sources, enabling organizations to unlock valuable insights and drive data-driven decision-making.

Optimization

Optimization Data Lake Cost-Benefit Reporting

Introducing the AWS ProServe Hadoop Migration Delivery Kit TCO tool

AWS Big Data

FEBRUARY 6, 2023

Use case overview Migrating Hadoop workloads to Amazon EMR accelerates big data analytics modernization, increases productivity, and reduces operational cost. Refactoring coupled compute and storage to a decoupling architecture is a modern data solution. Jiseong Kim is a Senior Data Architect at AWS ProServe.

Cost-Benefit

Cost-Benefit Data Lake Dashboards Big Data

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

AWS Big Data

MARCH 3, 2023

In this post, we share how the AWS Data Lab helped Tricentis to improve their software as a service (SaaS) Tricentis Analytics platform with insights powered by Amazon Redshift. Although Tricentis has amassed such data over a decade, the data remains untapped for valuable insights.

Software

Software Data Lake Testing Cost-Benefit

Happy Birthday, CDP Public Cloud

Cloudera

OCTOBER 13, 2020

CDP Data Hub: a VM/Instance-based service that allows IT and developers to build custom business applications for a diverse set of use cases with secure, self-service access to enterprise data. . Enrich – Data Engineering (Apache Spark and Apache Hive). Predict – Data Engineering (Apache Spark). New Services.

Data Warehouse

Data Warehouse Machine Learning Visualization Data Lake

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Building data lakes from continuously changing transactional data of databases and keeping data lakes up to date is a complex task and can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes.

Data Lake

Data Lake Dashboards Metrics Metadata

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

AWS Big Data

FEBRUARY 6, 2023

The YARN log analyzer provides four key functionalities: Upload transformed YARN job history logs in CSV format (for example, cluster_yarn_logs_*.csv He helps customers innovate their business with AWS Analytics, IoT, and AI/ML services. Jiseong Kim is a Senior Data Architect at AWS ProServe.

Dashboards

Dashboards Optimization Data Lake Cost-Benefit

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

OCTOBER 16, 2023

It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. In other words, instead of training numerous models on labeled, task-specific data, it’s now possible to pre-train one big model built on a transformer and then, with additional fine-tuning, reuse it as needed.

Risk

Risk Modeling Management Metadata

Automate alerting and reporting for AWS Glue job resource usage

AWS Big Data

MAY 25, 2023

Data transformation plays a pivotal role in providing the necessary data insights for businesses in any organization, small and large. To gain these insights, customers often perform ETL (extract, transform, and load) jobs from their source systems and output an enriched dataset.

Reporting

Reporting Metrics Optimization Data Lake

An AI Chat Bot Wrote This Blog Post …

DataKitchen

DECEMBER 9, 2022

ChatGPT> DataOps, or data operations, is a set of practices and technologies that organizations use to improve the speed, quality, and reliability of their data analytics processes. Overall, DataOps is an essential component of modern data-driven organizations. Query> Write an essay on DataOps Observability.

Machine Learning

Machine Learning Data-driven Optimization Modeling

7 key Microsoft Azure analytics services (plus one extra)

CIO Business Intelligence

JUNE 29, 2022

Insights hidden in your data are essential for optimizing business operations, finetuning your customer experience, and developing new products — or new lines of business, like predictive maintenance. And as businesses contend with increasingly large amounts of data, the cloud is fast becoming the logical place where analytics work gets done.

Data Lake

Data Lake Analytics Data Warehouse Machine Learning

Turning the page

Cloudera

JUNE 1, 2021

This means we can double down on our strategy – continuing to win the Hybrid Data Cloud battle in the IT department AND building new, easy-to-use cloud solutions for the line of business. It also means we can complete our business transformation with the systems, processes and people that support a new operating model. .

Uncertainty

Uncertainty Cost-Benefit Risk Strategy

Turnkey Cloud DataOps: Solution from Alation and Accenture

Alation

MARCH 22, 2022

As the latest iteration in this pursuit of high-quality data sharing, DataOps combines a range of disciplines. It synthesizes all we’ve learned about agile, data quality , and ETL/ELT. Simply put, IDF standardizes data engineering processes. They can better understand data transformations, checks, and normalization.

Metadata

Metadata Cost-Benefit Data Quality Data Lake

Use fuzzy string matching to approximate duplicate records in Amazon Redshift

AWS Big Data

FEBRUARY 8, 2023

Amazon Redshift enables you to run complex SQL analytics at scale and performance on terabytes to petabytes of structured and unstructured data, and make the insights widely available through popular business intelligence (BI) and analytics tools. Answering questions as simple as “How many unique customers do we have?”

Data Quality

Data Quality Testing Data Warehouse Unstructured Data

Showpad accelerates data maturity to unlock innovation using Amazon QuickSight

AWS Big Data

APRIL 5, 2023

In 2021, Showpad set forth the vision to use the power of data to unlock innovations and drive business decisions across its organization. The company decided to use AWS to unify its business intelligence (BI) and reporting strategy for both internal organization-wide use cases and in-product embedded analytics targeted at its customers.

Dashboards

Dashboards Reporting Cost-Benefit Visualization

Texas Rangers data transformation modernizes stadium operations

CIO Business Intelligence

OCTOBER 18, 2022

The old stadium, which opened in 1992, provided the business operations team with data, but that data came from disparate sources, many of which were not consistently updated. The new Globe Life Field not only boasts a retractable roof, but it produces data in categories that didn’t even exist in 1992.

Data Transformation

Data Transformation Consulting Data Lake Reporting

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

DataKitchen

JULY 27, 2023

While working in Azure with our customers, we have noticed several standard Azure tools people use to develop data pipelines and ETL or ELT processes. We counted ten ‘standard’ ways to transform and set up batch data pipelines in Microsoft Azure. You can use it for big data analytics and machine learning workloads.

Machine Learning

Machine Learning Cost-Benefit Data Transformation Testing

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

A read-optimized platform that can integrate data from multiple applications emerged. In another decade, the internet and mobile started the generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Value of the data projects are difficult to realize. It was Datawarehouse.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

Cloudera’s Open Data Lakehouse Supercharged with dbt Core(tm)

Cloudera

OCTOBER 7, 2022

Using these adapters, Cloudera customers can use dbt to collaborate, test, deploy, and document their data transformation and analytic pipelines on CDP Public Cloud, CDP One, and CDP Private Cloud. The Open Data Lakehouse . Cloudera builds dbt adaptors for all engines in the open data lakehouse.

Data Warehouse

Data Warehouse Data Transformation Testing Data Lake

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.

Data Lake

Data Lake Data Architecture Metadata Data Warehouse

Tackling AI’s data challenges with IBM databases on AWS

IBM Big Data Hub

MARCH 14, 2024

The solution: IBM databases on AWS To solve for these challenges, IBM’s portfolio of SaaS database solutions on Amazon Web Services (AWS), enables enterprises to scale applications, analytics and AI across the hybrid cloud landscape.

Cost-Benefit

Cost-Benefit Metadata Optimization Management

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

By supporting open-source frameworks and tools for code-based, automated and visual data science capabilities — all in a secure, trusted studio environment — we’re already seeing excitement from companies ready to use both foundation models and machine learning to accomplish key tasks. What is watsonx.data?

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Monitor data pipelines in a serverless data lake

Webinars

Trending Sources

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

Webinars

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

Build a data lake with Apache Flink on Amazon EMR

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Introducing Amazon Q data integration in AWS Glue

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

Unlock scalable analytics with AWS Glue and Google BigQuery

Migrate from Amazon Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics Studio

How smava makes loans transparent and affordable using Amazon Redshift Serverless

How HR&A uses Amazon Redshift spatial analytics on Amazon Redshift Serverless to measure digital equity in states across the US

Straumann Group is transforming dentistry with data, AI

How the BMW Group analyses semiconductor demand with AWS Glue

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

Data Mesh 101: How Data Mesh Helps Organizations Be Data-Driven and Achieve Velocity

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

Self-Serve Data Prep CAN Be Easy AND Sophisticated!

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

Reference guide to build inventory management and forecasting solutions on AWS

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

Introducing the AWS ProServe Hadoop Migration Delivery Kit TCO tool

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

Happy Birthday, CDP Public Cloud

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

How to use foundation models and trusted governance to manage AI workflow risk

Automate alerting and reporting for AWS Glue job resource usage

An AI Chat Bot Wrote This Blog Post …

7 key Microsoft Azure analytics services (plus one extra)

Turning the page

Turnkey Cloud DataOps: Solution from Alation and Accenture

­­Use fuzzy string matching to approximate duplicate records in Amazon Redshift

Showpad accelerates data maturity to unlock innovation using Amazon QuickSight

Texas Rangers data transformation modernizes stadium operations

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

Data platform trinity: Competitive or complementary?

Cloudera’s Open Data Lakehouse Supercharged with dbt Core(tm)

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Tackling AI’s data challenges with IBM databases on AWS

Exploring the AI and data capabilities of watsonx

Stay Connected

Use fuzzy string matching to approximate duplicate records in Amazon Redshift