Analytics, Blog, Data Lake and Data Transformation

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications. For InitialRunFlag, choose Setup.

Data Lake

Data Lake Data Transformation Cost-Benefit Data-driven

Webinars

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world’s largest corporations. Such data volumes are not easy to move, migrate or modernize. The challenges of a monolithic data lake architecture: Data lakes are, at a high level, single repositories of data at scale.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

An AI Chat Bot Wrote This Blog Post …

DataKitchen

DECEMBER 9, 2022

ChatGPT> DataOps, or data operations, is a set of practices and technologies that organizations use to improve the speed, quality, and reliability of their data analytics processes. Overall, DataOps is an essential component of modern data-driven organizations. Query> DataOps. Query> Write an essay on DataOps.

Machine Learning

Machine Learning Data-driven Optimization Modeling

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse (such as Amazon Redshift) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation-based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

Unlock scalable analytics with AWS Glue and Google BigQuery

AWS Big Data

OCTOBER 27, 2023

Data integration is the foundation of robust data analytics. It encompasses the discovery, preparation, and composition of data from diverse sources. In the modern data landscape, accessing, integrating, and transforming data from diverse sources is a vital process for data-driven decision-making.

Analytics

Analytics Visualization Data Integration Cost-Benefit

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

DataKitchen

JULY 27, 2023

Let’s go through the ten Azure data pipeline tools. Azure Data Factory: This cloud-based data integration service allows you to create data-driven workflows for orchestrating and automating data movement and transformation. You can use it for big data analytics and machine learning workloads.

Machine Learning

Machine Learning Cost-Benefit Data Transformation Testing

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

IBM Big Data Hub

JUNE 15, 2023

The proliferation of data silos also inhibits the unification and enrichment of data, which is essential to unlocking new insights. Moreover, increased regulatory requirements make it harder for enterprises to democratize data access and scale the adoption of analytics and artificial intelligence (AI).

Data Warehouse

Data Warehouse Data Lake Optimization Data-driven

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.

Data Lake

Data Lake Data Architecture Metadata Data Warehouse

Self-Serve Data Prep CAN Be Easy AND Sophisticated!

Smarten

AUGUST 4, 2023

The Right Self-Serve Data Preparation Solution is Sophisticated, Easy-to-Use and Ensures User Adoption! When your enterprise decides to roll out analytics for business users, it is important to implement the right solution. Sophisticated Functionality – Don’t sacrifice functionality to get ease-of-use.

Data Lake

Data Lake Machine Learning Data Integration Optimization

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

In another decade, the internet and mobile started to generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence, Data Lake emerged, which handles unstructured and structured data with huge volume. Data lakehouse was created to solve these problems.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

Connecting the Data Lifecycle

Cloudera

NOVEMBER 29, 2021

Data transforms businesses. That’s where the data lifecycle comes into play. Managing data and its flow, from the edge to the cloud, is one of the most important tasks in the process of gaining data intelligence. The firm also worked on creating a solid pipeline from the data warehouse to the data lake.

Data Lake

Data Lake Data Warehouse Data Architecture Reporting

Cloudera’s Open Data Lakehouse Supercharged with dbt Core(tm)

Cloudera

OCTOBER 7, 2022

Using these adapters, Cloudera customers can use dbt to collaborate, test, deploy, and document their data transformation and analytic pipelines on CDP Public Cloud, CDP One, and CDP Private Cloud. The Open Data Lakehouse. Cloudera builds dbt adapters for all engines in the open data lakehouse. dbt-impala.

Data Warehouse

Data Warehouse Data Transformation Testing Data Lake

Data Mesh 101: How Data Mesh Helps Organizations Be Data-Driven and Achieve Velocity

Ontotext

FEBRUARY 12, 2024

Transferring ownership of data/datasets to domain-specific units that possess a deeper understanding of rules around the data empowers teams, improves data quality and trust, and greatly accelerates the building of data models and analytics. However, data mesh is not about introducing new technologies.

Data-driven

Data-driven Data Lake Data Quality Business Objectives

Accelerate Your Data Mesh in the Cloud with Cloudera Data Engineering and Modak Nabu™

Cloudera

OCTOBER 11, 2021

Modak’s Nabu is a born-in-the-cloud, cloud-neutral integrated data engineering platform designed to accelerate the journey of enterprises to the cloud. Modak empowers organizations to maximize their ROI from existing analytics infrastructure through interoperability. Modak Nabu™ and CDE’s Spark-on-Kubernetes.

Data Lake

Data Lake Cost-Benefit Data-driven Dashboards

Happy Birthday, CDP Public Cloud

Cloudera

OCTOBER 13, 2020

CDP Data Hub: a VM/Instance-based service that allows IT and developers to build custom business applications for a diverse set of use cases with secure, self-service access to enterprise data. Enrich – Data Engineering (Apache Spark and Apache Hive). Predict – Data Engineering (Apache Spark). This is Now.

Data Warehouse

Data Warehouse Machine Learning Visualization Data Lake

Tackling AI’s data challenges with IBM databases on AWS

IBM Big Data Hub

MARCH 14, 2024

The solution: IBM databases on AWS. To solve these challenges, IBM’s portfolio of SaaS database solutions on Amazon Web Services (AWS) enables enterprises to scale applications, analytics and AI across the hybrid cloud landscape. It enables secure data sharing for analytics and AI across your ecosystem.

Cost-Benefit

Cost-Benefit Metadata Optimization Management

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

In this blog, I will cover: What is watsonx.ai? Capabilities within the Prompt Lab include: Summarize: Transform text with domain-specific content into personalized overviews and capture key points (e.g., AMC Networks is excited by the opportunity to capitalize on the value of all of their data to improve viewer experiences.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Building data lakes from continuously changing transactional data of databases and keeping data lakes up to date is a complex task and can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. For Type, choose Spark.

Data Lake

Data Lake Dashboards Metrics Metadata

Automate alerting and reporting for AWS Glue job resource usage

AWS Big Data

MAY 25, 2023

Data transformation plays a pivotal role in providing the necessary data insights for businesses in any organization, small and large. To gain these insights, customers often perform ETL (extract, transform, and load) jobs from their source systems and output an enriched dataset.

Reporting

Reporting Metrics Optimization Data Lake

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

OCTOBER 16, 2023

How to scale AI and ML with built-in governance: A fit-for-purpose data store built on an open lakehouse architecture allows you to scale AI and ML while providing built-in governance tools. A data store lets a business connect existing data with new data and discover new insights with real-time analytics and business intelligence.

Risk

Risk Modeling Management Metadata

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

In legacy analytical systems such as enterprise data warehouses, the scalability challenges of a system were primarily associated with computational scalability, i.e., the ability of a data platform to handle larger volumes of data in an agile and cost-efficient way. Introduction. CRM platforms).

Data Processing

Data Processing Data Warehouse Enterprise Visualization

Turnkey Cloud DataOps: Solution from Alation and Accenture

Alation

MARCH 22, 2022

Accenture calls it the Intelligent Data Foundation (IDF), and it’s used by dozens of enterprises with very complex data landscapes and analytic requirements. Simply put, IDF standardizes data engineering processes. They can better understand data transformations, checks, and normalization.

Metadata

Metadata Cost-Benefit Data Quality Data Lake

Turning the page

Cloudera

JUNE 1, 2021

Everyone from Snowflake and Databricks to Google and Microsoft claims to have one, but the truth is that we are leading the way in the hybrid data cloud space. Integration between lifecycle analytic functions matters. The mission is to “Make data and analytics easy and accessible, for everyone.” Hybrid cloud matters.

Uncertainty

Uncertainty Cost-Benefit Risk Strategy

Use fuzzy string matching to approximate duplicate records in Amazon Redshift

AWS Big Data

FEBRUARY 8, 2023

Amazon Redshift enables you to run complex SQL analytics at scale and performance on terabytes to petabytes of structured and unstructured data, and make the insights widely available through popular business intelligence (BI) and analytics tools. Answering questions as simple as “How many unique customers do we have?”

Data Quality

Data Quality Testing Data Warehouse Unstructured Data

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Alation

AUGUST 11, 2022

The data warehouse and analytical data stores moved to the cloud and disaggregated into the data mesh. Today, the brightest minds in our industry are targeting the massive proliferation of data volumes and the accompanying but hard-to-find value locked within all that data.

Metadata

Metadata Data Warehouse Data Quality Data Lake

Building Better Data Models to Unlock Next-Level Intelligence

Sisense

MAY 11, 2021

You can’t talk about data analytics without talking about data modeling. These two functions are nearly inseparable as we move further into a world of analytics that blends sources of varying volume, variety, veracity, and velocity. But this was only the tip of the analytics iceberg.

Modeling

Modeling Big Data IoT Data Warehouse
