Data Lake, Data Processing and Data-driven

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

Organizations often need to manage a high volume of data that is growing at an extraordinary rate. At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and to do so with consistent performance. We think of this concept as inside-out data movement. Example Corp.

Data Lake

Data Lake Analytics Dashboards Metrics

DS Smith sets a single-cloud agenda for sustainability

CIO Business Intelligence

DECEMBER 6, 2023

Much of our digital agenda is around data. Its digital transformation began with an application modernization phase, in which Dickson and her IT teams determined which applications should be hosted in the public cloud and which should remain on a private cloud. Before, we were quite fragmented across different technologies.

Manufacturing

Manufacturing Data Lake Digital Transformation Machine Learning

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Transformation

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports. You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. With these insights, teams have the visibility to make data integration pipelines more efficient.

Metrics

Metrics Visualization Dashboards Interactive

Enhance query performance using AWS Glue Data Catalog column-level statistics

AWS Big Data

NOVEMBER 22, 2023

Today, we’re making available a new capability of AWS Glue Data Catalog that allows generating column-level statistics for AWS Glue tables. Data lakes are designed for storing vast amounts of raw, unstructured, or semi-structured data at a low cost, and organizations share those datasets across multiple departments and teams.

Statistics

Statistics Data Lake Optimization Data-driven

Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

Data is the lifeblood of modern businesses. In today’s data-driven world, companies rely on data to make informed decisions, gain a competitive edge, and provide exceptional customer experiences. However, not all data is created equal. AWS Glue Data Quality measures and monitors the quality of your dataset.

Data Quality

Data Quality Data Lake Visualization Data-driven

How Data Governance Protects Sensitive Data

erwin

APRIL 2, 2021

Organizations are managing more data than ever. With more companies increasingly migrating their data to the cloud to ensure availability and scalability, the risks associated with data management and protection also are growing. Data Security Starts with Data Governance. Who is authorized to use it and how?

Data Governance

Data Governance Cost-Benefit Risk Metadata

CDP Private Cloud is a Game-changer for Partners

Cloudera

SEPTEMBER 2, 2020

Additionally, lines of business (LOBs) are able to gain access to a shared data lake that is secured and governed by the use of Cloudera Shared Data Experience (SDX). According to 451 Research’s Voice of the Enterprise: Cloud, Hosting & Managed Services study, 58% of Enterprises are moving towards a hybrid IT environment.

Cost-Benefit

Cost-Benefit Data Warehouse Data Lake Machine Learning

Build a pseudonymization service on AWS to protect sensitive data: Part 2

AWS Big Data

MARCH 6, 2024

Part 1 of this two-part series described how to build a pseudonymization service that converts plain text data attributes into a pseudonym or vice versa. Consequently, an organization can achieve a standard process to handle sensitive data across all platforms.

Metrics

Metrics Statistics Testing Data Lake

Unlocking Data Storage: The Traditional Data Warehouse vs. Cloud Data Warehouse

Sisense

NOVEMBER 12, 2020

Data warehouse vs. databases; Traditional vs. Cloud Explained; Cloud data warehouses in your data stack; A data-driven future powered by the cloud. We live in a world of data: There’s more of it than ever before, in a ceaselessly expanding array of forms and locations. And where does all this data live?

Data Warehouse

Data Warehouse Data Lake OLAP Data-driven

5 misconceptions about cloud data warehouses

IBM Big Data Hub

FEBRUARY 2, 2023

In today’s world, data warehouses are a critical component of any organization’s technology ecosystem. The rise of cloud has allowed data warehouses to provide new capabilities such as cost-effective data storage at petabyte scale, highly scalable compute and storage, pay-as-you-go pricing and fully managed service delivery.

Data Warehouse

Data Warehouse Cost-Benefit Unstructured Data Data Architecture

Announcing the 2021 Data Impact Awards

Cloudera

MAY 12, 2021

2020 saw us hosting our first ever fully digital Data Impact Awards ceremony, and it certainly was one of the highlights of our year. We saw a record number of entries and incredible examples of how customers were using Cloudera’s platform and services to unlock the power of data. DATA FOR ENTERPRISE AI.

Digital Transformation

Digital Transformation Machine Learning Optimization Data Lake

Quantitative and Qualitative Data: A Vital Combination

Sisense

OCTOBER 6, 2020

Digging into quantitative data; Why is quantitative data important; What are the problems with quantitative data; Exploring qualitative data; Qualitative data benefits; Getting the most from qualitative data; Better together. Almost every modern organization is now a data-generating machine. or “how often?”

Statistics

Statistics Unstructured Data Data-driven Visualization

Running both IT and digital at Alorica

CIO Business Intelligence

JUNE 1, 2022

Then, at the top of the pyramid, is full automation with AI-driven conversation capabilities. What data do you collect from those channels? Finally, make sure you understand your data, because no machine learning solution will work for you if you aren’t working with the right data. Data lakes have a new consumer in AI.

IT Interactive Marketing Consulting

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue data catalog.

Metadata

Metadata Data Lake Data Processing Data-driven

Introducing Amazon EMR on EKS job submission with Spark Operator and spark-submit

AWS Big Data

JUNE 6, 2023

Amazon EMR on EKS provides a deployment option for Amazon EMR that allows organizations to run open-source big data frameworks on Amazon Elastic Kubernetes Service (Amazon EKS). Additionally, you can use the Data on EKS blueprint to deploy the entire infrastructure using Terraform templates. impl: org.apache.hadoop.fs.s3.EMRFSDelegate

Optimization

Optimization Data Lake Cost-Benefit Management

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

How The Cloud Made ‘Data-Driven Culture’ Possible | Part 1

BizAcuity

MAY 10, 2022

Cloud technology and innovation drive a data-driven decision-making culture in any organization. Cloud washing is storing data on the cloud for use over the internet. Storing data is extremely expensive even with VMs during this time. The platform is built on S3 and EC2 using a hosted Hadoop framework.

Data-driven

Data-driven IoT Unstructured Data Data Lake

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

AWS Big Data

MAY 30, 2023

Customers have been using data warehousing solutions to perform their traditional analytics tasks. Recently, data lakes have gained a lot of traction as the foundation for analytical solutions, because they come with benefits such as scalability, fault tolerance, and support for structured, semi-structured, and unstructured datasets.

Data Lake

Data Lake Data Analytics Analytics Data Processing

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

DataOps adoption continues to expand as a perfect storm of social, economic, and technological factors drive enterprises to invest in process-driven innovation. Many in the data industry recognize the serious impact of AI bias and seek to take active steps to mitigate it. Data Gets Meshier. Companies Commit to Remote.

Testing

Testing Data Lake Data Architecture Manufacturing

AWS Glue crawlers support cross-account crawling to support data mesh architecture

AWS Big Data

MARCH 27, 2023

Data lakes have come a long way, and there’s been tremendous innovation in this space. Today’s modern data lakes are cloud native, work with multiple data types, and make this data easily available to diverse stakeholders across the business.

Data Lake

Data Lake Data-driven Management Data Architecture

BMC on BMC: How the company enables IT observability with BMC Helix and AIOps

CIO Business Intelligence

DECEMBER 7, 2023

As a global company with more than 6,000 employees, BMC faces many of the same data challenges that other large enterprises face. The organization has 500 applications for business services, 80,000 VMs, 3,000 hosts, and more than 100,000 containers. Given the sheer volume of enterprise data, it’s impossible to do this manually.

IT Data Lake Business Services Data Processing

Design a data mesh on AWS that reflects the envisioned organization

AWS Big Data

JANUARY 22, 2024

The company uses AWS Cloud services to build data-driven products and scale engineering best practices. To ensure a sustainable data platform amid growth and profitability phases, their tech teams adopted a decentralized data mesh architecture. The solution Acast implemented is a data mesh, architected on AWS.

Data-driven

Data-driven Advertising Metadata Data Architecture

Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center

AWS Big Data

APRIL 26, 2024

To enable your workforce users for analytics with fine-grained data access controls and audit data access, you might have to create multiple AWS Identity and Access Management (IAM) roles with different data permissions and map the workforce users to one of those roles. We use Okta as the IdP for this demonstration.

Analytics

Analytics Data Lake Management Enterprise

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

During the first-ever virtual broadcast of our annual Data Impact Awards (DIA) ceremony, we had the great pleasure of announcing this year’s finalists and winners. In fact, each of the 29 finalists represented organizations running cutting-edge use cases that showcase a winning enterprise data cloud strategy. Data Champions.

Internet Publishing and Broadcasting

Internet Publishing and Broadcasting Data-driven Broadcasting Digital Transformation

Accomplish Agile Business Intelligence & Analytics For Your Business

datapine

APRIL 15, 2020

It’s necessary to say that these processes are recurrent and require continuous evolution of reports, online data visualization, dashboards, and new functionalities to adapt current processes and develop new ones. Discover the available data sources. “What do our users actually need?” Determine BI funding and support.

Business Intelligence

Business Intelligence Analytics Testing Dashboards

Accelerating revenue growth with real-time analytics: Poshmark’s journey

AWS Big Data

MARCH 20, 2023

We discuss how to create such a solution using Amazon Kinesis Data Streams, Amazon Managed Streaming for Apache Kafka (Amazon MSK), and Amazon Kinesis Data Analytics for Apache Flink; the design decisions that went into the architecture; and the business benefits observed by Poshmark.

Analytics

Analytics Slice and Dice Data Processing Data Lake

10 Keys to a Secure Cloud Data Lakehouse

Cloudera

OCTOBER 25, 2022

Enabling data and analytics in the cloud allows you to have infinite scale and unlimited possibilities to gain faster insights and make better decisions with data. Cloud data lakehouses provide significant scaling, agility, and cost advantages compared to cloud data lakes and cloud data warehouses.

Data Processing

Data Processing Data Lake Cost-Benefit Risk

Build efficient ETL pipelines with AWS Step Functions distributed map and redrive feature

AWS Big Data

DECEMBER 18, 2023

AWS Step Functions is a fully managed visual workflow service that enables you to build complex data processing pipelines involving a diverse set of extract, transform, and load (ETL) technologies such as AWS Glue, Amazon EMR, and Amazon Redshift. The data is then cleansed, transformed, and uploaded to Amazon S3 for further processing.

Metadata

Metadata Visualization Data Lake Data-driven

CIOs rise to the ESG reporting challenge

CIO Business Intelligence

JANUARY 30, 2024

The foundation for ESG reporting, of course, is data. What companies need more than anything is good data for ESG reporting. That means ensuring ESG data is available, transparent, and actionable, says Ivneet Kaur, EVP and chief information technology officer at identity services provider Sterling.

Reporting

Reporting Data Quality Strategy Data-driven

Top 15 data management platforms available today

CIO Business Intelligence

SEPTEMBER 22, 2023

Data management platform definition: A data management platform (DMP) is a suite of tools that helps organizations to collect and manage data from a wide array of first-, second-, and third-party sources and to create reports and build customer profiles as part of targeted personalization campaigns.

Management

Management Advertising Data Lake Sales

Top 15 data management platforms

CIO Business Intelligence

JUNE 9, 2022

A data management platform (DMP) is a group of tools designed to help organizations collect and manage data from a wide array of sources and to create reports that help explain what is happening in those data streams. Deploying a DMP can be a great way for companies to navigate a business world dominated by data.

Management

Management Advertising Data Lake Sales

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

This view is used to identify patterns and trends in customer behavior, which can inform data-driven decisions to improve business outcomes. In this post, we discuss how you can use purpose-built AWS services to create an end-to-end data strategy for C360 to unify and govern customer data that address these challenges.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

How Novo Nordisk built distributed data governance and control at scale

AWS Big Data

APRIL 28, 2023

This is the second post of a three-part series detailing how Novo Nordisk, a large pharmaceutical enterprise, partnered with AWS Professional Services to build a scalable and secure data and analytics platform. The third post will show how end-users can consume data from their tool of choice, without compromising data governance.

Data Governance

Data Governance Management Data-driven Data Lake

Periscope Data Expands to Israel, Empowering Data Teams with Powerful Tools

Sisense

DECEMBER 11, 2019

With data growing at a staggering rate, managing and structuring it is vital to your survival. In this piece, we detail the Israeli debut of Periscope Data. Driving startup growth with the power of data. The rise of the data team: from startup to unicorn.

Data Lake

Data Lake Big Data Sales Data-driven

Make Better Data-Driven Decisions with DataRobot AI Platform Single-Tenant SaaS on Microsoft Azure

DataRobot Blog

MARCH 7, 2023

DataRobot on Azure accelerates the machine learning lifecycle with advanced capabilities for rapid experimentation across new data sources and multiple problem types. This generates reliable business insights and sustains AI-driven value across the enterprise.

Data-driven

Data-driven Machine Learning Experimentation Data Lake

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

AWS Big Data

NOVEMBER 8, 2023

Join us as we delve into the world of real-time streaming data at re:Invent 2023 and discover how you can use real-time streaming data to build new use cases, optimize existing projects and processes, and reimagine what’s possible. High-quality data is not just about accuracy; it’s also about timeliness. Register now!

Data-driven

Data-driven Data Lake Machine Learning Cost-Benefit

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

AWS Big Data

FEBRUARY 27, 2024

In the ever-evolving world of finance and lending, the need for real-time, reliable, and centralized data has become paramount. Bluestone, a leading financial institution, embarked on a transformative journey to modernize its data infrastructure and transition to a data-driven organization.

Data-driven

Data-driven Data Lake Data Quality Data Governance

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

To build a data-driven business, it is important to democratize enterprise data assets in a data catalog. With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. GenericInMemoryCatalog stores the catalog data in memory.

Data Lake

Data Lake Metadata Business Analysis Data-driven

PODCAST: Making AI Real – Episode 4: Unlocking the Value of Enterprise AI with Data Engineering Capabilities

bridgei2i

MARCH 3, 2021

Episode 4: Unlocking the Value of Enterprise AI with Data Engineering Capabilities. They discuss how the data engineering team is instrumental in easing collaboration between analysts, data scientists and ML engineers to build enterprise AI solutions.

Enterprise

Enterprise Digital Transformation Data-driven Interactive

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

This is a guest post co-written by Alex Naumov, Principal Data Architect at smava. smava believes in and takes advantage of data-driven decisions in order to become the market leader.

Data Lake

Data Lake Data Warehouse Data-driven B2B

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

IBM Big Data Hub

JUNE 15, 2023

There’s no debate that the volume and variety of data are exploding and that the associated costs are rising rapidly. The proliferation of data silos also inhibits the unification and enrichment of data, which is essential to unlocking new insights. Enter the open data lakehouse.

Data Warehouse

Data Warehouse Data Lake Optimization Data-driven

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

AWS Big Data

JULY 14, 2023

FinAuto has a unique position to look across FinOps and provide solutions that help satisfy multiple use cases with accurate, consistent, and governed delivery of data and related services. These datasets can then be used to power front end systems, ML pipelines, and data engineering teams.

Finance

Finance Metadata Big Data Recreation/Entertainment

The essential check list for effective data democratization

CIO Business Intelligence

JANUARY 20, 2023

Truly data-driven companies see significantly better business outcomes than those that aren’t. But to get maximum value out of data and analytics, companies need to have a data-driven culture permeating the entire organization, one in which every business unit gets full access to the data it needs in the way it needs it.

Data Lake

Data Lake Data-driven Finance Data Architecture
