Big Data, Data Lake, Data Processing and Machine Learning

Big Data

Data Lake

Data Processing

Machine Learning

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. BMS’s EDLS platform hosts over 5,000 jobs and is growing at 15% YoY (year over year). About the authors Sivaprasad Mahamkali is a Senior Streaming Data Engineer at AWS Professional Services.

Metadata

Metadata Data Lake Visualization Data Transformation

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

10 Things AWS Can Do for Your SaaS Company

Smart Data Collective

FEBRUARY 20, 2022

Data storage databases. Your SaaS company can store and protect any amount of data using Amazon Simple Storage Service (S3), which is ideal for data lakes, cloud-native applications, and mobile apps. AWS also offers developers the technology to develop smart apps using machine learning and complex algorithms.

Cost-Benefit

Cost-Benefit Data Lake Software Machine Learning

Access Amazon Athena in your applications using the WebSocket API

AWS Big Data

MARCH 2, 2023

Many organizations are building data lakes to store and analyze large volumes of structured, semi-structured, and unstructured data. In addition, many teams are moving towards a data mesh architecture, which requires them to expose their data sets as easily consumable data products.

Data Lake

Data Lake Testing Interactive Unstructured Data

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

It includes perspectives about current issues, themes, vendors, and products for data governance. My interest in data governance (DG) began with the recent industry surveys by O’Reilly Media about enterprise adoption of “ABC” (AI, Big Data, Cloud). Instead, they refactored their monolithic web apps (e.g.,

Data Governance

Data Governance Machine Learning Metadata Big Data

5 misconceptions about cloud data warehouses

IBM Big Data Hub

FEBRUARY 2, 2023

In today’s world, data warehouses are a critical component of any organization’s technology ecosystem. They provide the backbone for a range of use cases such as business intelligence (BI) reporting, dashboarding, and machine-learning (ML)-based predictive analytics, that enable faster decision making and insights.

Data Warehouse

Data Warehouse Cost-Benefit Unstructured Data Data Architecture

Quantitative and Qualitative Data: A Vital Combination

Sisense

OCTOBER 6, 2020

Despite its many uses, quantitative data presents two main challenges for a data-driven organization. First, data isn’t created in a uniform, consistent format. It’s generated by a host of sources in different ways. Qualitative data benefits: Unlocking understanding.

Statistics

Statistics Unstructured Data Data-driven Visualization

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

It also makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization to discover, use, and collaborate to derive data-driven insights. Note that a managed data asset is an asset for which Amazon DataZone can manage permissions.

Metadata

Metadata Data Lake Data Processing Data-driven

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization. The job runs in the target account.

Metadata

Metadata Data Lake Machine Learning Big Data

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

It hosts over 150 big data analytics sandboxes across the region with over 200 users utilizing the sandbox for data discovery. With this functionality, business units can now leverage big data analytics to develop better and faster insights to help achieve better revenues, higher productivity, and decrease risk. .

Internet Publishing and Broadcasting

Internet Publishing and Broadcasting Data-driven Broadcasting Digital Transformation

How Gilead used Amazon Redshift to quickly and cost-effectively load third-party medical claims data

AWS Big Data

NOVEMBER 8, 2023

Because Gilead is expanding into biologics and large molecule therapies, and has an ambitious goal of launching 10 innovative therapies by 2030, there is heavy emphasis on using data with AI and machine learning (ML) to accelerate the drug discovery pipeline. Create a data lake external schema and table in Redshift Serverless.

Data Lake

Data Lake Data Warehouse Cost-Benefit Optimization

AWS Glue crawlers support cross-account crawling to support data mesh architecture

AWS Big Data

MARCH 27, 2023

Data lakes have come a long way, and there’s been tremendous innovation in this space. Today’s modern data lakes are cloud native, work with multiple data types, and make this data easily available to diverse stakeholders across the business. In the navigation pane, under Data catalog , choose Settings.

Data Lake

Data Lake Data-driven Management Data Architecture

Accelerating revenue growth with real-time analytics: Poshmark’s journey

AWS Big Data

MARCH 20, 2023

Other use cases – Additional use cases relating to aggregations and machine learning (ML) inference use cases such as authorization to operate, listing spam detection, and avoiding account takeovers (ATOs), among others. The data from the S3 data lake is used for batch processing and analytics through Amazon EMR and Amazon Redshift.

Analytics

Analytics Slice and Dice Data Processing Data Lake

Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center

AWS Big Data

APRIL 26, 2024

With trusted identity propagation, data access management can be based on a user’s corporate identity and can be propagated seamlessly as they access data with single sign-on to build analytics applications with Amazon EMR (EMR Studio and Amazon EMR on EC2). Select Named Data Catalog resources. Choose Grant.

Analytics

Analytics Data Lake Management Enterprise

How The Cloud Made ‘Data-Driven Culture’ Possible | Part 1

BizAcuity

MAY 10, 2022

2007: Amazon launches SimpleDB, a non-relational (NoSQL) database that allows businesses to cheaply process vast amounts of data with minimal effort. The platform is built on S3 and EC2 using a hosted Hadoop framework. An efficient big data management and storage solution that AWS quickly took advantage of.

Data-driven

Data-driven IoT Unstructured Data Data Lake

Run Spark SQL on Amazon Athena Spark

AWS Big Data

OCTOBER 23, 2023

Modern applications store massive amounts of data on Amazon Simple Storage Service (Amazon S3) data lakes, providing cost-effective and highly durable storage, and allowing you to run analytics and machine learning (ML) from your data lake to generate insights on your data.

Data Lake

Data Lake Visualization Optimization Interactive

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

This can be achieved using AWS Entity Resolution , which enables using rules and machine learning (ML) techniques to match records and resolve identities. The AWS modern data architecture shows a way to build a purpose-built, secure, and scalable data platform in the cloud.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Dancing with Elephants in 5 Easy Steps

Cloudera

AUGUST 21, 2020

And next to those legacy ERP, HCM, SCM and CRM systems, that mysterious elephant in the room – that “Big Data” platform running in the data center that is driving much of the company’s analytics and BI – looks like a great potential candidate. . Streaming data analytics. . Data science & engineering.

Cost-Benefit

Cost-Benefit Big Data ROI Risk

Modern Data Architecture for Telecommunications

Cloudera

SEPTEMBER 6, 2022

Previously, there were three types of data structures in telco: . Entity data sets — i.e. marketing data lakes . It is an edge-to-AI suite of capabilities, including edge analytics, data staging, data quality control, data visualization tools, and machine learning.

Data Architecture

Data Architecture Cost-Benefit Digital Transformation Business Driver

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. Automated development: Automates data preparation, model development, feature engineering and hyperparameter optimization using AutoAI.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Federate Amazon QuickSight access with open-source identity provider Keycloak

AWS Big Data

JUNE 13, 2023

Amazon QuickSight is a scalable, serverless, embeddable, machine learning (ML) powered business intelligence (BI) service built for the cloud that supports identity federation in both Standard and Enterprise editions. Vamsi Bhadriraju is a Data Architect at AWS. Change the IdP initiated SSO Relay State to [link].

Metadata

Metadata Dashboards Business Intelligence Management

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

AWS Big Data

FEBRUARY 2, 2023

Since its launch in 2006, Amazon Simple Storage Service (Amazon S3) has experienced major growth, supporting multiple use cases such as hosting websites, creating data lakes, serving as object storage for consumer applications, storing logs, and archiving data. This could be your data lake or application S3 bucket.

Reporting

Reporting Data Lake Management Optimization

Stitch Fix seamless migration: Transitioning from self-managed Kafka to Amazon MSK

AWS Big Data

SEPTEMBER 22, 2023

At Stitch Fix, we have been powered by data science since its foundation and rely on many modern data lake and data processing technologies. In our infrastructure, Apache Kafka has emerged as a powerful tool for managing event streams and facilitating real-time data processing.

Management

Management Metrics Cost-Benefit Data Lake

Periscope Data Expands to Israel, Empowering Data Teams with Powerful Tools

Sisense

DECEMBER 11, 2019

The challenge is to do it right, and a crucial way to achieve it is with decisions based on data and analysis that drive measurable business results. This was the key learning from the Sisense event heralding the launch of Periscope Data in Tel Aviv, Israel — the beating heart of the startup nation. What VCs want from startups.

Data Lake

Data Lake Big Data Sales Data-driven

How Novo Nordisk built distributed data governance and control at scale

AWS Big Data

APRIL 28, 2023

In this example, the analytics tool accesses the data lake on Amazon Simple Storage Service (Amazon S3) through Athena queries. As the data mesh pattern expands across domains covering more downstream services, we need a mechanism to keep IdPs and IAM role trusts continuously updated.

Data Governance

Data Governance Management Data-driven Data Lake

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

AWS Big Data

NOVEMBER 8, 2023

In today’s data-driven landscape, the quality of data is the foundation upon which the success of organizations and innovations stands. High-quality data is not just about accuracy; it’s also about timeliness. Reserve your seat now! Reserve your seat now! Reserve your seat now! Reserve your seat now!

Data-driven

Data-driven Data Lake Machine Learning Cost-Benefit

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

AWS Big Data

FEBRUARY 27, 2024

Each data producer within the organization has its own data lake in Apache Hudi format, ensuring data sovereignty and autonomy. This enables data-driven decision-making across the organization. Her special areas of interest are data analytics, machine learning/AI, and application modernization.

Data-driven

Data-driven Data Lake Data Quality Data Governance

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

AWS Big Data

JULY 14, 2023

In this post, we discuss how the Amazon Finance Automation team used AWS Lake Formation and the AWS Glue Data Catalog to build a data mesh architecture that simplified data governance at scale and provided seamless data access for analytics, AI, and machine learning (ML) use cases.

Finance

Finance Metadata Big Data Recreation/Entertainment

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

IBM Big Data Hub

JUNE 15, 2023

It is comprised of commodity cloud object storage, open data and open table formats, and high-performance open-source query engines. To help organizations scale AI workloads, we recently announced IBM watsonx.data , a data store built on an open data lakehouse architecture and part of the watsonx AI and data platform.

Data Warehouse

Data Warehouse Data Lake Optimization Data-driven

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

In addition, data pipelines include more and more stages, thus making it difficult for data engineers to compile, manage, and troubleshoot those analytical workloads. In addition, data pipelines include more and more stages, thus making it difficult for data engineers to compile, manage, and troubleshoot those analytical workloads.

Data Processing

Data Processing Data Warehouse Enterprise Visualization

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

Andrew White

JANUARY 11, 2021

On January 4th I had the pleasure of hosting a webinar. It was titled, The Gartner 2021 Leadership Vision for Data & Analytics Leaders. This was for the Chief Data Officer, or head of data and analytics. Many data science labs are set up as shared services. Data lakes don’t offer this nor should they.

Data Analytics

Data Analytics Analytics Data-driven Finance

Data Leaders Brief

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Webinars

Trending Sources

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Webinars

10 Things AWS Can Do for Your SaaS Company

Access Amazon Athena in your applications using the WebSocket API

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

Themes and Conferences per Pacoid, Episode 8

5 misconceptions about cloud data warehouses

Quantitative and Qualitative Data: A Vital Combination

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Governing data in relational databases using Amazon DataZone

How Cargotec uses metadata replication to enable cross-account data sharing

Announcing the 2020 Data Impact Award Winners

How Gilead used Amazon Redshift to quickly and cost-effectively load third-party medical claims data

AWS Glue crawlers support cross-account crawling to support data mesh architecture

Accelerating revenue growth with real-time analytics: Poshmark’s journey

Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center

How The Cloud Made ‘Data-Driven Culture’ Possible | Part 1

Run Spark SQL on Amazon Athena Spark

Create an end-to-end data strategy for Customer 360 on AWS

Dancing with Elephants in 5 Easy Steps

Modern Data Architecture for Telecommunications

Exploring the AI and data capabilities of watsonx

Federate Amazon QuickSight access with open-source identity provider Keycloak

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

Stitch Fix seamless migration: Transitioning from self-managed Kafka to Amazon MSK

Periscope Data Expands to Israel, Empowering Data Teams with Powerful Tools

How Novo Nordisk built distributed data governance and control at scale

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

Addressing the Three Scalability Challenges in Modern Data Platforms

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

Stay Connected