Data Lake, Interactive and Publishing

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.

Data Lake, Data Processing, Metadata, Snapshot

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake, Sales, Data Warehouse, Snapshot

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

AWS Big Data

JUNE 23, 2023

Events and many other security data types are stored in Imperva’s Threat Research Multi-Region data lake. Imperva harnesses data to improve their business outcomes. As part of their solution, they are using Amazon QuickSight to unlock insights from their data.

Data Lake, Cost-Benefit, Dashboards, Data Warehouse

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

Data analytics on operational data at near-real time is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to have better scalability and performance. For more information, see Changing the default settings for your data lake.

Data Lake, Visualization, Dashboards, Insurance

Use the Amazon Redshift Data API to interact with Amazon Redshift Serverless

AWS Big Data

APRIL 28, 2023

As a data engineer or application developer, for some use cases, you want to interact with the Redshift Serverless data warehouse to load or query data with a simple API endpoint without having to manage persistent connections. You can unload data in either text or Parquet format.

Interactive, Metadata, Data Warehouse, Data-driven

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

We have seen a strong customer demand to expand its scope to cloud-based data lakes because data lakes are increasingly the enterprise solution for large-scale data initiatives due to their power and capabilities. Let’s say that this company is located in Europe and the data product must comply with the GDPR.

Data Lake, Management, Metrics, Data Warehouse

Interact with Apache Iceberg tables using Amazon Athena and cross account fine-grained permissions using AWS Lake Formation

AWS Big Data

MARCH 23, 2023

It makes sharing data across LoBs non-trivial. These organizations have adopted a federated model, with each LoB having the autonomy to make decisions on their data. They use the publisher/consumer model with a centralized governance layer that is used to enforce access controls. He is an Apache Iceberg Committer and PMC member.

Interactive, Snapshot, Data Lake, Software

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports. venv\Scripts\activate.bat After this step, the subsequent steps run within the bounds of the virtual environment on the client machine and interact with the AWS account as needed. Choose Publish dashboard.

Metrics, Visualization, Dashboards, Interactive

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Starting with Amazon EMR release 6.7.0,

Data Lake, Data Governance, Data Architecture, Machine Learning

Access Amazon Athena in your applications using the WebSocket API

AWS Big Data

MARCH 2, 2023

Many organizations are building data lakes to store and analyze large volumes of structured, semi-structured, and unstructured data. In addition, many teams are moving towards a data mesh architecture, which requires them to expose their data sets as easily consumable data products.

Data Lake, Testing, Interactive, Unstructured Data

Build an end-to-end serverless streaming pipeline with Apache Kafka on Amazon MSK using Python

AWS Big Data

MARCH 21, 2024

For your application’s low-latency and real-time data access, you can use Lambda and DynamoDB. For longer-term data storage, you can use managed serverless connector service Amazon Data Firehose to send data to your data lake. You can use the same data to train ML models.

Data Lake, Management, Modeling, Optimization

What is a Data Pipeline?

Jet Global

MAY 9, 2024

The key components of a data pipeline are typically: Data Sources: The origin of the data, such as a relational database, data warehouse, data lake, file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.

Data Lake, Data Warehouse, Business Intelligence, Machine Learning

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

In another decade, the internet and mobile started to generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence, Data Lake emerged, which handles unstructured and structured data with huge volume. A data fabric is comprised of a network of data nodes (e.g.,

Data Lake, Data Warehouse, Data-driven, Metadata

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users. Data hubs and data lakes can coexist in an organization, complementing each other.

Analytics, Data Warehouse, Data Lake, Metadata

Estimating Scope 1 Carbon Footprint with Amazon Athena

AWS Big Data

AUGUST 2, 2023

In this blog, we will walk through how we can apply existing enterprise data to better understand and estimate Scope 1 carbon footprint using Amazon Simple Storage Service (S3) and Amazon Athena, a serverless interactive analytics service that makes it easy to analyze data using standard SQL.

Data Lake, Measurement, Visualization, Data Architecture

Automate large-scale data validation using Amazon EMR and Apache Griffin

AWS Big Data

APRIL 4, 2024

In the depicted architecture and our typical data lake use case, our data either resides in Amazon S3 or is migrated from on premises to Amazon S3 using replication tools such as AWS DataSync or AWS Database Migration Service (AWS DMS). It generates and publishes reports in Amazon S3, which are then accessible via Athena.

Data Quality, Data Lake, Data Warehouse, Data-driven

Nexthink scales to trillions of events per day with Amazon MSK

AWS Big Data

MARCH 29, 2024

Our services consuming this data inherit the same resilience from Amazon MSK. If our backend ingestion services face disruptions, no event is lost, because Kafka retains all published messages. Teams could efficiently consume events published by others, offering new capabilities more rapidly while reducing cross-team dependencies.

Cost-Benefit, Data-driven, Metrics, Management

Data Mesh 101: What it is and Why You Should Care

Ontotext

FEBRUARY 12, 2024

It proposes a technological, architectural, and organizational approach to solving data management problems by breaking up the monolithic data platform and de-centralizing data management across different domain teams and services. Once these domains interact and share data with each other, the mesh emerges.

IT, Metadata, Data Quality, Data Lake

How The CIO Can Become The CMO’s Best Ally In The Use Of Data

CIO Business Intelligence

SEPTEMBER 21, 2022

“The good news for many CIOs is that they’ve already laid the groundwork through investments in data governance and migration to the cloud,” LiveRamp noted in a recent report. Inconsistent data, which can result in inaccuracies in interacting with customers, and affect the internal operational use of data.

Risk, Data Lake, Marketing, Interactive

The hidden history of Db2

IBM Big Data Hub

JULY 5, 2022

Back in the 1960s and 70s, vast amounts of data were stored in the world’s new mainframe computers, many of them IBM System/360 machines, and had become a problem. Finally, 13 years after Codd published his paper, IBM Db2 on z/OS was born, and 10 years after that the first IBM Db2 database for LUW was released. They were expensive.

Data Lake, Data Warehouse, Publishing, Structured Data

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

1960s – Pre-Relational Era: IBM develops the Information Management System (IMS), a hierarchical database management system (DBMS), which organized data in a tree-like structure. Codd, a computer scientist at IBM, publishes a groundbreaking paper titled “A Relational Model of Data for Large Shared Data Banks.”

Data-driven, Modeling, Enterprise, Structured Data

Introducing watsonx: The future of AI for business

IBM Big Data Hub

MAY 9, 2023

As AI becomes more pervasive, businesses need to feel confident that their models can be relied upon not to “hallucinate” facts or use inappropriate language when interacting with customers. A data store built on open lakehouse architecture, it runs both on premises and across multi-cloud environments.

Data Warehouse, Machine Learning, Cost-Benefit, Metadata

What is Data Mesh?

Ontotext

NOVEMBER 16, 2023

In a data mesh, domains are represented by a node, which can be an operational data store (ODS), a data warehouse, or a data lake tailored to the domain’s requirements. Mesh emerges when teams use other domains’ data products and the domains communicate with others in a governed manner.

Metadata, Data-driven, Data Quality, Data Architecture

Accelerate Amazon Redshift secure data use with Satori – Part 1

AWS Big Data

SEPTEMBER 21, 2023

Satori accelerates implementing data security controls on data warehouses like Amazon Redshift, is straightforward to integrate, and doesn’t require any changes to your Amazon Redshift data, schema, or how your users interact with data. Satori interacts with identity providers either via API or by using the SAML protocol.

Data Warehouse, Interactive, Data Architecture, Data Lake

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data) then enterprise-wide data lakes versus smaller, typically BU-Specific, “data ponds”.

Metadata, Cost-Benefit, Enterprise, Interactive

Data Visualization and Visual Analytics: Seeing the World of Data

Sisense

JUNE 30, 2020

Data visualization can either be static or interactive. Interactive visualizations enable users to drill down into data and extract and examine various views of the same dataset, selecting specific data points that they want to see in a visualized format. The role of visualizations in analytics.

Visualization, Analytics, Dashboards, Data-driven

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Cloudera

JANUARY 15, 2021

Cloudera Data Warehouse is a highly scalable service that marries the SQL engine technologies of Apache Impala and Apache Hive with cloud-native features to deliver best-in-class price-performance for users running data warehousing workloads in the cloud. But don’t just take our word for it. Benchmark Description.

Data Warehouse, Cost-Benefit, Consulting, Interactive

Celebrating Data Superheroes: The 2021 Data Impact Awards Winners

Cloudera

NOVEMBER 18, 2021

So, without further ado, it is with great delight that we officially publish the 2021 Data Impact Award winners! Data Lifecycle Connection. This allows for an omni-channel view of the customer and enables real-time data streaming and a safe zone to test machine learning models using Cloudera Data Science Workbench (CDSW).

Data Lake, Cost-Benefit, Digital Transformation, Risk

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

AWS Big Data

FEBRUARY 2, 2023

Since its launch in 2006, Amazon Simple Storage Service (Amazon S3) has experienced major growth, supporting multiple use cases such as hosting websites, creating data lakes, serving as object storage for consumer applications, storing logs, and archiving data. This could be your data lake or application S3 bucket.

Reporting, Data Lake, Management, Optimization

Migrate from Amazon Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics Studio

AWS Big Data

JUNE 29, 2023

Solution overview: For our use case, we use several AWS services to stream, ingest, transform, and analyze sample automotive sensor data in real time using Kinesis Data Analytics Studio. Kinesis Data Analytics Studio allows us to create a notebook, which is a web-based development environment. Provide the following SQL statement.

Data Analytics, Analytics, IoT, Data Lake

Periscope Data Expands to Israel, Empowering Data Teams with Powerful Tools

Sisense

DECEMBER 11, 2019

And he demonstrated how the Periscope Data platform overcomes the challenges of huge data volumes that can’t be easily modeled by traditional BI. Citing Tinder as a major example, Kyle explained how it constantly uses data to enhance users’ interactions and calibrate the user experience. Omid Vahdaty, CTO of Jutomate Ltd.,

Data Lake, Big Data, Sales, Data-driven

Process price transparency data using AWS Glue

AWS Big Data

MAY 4, 2023

Phase 1 implementation of this regulation, which went into effect on July 1, 2022, requires that payors publish machine-readable files publicly for each plan that they offer. Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats.

Insurance, Publishing, Cost-Benefit, Data Lake

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

We also celebrated the first-ever winner of the Data Impact Achievement Award, a new award category that recognizes one customer who has consistently achieved transformation across their business, pursuing a diverse set of use cases and creating a culture of data-driven innovation. Data Impact Achievement Award.

Internet Publishing and Broadcasting, Data-driven, Broadcasting, Digital Transformation

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

AWS Big Data

JULY 14, 2023

This enabled producers to publish data products that were curated and authoritative assets for their domain. For example, the AR team created and governed their cash application dataset in their AWS account AWS Glue Data Catalog. The global catalog: The basic building blocks of our business-focused solutions are data products.

Finance, Metadata, Big Data, Recreation/Entertainment

Accomplish Agile Business Intelligence & Analytics For Your Business

datapine

APRIL 15, 2020

17 software developers met to discuss lightweight development methods and subsequently produced the following manifesto: Manifesto for Agile Software Development: Individuals and interactions over processes and tools. It allows you to easily publish reports: the whole point of agile is to get the product out there.

Business Intelligence, Analytics, Testing, Dashboards

Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs

Cloudera

APRIL 3, 2023

In this blog, we will share with you in detail how Cloudera integrates core compute engines including Apache Hive and Apache Impala in Cloudera Data Warehouse with Iceberg. We will publish follow-up blogs for other data services. Iceberg basics: Iceberg is an open table format designed for large analytic workloads.

Data Warehouse, Snapshot, Metadata, Cost-Benefit

Data Modeling 201 for the cloud: designing databases for data warehouses

erwin

JUNE 7, 2022

The first and most important thing to recognize and understand is the new and radically different target environment that you are now designing a data model for. Star schema: a data modeling and database design paradigm for data warehouses and data lakes.

Data Warehouse, Modeling, Sales, Data Lake

Federate Amazon QuickSight access with open-source identity provider Keycloak

AWS Big Data

JUNE 13, 2023

Administrators can publish QuickSight applications on the Keycloak Admin console. Vamsi Bhadriraju is a Data Architect at AWS. He works closely with enterprise customers to build data lakes and analytical applications on the AWS Cloud. We also demonstrate ways to assign QuickSight roles based on Keycloak membership.

Metadata, Dashboards, Business Intelligence, Management

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

While they require task-specific labeled data for fine tuning, they also offer clients the best cost performance trade-off for non-generative use cases. offers a Prompt Lab, where users can interact with different prompts using prompt engineering on generative AI models for both zero-shot prompting and few-shot prompting.

Machine Learning, Data Warehouse, Modeling, Cost-Benefit

Automate deployment of an Amazon QuickSight analysis connecting to an Amazon Redshift data warehouse with an AWS CloudFormation template

AWS Big Data

FEBRUARY 16, 2023

Prerequisites: Before setting up the CloudFormation stacks, you must have an AWS account and an AWS Identity and Access Management (IAM) user with sufficient permissions to interact with the AWS Management Console and the services listed in the architecture. In QuickSight, you analyze and visualize your data in analyses.

Data Warehouse, Sales, Visualization, Data Processing

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

AWS Big Data

JUNE 12, 2023

After the processed data is stored in Amazon S3, we create an AWS Glue crawler to create a Data Catalog table that acts as a metadata layer for the data. The table can be queried using Amazon Athena, a serverless, interactive query service that enables running SQL-like queries on data stored in Amazon S3.

Management, Metadata, Testing, Internet of Things

Generate security insights from Amazon Security Lake data using Amazon OpenSearch Ingestion

AWS Big Data

AUGUST 28, 2023

Optionally, specify the Amazon S3 storage class for the data in Amazon Security Lake. For more information, refer to Lifecycle management in Security Lake. Review the details and create the data lake. For example, choosing DNS Activity will give you dashboards of all DNS activity published in Amazon Security Lake.

Dashboards, Visualization, Metadata, Management

Data for All: Empowering Users With AI, ML, and Analytics

Sisense

JUNE 12, 2019

Larger datasets were readily available because many private companies and public institutions decided to make their data open for everyone: text, video, speech, you name it. The hardware is there, the data is accumulating fast. As users become a bit savvier, they interact with dashboards making selections and applying filters.

Analytics, Data-driven, Dashboards, IoT

Dimensional modeling in Amazon Redshift

AWS Big Data

JULY 19, 2023

We show how to perform extract, load, and transform (ELT), an integration process focused on getting the raw data from a data lake into a staging layer to perform the modeling. Report and analyze the data in Amazon QuickSight: QuickSight is a business intelligence service that makes it easy to deliver insights.

Modeling, Sales, Data Warehouse, Snapshot
