Data Lake, Data Processing and Document

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. To start the job, choose Run. format(dbname)).config("spark.sql.catalog.glue_catalog.catalog-impl",
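The excerpt above breaks off in the middle of a Spark session configuration. As a rough sketch only (not the original post's code), the settings commonly used to register an Apache Iceberg catalog backed by the AWS Glue Data Catalog can be collected as shown here. The catalog name `glue_catalog` comes from the excerpt; the helper names and the warehouse path are hypothetical placeholders.

```python
def iceberg_glue_catalog_conf(warehouse_path, catalog_name="glue_catalog"):
    """Return Spark settings that register an Iceberg catalog backed by AWS Glue.

    warehouse_path is a hypothetical S3 location; substitute your own bucket.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions such as MERGE INTO and CALL procedures.
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog name as an Iceberg Spark catalog.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Root location for table data and metadata.
        f"{prefix}.warehouse": warehouse_path,
        # Stores table metadata pointers in the AWS Glue Data Catalog.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Reads and writes files through Iceberg's S3 file IO.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def apply_conf(builder, conf):
    """Apply each setting to any builder exposing .config(key, value),
    for example pyspark's SparkSession.builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

With PySpark available, `apply_conf(SparkSession.builder, conf).getOrCreate()` would be one way to start a session with these settings. In an AWS Glue job, the same keys are often supplied through the job's configuration parameters instead.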

Data Lake

Data Lake Data Processing Optimization Machine Learning

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

Data analytics on operational data at near-real time is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to have better scalability and performance. For more information, see Changing the default settings for your data lake.

Data Lake

Data Lake Visualization Dashboards Insurance


Webinars

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Improving the Accuracy of Generative AI Systems: A Structured Approach

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Marketing Operations in 2025: A New Framework for Success


Configure cross-Region table access with the AWS Glue Catalog and AWS Lake Formation

AWS Big Data

AUGUST 3, 2023

Today’s modern data lakes span multiple accounts, AWS Regions, and lines of business in organizations. It’s important that their data solution gives them the ability to share and access data securely and safely across Regions. For example, we are using a data lake administrator role called LF-Admin.

Data Lake

Data Lake Metadata Management Data Processing

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Migrate Hive data from CDH to CDP public cloud

Cloudera

JUNE 25, 2021

The Replication Manager support matrix is documented in our public docs. This blog post outlines detailed step by step instructions to perform Hive Replication from an on-prem CDH cluster to a CDP Public Cloud Data Lake. CDP Data Lake cluster versions – CM 7.4.0, Pre-Check: Data Lake Cluster.

Data Lake

Data Lake Metadata Unstructured Data Management

Use AWS Glue to streamline SFTP data processing

AWS Big Data

AUGUST 13, 2024

With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. It enables you to visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Choose Store a new secret.

Data Processing

Data Processing Visualization Data Lake Data Processing

Implement alerts in Amazon OpenSearch Service with PagerDuty

AWS Big Data

JUNE 8, 2023

You can use the plugin to set up different monitors, including cluster health, an individual document, a custom query, or aggregated data. For Host, enter events.PagerDuty.com. At AWS, he is focused on Data Lake implementations, and Search, Analytical workloads using Amazon OpenSearch Service.

Data Lake

Data Lake Dashboards Metrics Testing

Enrich your serverless data lake with Amazon Bedrock

AWS Big Data

SEPTEMBER 26, 2024

Organizations are collecting and storing vast amounts of structured and unstructured data like reports, whitepapers, and research documents. By consolidating this information, analysts can discover and integrate data from across the organization, creating valuable data products based on a unified dataset.

Data Lake

Data Lake Cost-Benefit Unstructured Data Modeling

Dairyland powers up for a generative AI edge

CIO Business Intelligence

APRIL 9, 2024

Beginning in 2021, the Minneapolis-based Microsoft partner helped Dairyland migrate from several custom legacy applications to a commercial implementation of Dynamics 365 and an Azure data lake, which set the stage for the power company’s early foray into AI, according to the systems integrator.

Digital Transformation

Digital Transformation Machine Learning Data Lake Software

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

This involves creating VPC endpoints in both the AWS and Snowflake VPCs, making sure data transfer remains within the AWS network. Use Amazon Route 53 to create a private hosted zone that resolves the Snowflake endpoint within your VPC. This unlocks scalable analytics while maintaining data governance, compliance, and access control.

Analytics

Analytics Data-driven Data Integration Data Lake

Why enterprise CIOs need to plan for Microsoft gen AI

CIO Business Intelligence

AUGUST 14, 2024

Start where your data is: Using your own enterprise data is the major differentiator from open access gen AI chat tools, so it makes sense to start with the provider already hosting your enterprise data. “If you pull your data from a document with no permission set on it, then there’s no information to be had,” he adds.

Enterprise

Enterprise Cost-Benefit Experimentation Modeling

How data literacy allows gen AI to drive productivity at Dow

CIO Business Intelligence

JULY 31, 2024

A significant Copilot use case has been finding documents. We also have a blended architecture of deep process capabilities in our SAP system and decision-making capabilities in our Microsoft tools, and a great base of information in our integrated data hub, or data lake, which is all Microsoft-based.

Manufacturing

Manufacturing Cost-Benefit Digital Transformation Forecasting

How Data Governance Protects Sensitive Data

erwin

APRIL 2, 2021

And knowing the business purpose translates into actively governing personal data against potential privacy and security violations. Do You Know Where Your Sensitive Data Is? Data is a valuable asset used to operate, manage and grow a business.

Data Governance

Data Governance Cost-Benefit Risk Metadata

Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

AWS Big Data

DECEMBER 13, 2023

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. This solution uses Amazon Aurora MySQL hosting the example database salesdb.

Data Warehouse

Data Warehouse Snapshot Data Processing Internet of Things

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Introducing Amazon EMR on EKS job submission with Spark Operator and spark-submit

AWS Big Data

JUNE 6, 2023

Verify the job by running the following command: kubectl get pods -n data-team-a

Enable access to the Spark UI: The Spark UI is an important tool for data engineers because it allows you to track the progress of tasks, view detailed job and stage information, and analyze resource utilization to identify bottlenecks and optimize your code.
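As an illustrative sketch of the verification step mentioned above, the following Python reads the default tabular output of `kubectl get pods` and lists pods that are neither running nor completed. The namespace `data-team-a` comes from the excerpt; the sample output, pod names, and helper function are hypothetical, and the parser assumes the default column layout with a header row.

```python
# Command from the excerpt, as an argument list suitable for subprocess.run.
KUBECTL_CMD = ["kubectl", "get", "pods", "-n", "data-team-a"]


def pods_not_running(kubectl_output):
    """Return (name, status) pairs for pods whose STATUS is not Running or Completed.

    Assumes the default `kubectl get pods` layout: a header row followed by
    rows whose first three columns are NAME, READY, and STATUS.
    """
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    problems = []
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore rows that do not match the expected layout
        name, _ready, status = fields[:3]
        if status not in ("Running", "Completed"):
            problems.append((name, status))
    return problems


# Hypothetical output, for illustration only.
SAMPLE_OUTPUT = """\
NAME                  READY   STATUS    RESTARTS   AGE
spark-pi-driver       1/1     Running   0          2m
spark-pi-exec-1       0/1     Pending   0          1m
"""
```

On a machine with cluster access, the text could be obtained with `subprocess.run(KUBECTL_CMD, capture_output=True, text=True).stdout` and passed to `pods_not_running`.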

Optimization

Optimization Data Lake Cost-Benefit Management

Accomplish Agile Business Intelligence & Analytics For Your Business

datapine

APRIL 15, 2020

Working software over comprehensive documentation. The agile BI implementation methodology starts with light documentation: you don’t have to heavily map this out. You need to determine if you are going with an on-premise or cloud-hosted strategy. Finalize documentation, where necessary. Document only when necessary.

Business Intelligence

Business Intelligence Analytics Testing Dashboards

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

Fun fact: I co-founded an e-commerce company (realistically, a mail-order catalog hosted online) in December 1992 using one of those internetworking applications called Gopher, which was vaguely popular at the time. They sold off most of the company later, retaining some of its IP, and are known to have kept copies of internal documents.

Data Governance

Data Governance Machine Learning Metadata Data Science

Create your Private Data Warehousing Environment Using Azure Kubernetes Service

Cloudera

DECEMBER 2, 2021

In addition to AKS and the load balancers mentioned above, this includes VNET, Data Lake Storage, PostgreSQL Azure database, and more. By default Azure Data Lake Storage, PostgreSQL Database, and Virtual Machines are accessible over public endpoints. The full steps are included in our public documentation.

Data Lake

Data Lake Data Warehouse Data Processing Interactive

FINRA CIO Steve Randich pushes the public cloud forward

CIO Business Intelligence

FEBRUARY 10, 2023

While managing unstructured data remains a challenge for 36% of organizations, according to the 2022 Foundry Data and Analytics Research survey, many IT leaders are actively seeking ways of harnessing all types of data stored in data lakes.

Unstructured Data

Unstructured Data Data Lake Machine Learning Enterprise

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

If your data warehouse platform has gone through multiple enhancements over the years, your operational service levels documentation may not be current with the latest operational metrics and desired SLAs for each tenant (such as business unit, data domain, or organization group).

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

10 Keys to a Secure Cloud Data Lakehouse

Cloudera

OCTOBER 25, 2022

The data lakehouse is gaining in popularity because it enables a single platform for all your enterprise data with the flexibility to run any analytic and machine learning (ML) use case. Cloud data lakehouses provide significant scaling, agility, and cost advantages compared to cloud data lakes and cloud data warehouses.

Data Processing

Data Processing Data Lake Cost-Benefit Risk

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

Cloudera’s Data Warehouse service allows raw data to be stored in the cloud storage of your choice (S3, ADLSg2). It will be stored in your own namespace, and not force you to move data into someone else’s proprietary file formats or hosted storage. Proprietary file formats mean no one else is invited in! Separate compute.

Data Warehouse

Data Warehouse Data Lake IT Analytics

CIOs rise to the ESG reporting challenge

CIO Business Intelligence

JANUARY 30, 2024

“Always the gatekeepers of much of the data necessary for ESG reporting, CIOs are finding that companies are even more dependent on them,” says Nancy Mentesana, ESG executive director at Labrador US, a global communications firm focused on corporate disclosure documents.

Reporting

Reporting Data Quality Strategy Data-driven

Implement disaster recovery with Amazon Redshift

AWS Big Data

JUNE 27, 2024

Document the entire disaster recovery process. Disaster recovery strategies: Amazon Redshift is a cloud-based data warehouse that supports many recovery capabilities out of the box to address unforeseen outages and minimize downtime. Choose your hosted zone. On the Route 53 console, choose Hosted zones in the navigation pane.

Snapshot

Snapshot Data Warehouse Data Processing Strategy

Design a data mesh on AWS that reflects the envisioned organization

AWS Big Data

JANUARY 22, 2024

Users can also raise requests to producers to improve the way the data is presented or to enrich the data with new data points for generating a higher business value. At the same time, each team can also map other catalogs to their own account and use their own data, which they produce along with the data from other accounts.

Data-driven

Data-driven Advertising Metadata Data Architecture

Federate Amazon QuickSight access with open-source identity provider Keycloak

AWS Big Data

JUNE 13, 2023

Insert your specific host domain name where the Keycloak application resides in the following URL: [link] /realms/aws-realm/protocol/saml/descriptor. For Metadata document, upload the Keycloak IdP SAML metadata XML file you downloaded and saved to your local machine earlier. Vamsi Bhadriraju is a Data Architect at AWS.
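A minimal sketch of the two steps in the excerpt above: assembling the descriptor URL for a given host, and checking that a saved file looks like SAML metadata before uploading it. The path and the realm name `aws-realm` come from the excerpt. The `https` scheme, the example host, and both helper names are assumptions, since the link itself is elided in the source.

```python
import xml.etree.ElementTree as ET

# Standard XML namespace for SAML 2.0 metadata documents.
SAML_MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"


def saml_descriptor_url(host, realm="aws-realm"):
    """Build the realm's SAML descriptor URL for a Keycloak host.

    The https scheme is an assumption; the source elides the link prefix.
    """
    return f"https://{host.strip('/')}/realms/{realm}/protocol/saml/descriptor"


def looks_like_saml_metadata(xml_text):
    """Return True if the root element is a SAML 2.0 metadata descriptor."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return root.tag in (
        f"{{{SAML_MD_NS}}}EntityDescriptor",
        f"{{{SAML_MD_NS}}}EntitiesDescriptor",
    )
```

For example, `saml_descriptor_url("keycloak.example.com")` uses a placeholder host. The metadata check is only a quick sanity test on the root element and does not validate signatures or certificates.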

Metadata

Metadata Dashboards Business Intelligence Data Lake

Ingest, transform, and deliver events published by Amazon Security Lake to Amazon OpenSearch Service

AWS Big Data

JUNE 19, 2023

Security Lake automatically centralizes security data from cloud, on-premises, and custom sources into a purpose-built data lake stored in your account. With Security Lake, you can get a more complete understanding of your security data across your entire organization.

Publishing

Publishing Dashboards Visualization Management

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

These encoder-only architecture models are fast and effective for many enterprise NLP tasks, such as classifying customer feedback and extracting information from large documents. While they require task-specific labeled data for fine tuning, they also offer clients the best cost performance trade-off for non-generative use cases.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Generate security insights from Amazon Security Lake data using Amazon OpenSearch Ingestion

AWS Big Data

AUGUST 28, 2023

OpenSearch Ingestion reads Parquet formatted security data from the Security Lake managed Amazon S3 bucket and transforms the security logs into JSON documents. OpenSearch Ingestion ingests this OCSF compliant data into OpenSearch Service. For more information, refer to Lifecycle management in Security Lake.

Dashboards

Dashboards Visualization Metadata Management

Building and Evaluating GenAI Knowledge Management Systems using Ollama, Trulens and Cloudera

Cloudera

MAY 23, 2024

In modern enterprises, the exponential growth of data means organizational knowledge is distributed across multiple formats, ranging from structured data stores such as data warehouses to multi-format data stores like data lakes. This makes gathering information for decision making a challenge.

Management

Management Metrics Data Processing Data Lake

Make Better Data-Driven Decisions with DataRobot AI Platform Single-Tenant SaaS on Microsoft Azure

DataRobot Blog

MARCH 7, 2023

The DataRobot AI Platform seamlessly integrates with Azure cloud services, including Azure Machine Learning, Azure Data Lake Storage Gen 2 (ADLS), Azure Synapse Analytics, and Azure SQL database. Models trained in DataRobot can also be easily deployed to Azure Machine Learning, allowing users to host models easier in a secure way.

Data-driven

Data-driven Machine Learning Experimentation Data Lake

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data) then enterprise-wide data lakes versus smaller, typically BU-Specific, “data ponds”.

Metadata

Metadata Cost-Benefit Enterprise Interactive

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone has launched authentication support through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.

Visualization

Visualization Data Lake Testing Data Governance

Data Governance for Dummies: Your Questions, Answered

Alation

FEBRUARY 17, 2023

This past week, I had the pleasure of hosting Data Governance for Dummies author Jonathan Reichental for a fireside chat, along with Denise Swanson, Data Governance lead at Alation. How do you get executives to understand the value of data governance? First, document your successes of good data, and how it happened.

Data Governance

Data Governance Data Quality Metadata Cost-Benefit

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

An on-premise solution provides a high level of control and customization as it is hosted and managed within the organization’s physical infrastructure, but it can be expensive to set up and maintain. Next, identify the data sources that will be involved in the mapping.

Data Warehouse

Data Warehouse Reporting Data Transformation Sales
