Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

Open table formats such as Apache Iceberg enable transactions on top of data lakes and can simplify data storage, management, ingestion, and processing. These transactional data lakes combine features from both the data lake and the data warehouse. One important aspect of a successful data strategy for any organization is data governance.

Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions

AWS Big Data

If the asset has AWS Glue Data Quality enabled, you can now quickly visualize the data quality score directly in the catalog search pane. By selecting the corresponding asset, you can understand its content through the readme, glossary terms, and technical and business metadata.

Five actionable steps to GDPR compliance (Right to be forgotten) with Amazon Redshift

AWS Big Data

By creating visual representations of data flows, organizations can gain a clear understanding of the lifecycle of personal data and identify potential vulnerabilities or compliance gaps. Note that putting a comprehensive data strategy in place is beyond the scope of this post.

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

We also used AWS Lambda for data processing. To further optimize and improve the developer velocity for our data consumers, we added Amazon DynamoDB as a metadata store for the different data sources landing in the data lake. Clients access this data store through an API.

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

The File Manager Lambda function consumes those messages, parses the metadata, and inserts it into the DynamoDB table odpf_file_tracker. We use the following terminology when discussing the File Processor: Refresh cadence – This represents the data ingestion frequency (for example, 10 minutes).
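The File Manager step can be sketched roughly as follows. This is a minimal sketch, assuming the Lambda function is triggered by Amazon SQS messages whose bodies are standard Amazon S3 event notifications. The helper names and the item attributes (`file_path`, `file_size`, `event_time`, `refresh_window`) are hypothetical and are not the post's actual `odpf_file_tracker` schema. It also illustrates one possible way to apply a 10-minute refresh cadence: rounding each file's arrival time down to the start of its window.

```python
import json
from datetime import datetime, timezone
from urllib.parse import unquote_plus

# Hypothetical refresh cadence of 10 minutes, matching the example in the text.
REFRESH_CADENCE_MINUTES = 10


def cadence_window(event_time: str, cadence_minutes: int = REFRESH_CADENCE_MINUTES) -> str:
    """Round an S3 eventTime (e.g. 2024-01-01T12:34:56.789Z) down to its window start."""
    ts = datetime.strptime(event_time, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    floored = ts.replace(
        minute=ts.minute - ts.minute % cadence_minutes, second=0, microsecond=0
    )
    return floored.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_tracker_items(sqs_event: dict) -> list:
    """Turn an SQS batch of S3 event notifications into file-tracker items.

    Assumption: each SQS message body is a JSON-encoded S3 event notification.
    """
    items = []
    for message in sqs_event.get("Records", []):
        s3_notification = json.loads(message["body"])
        for record in s3_notification.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            items.append({
                # Hypothetical attribute names, for illustration only.
                "file_path": f"s3://{bucket}/{key}",
                "file_size": record["s3"]["object"].get("size", 0),
                "event_time": record["eventTime"],
                "refresh_window": cadence_window(record["eventTime"]),
            })
    return items


if __name__ == "__main__":
    notification = {"Records": [{
        "eventTime": "2024-01-01T12:34:56.789Z",
        "s3": {"bucket": {"name": "raw-bucket"},
               "object": {"key": "orders/file+1.csv", "size": 42}},
    }]}
    print(build_tracker_items({"Records": [{"body": json.dumps(notification)}]}))
```

In a real handler, each item would then be written to the table, for example with boto3's `Table.put_item(Item=...)`. That call is left out here so the sketch runs without AWS credentials.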

What Is Data Intelligence?

Alation

Data intelligence is a system to deliver trustworthy, reliable data. It includes intelligence about data, or metadata. IDC coined the term, stating, “data intelligence helps organizations answer six fundamental questions about data.” Yet finding data is just the beginning.