Remove Data Lake Remove Data Processing Remove Data-driven Remove Metadata
article thumbnail

Governing data in relational databases using Amazon DataZone

AWS Big Data

Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue data catalog.

article thumbnail

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

To build a data-driven business, it is important to democratize enterprise data assets in a data catalog. With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. For metadata read/write, Flink has the catalog interface.

article thumbnail

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

article thumbnail

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

Organizations often need to manage a high volume of data that is growing at an extraordinary rate. At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. We think of this concept as inside-out data movement. Example Corp.

Data Lake 112
article thumbnail

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

This is a guest post co-written by Alex Naumov, Principal Data Architect at smava. smava believes in and takes advantage of data-driven decisions in order to become the market leader. smava believes in and takes advantage of data-driven decisions in order to become the market leader.

article thumbnail

Build efficient ETL pipelines with AWS Step Functions distributed map and redrive feature

AWS Big Data

AWS Step Functions is a fully managed visual workflow service that enables you to build complex data processing pipelines involving a diverse set of extract, transform, and load (ETL) technologies such as AWS Glue , Amazon EMR , and Amazon Redshift. The data is then cleansed, transformed, and uploaded to Amazon S3 for further processing.

Metadata 118