article thumbnail

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake 100
article thumbnail

How Data Governance Protects Sensitive Data

erwin

How can companies protect their enterprise data assets, while also ensuring their availability to stewards and consumers while minimizing costs and meeting data privacy requirements? Data Security Starts with Data Governance. Lack of a solid data governance foundation increases the risk of data-security incidents.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Introducing AWS Glue crawler and create table support for Apache Iceberg format

AWS Big Data

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. Solution overview For our example use case, a customer uses Amazon EMR for data processing and Iceberg format for the transactional data. Choose Create.

article thumbnail

Announcing the 2021 Data Impact Awards

Cloudera

2020 saw us hosting our first ever fully digital Data Impact Awards ceremony, and it certainly was one of the highlights of our year. We saw a record number of entries and incredible examples of how customers were using Cloudera’s platform and services to unlock the power of data.

article thumbnail

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

Paco Nathan ‘s latest column dives into data governance. This month’s article features updates from one of the early data conferences of the year, Strata Data Conference – which was held just last week in San Francisco. In particular, here’s my Strata SF talk “Overview of Data Governance” presented in article form.

article thumbnail

What Is Alation Connected Sheets? Q&A with the Creators

Alation

It is also hard to know whether one can trust the data within a spreadsheet. And they rarely, if ever, host the most current data available. Sathish Raju, cofounder & CTO, Kloudio and senior director of engineering, Alation: This presents challenges for both business users and data teams.

article thumbnail

Extreme data center pressure? Burst to the cloud with CDP!

Cloudera

For example, the bank from our example might have separate destination data lakes for their perpetual and periodic workloads to support addressing these VIP workloads separately. Alternatively, you can also spin up a different compute cluster and access the data by using CDP’s Shared Data Experience.