Data Leaders Brief

Applying Fine Grained Security to Apache Spark

Cloudera

AUGUST 3, 2022

Fine-grained access control (FGAC) with Spark: despite the challenges posed by arbitrary code execution, there have been attempts to provide a stronger security model, with mixed results. One approach is to use third-party tools (such as Privacera) that integrate with Spark.
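
To make "fine-grained" concrete, here is a minimal, engine-agnostic sketch in plain Python. It is not Spark, Privacera, or Ranger code. A hypothetical `Policy` filters rows and hides or masks columns per user before any data is returned. The `Policy` class, its fields, and the `"****"` mask are illustrative assumptions, not any product's API.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, object]


@dataclass
class Policy:
    """Hypothetical FGAC policy: visible columns, masked columns, and a row filter."""
    allowed_columns: set[str]
    masked_columns: set[str] = field(default_factory=set)
    row_filter: Callable[[Row], bool] = lambda row: True  # default: all rows visible


def apply_policy(rows: list[Row], policy: Policy) -> list[Row]:
    """Return only permitted rows, dropping disallowed columns and masking sensitive ones."""
    result = []
    for row in rows:
        if not policy.row_filter(row):
            continue  # row-level security: skip rows this user may not see
        visible = {}
        for col, value in row.items():
            if col in policy.masked_columns:
                visible[col] = "****"  # column masking: keep the column, hide the value
            elif col in policy.allowed_columns:
                visible[col] = value  # column-level security: pass through permitted columns
        result.append(visible)
    return result


rows = [
    {"id": 1, "region": "EU", "email": "a@example.com", "spend": 120},
    {"id": 2, "region": "US", "email": "b@example.com", "spend": 80},
]
analyst = Policy(
    allowed_columns={"id", "region", "spend"},
    masked_columns={"email"},
    row_filter=lambda r: r["region"] == "EU",
)
print(apply_policy(rows, analyst))
# [{'id': 1, 'region': 'EU', 'email': '****', 'spend': 120}]
```

In Spark, a check like this only protects data if it runs somewhere user code cannot bypass it. That is the arbitrary-code-execution problem mentioned above, and it is why such policies are usually enforced by a trusted layer outside the user's job.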

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation-based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. This combination of services allows you to conduct data analysis on your transactional data lake while ensuring secure and controlled access.

Simplify authentication with native LDAP integration on Amazon EMR

AWS Big Data

FEBRUARY 20, 2024

This setup has been a key enabler for making corporate users and groups available inside EMR clusters and for defining access control policies that govern their data access (for example, through the Amazon EMR native Apache Ranger integration). For more details, refer to "Tutorial: Configure a cross-realm trust with an Active Directory domain."

How to Use Apache Iceberg in CDP’s Open Lakehouse

Cloudera

AUGUST 8, 2022

In June 2022, Cloudera announced the general availability of Apache Iceberg in the Cloudera Data Platform (CDP). Iceberg is a 100% open table format, developed through the Apache Software Foundation, that helps users avoid vendor lock-in and implement an open lakehouse.

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

Effective data governance enables better decision-making by improving data quality, reducing data management costs, and ensuring secure access to data for stakeholders.

Simplify and speed up Apache Spark applications on Amazon Redshift data with Amazon Redshift integration for Apache Spark

AWS Big Data

APRIL 20, 2023

Apache Spark is a popular framework that you can use to build applications for use cases such as ETL (extract, transform, and load), interactive analytics, and machine learning (ML). Amazon Redshift integration for Apache Spark helps developers seamlessly build and run Apache Spark applications on Amazon Redshift data.

Fine-Grained Authorization with Apache Kudu and Apache Ranger

Cloudera

FEBRUARY 11, 2021

When Kudu was first introduced as part of CDH in 2017, it didn't support any kind of authorization, so only air-gapped and non-secure use cases were satisfied. Coarse-grained authorization was added along with authentication in CDH 5.11 (Kudu 1.3.0). You'll need to name the policy and set the resource it will apply to.

Providing fine-grained, trusted access to enterprise datasets with Okera and Domino

Domino Data Lab

OCTOBER 1, 2020

This explosion in both the amount of data and the number of users who need access to it has created new challenges. Chief among them are how to provide secure access to this data at scale and how to give data scientists consistent, repeatable, and convenient access to the computational tools they need.

Upgrade Journey: The Path from CDH to CDP Private Cloud

Cloudera

SEPTEMBER 28, 2020

Cloudera delivers an enterprise data cloud that enables companies to build end-to-end data pipelines for hybrid cloud, spanning edge devices to public or private cloud, with integrated security and governance underpinning it to protect customers' data. Attribute-based access control and SparkSQL fine-grained access control.

Value Proposition of the Cloudera Operational Database over Legacy Apache HBase Deployments

Cloudera

SEPTEMBER 9, 2021

The CDP Operational Database (COD) builds on the foundation of existing operational database capabilities that were available with Apache HBase and/or Apache Phoenix in legacy CDH and HDP deployments. Quantifiable performance improvements of Apache HBase 2.2.x.

Query your Apache Hive metastore with AWS Lake Formation permissions

AWS Big Data

JULY 20, 2023

Apache Hive is a SQL-based data warehouse system for processing highly distributed datasets on the Apache Hadoop platform. There are two key components to Apache Hive: the Hive SQL query engine and the Hive metastore (HMS).

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

In 2022, we announced that you can enforce fine-grained access control policies with AWS Lake Formation and use Amazon Athena to query data stored in any supported file format through table formats such as Apache Iceberg, Apache Hudi, and more.

Implement fine-grained access control in Amazon SageMaker Studio and Amazon EMR using Apache Ranger and Microsoft Active Directory

AWS Big Data

NOVEMBER 8, 2023

SageMaker Studio comes with built-in integration with Amazon EMR, enabling data scientists to interactively prepare data at petabyte scale using frameworks such as Apache Spark, Hive, and Presto right from SageMaker Studio notebooks. Data access should only be permitted to non-sensitive customer, product, and orders data.

Introducing Apache Hudi support with AWS Glue crawlers

AWS Big Data

NOVEMBER 22, 2023

Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers manage complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance.

Use IAM runtime roles with Amazon EMR Studio Workspaces and AWS Lake Formation for cross-account fine-grained access control

AWS Big Data

NOVEMBER 6, 2023

EMR Studio provides fully managed Jupyter notebooks and tools such as Spark UI and YARN Timeline Server via EMR Studio Workspaces. Data is often stored in data lakes managed by AWS Lake Formation, enabling you to apply fine-grained access control through a simple grant or revoke mechanism.

Introducing Apache Iceberg in Cloudera Data Platform

Cloudera

FEBRUARY 22, 2022

Over the past decade, the successful deployment of large-scale data platforms at our customers has acted as a big data flywheel, driving demand to bring in even more data, apply more sophisticated analytics, and onboard many new data practitioners, from business analysts to data scientists.

Run Apache Hive workloads using Spark SQL with Amazon EMR on EKS

AWS Big Data

OCTOBER 18, 2023

Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at massive scale. Using Spark SQL to run Hive workloads not only provides the simplicity of SQL-like queries but also taps into the speed and performance of Spark. Apache Spark has supported queries written in HiveQL.

Five Strategies to Accelerate Data Product Development

Cloudera

JULY 26, 2021

…Deep Java Learning, Apache Spark 3.x, Lambda or Kappa architectures) and implementing reliable streaming capabilities at scale with technologies such as Apache NiFi and Apache Kafka have made it possible to harness and commercialize an ever-increasing volume of real-time data, such as time-series or clickstream data.

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. Data transformation – Steps 3 and 4 represent an EMR Serverless Spark application (Amazon EMR 6.9
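
Handling inserts, updates, and deletes from change data capture amounts to applying an ordered batch of change records to a table keyed by primary key. The sketch below is a toy in-memory model, not Delta Lake or EMR Serverless code. It assumes each record carries an `Op` flag of `I`, `U`, or `D` (the convention AWS DMS uses in CDC files written to Amazon S3) and an `id` primary key. The `apply_changes` helper is hypothetical.

```python
from typing import Iterable

Record = dict[str, object]


def apply_changes(
    table: dict[object, Record], changes: Iterable[Record], key: str = "id"
) -> dict[object, Record]:
    """Apply ordered CDC records (Op = I/U/D) to a copy of a keyed table, MERGE-style."""
    merged = dict(table)  # work on a copy so the input snapshot is left untouched
    for change in changes:
        op = change["Op"]
        row = {k: v for k, v in change.items() if k != "Op"}  # strip the operation flag
        pk = row[key]
        if op in ("I", "U"):
            merged[pk] = row  # upsert: insert new keys, overwrite existing ones
        elif op == "D":
            merged.pop(pk, None)  # delete if present; ignore keys that never existed
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return merged


snapshot = {1: {"id": 1, "status": "new"}, 2: {"id": 2, "status": "new"}}
changes = [
    {"Op": "U", "id": 1, "status": "shipped"},
    {"Op": "D", "id": 2},
    {"Op": "I", "id": 3, "status": "new"},
]
print(apply_changes(snapshot, changes))
# {1: {'id': 1, 'status': 'shipped'}, 3: {'id': 3, 'status': 'new'}}
```

Changes must be applied in order. An update followed by a delete for the same key has to end with the row gone. In an actual Delta Lake table, a transactional `MERGE` would typically play the role of this loop.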

10 Keys to a Secure Cloud Data Lakehouse

Cloudera

OCTOBER 25, 2022

The cloud data lakehouse brings multiple processing engines (SQL, Spark, and others) and modern analytical tools (ML, data engineering, and business intelligence) together in a unified analytical environment. Security function isolation. Consider this practice the most important function and foundation of your cloud security framework.

The value of CDP Public Cloud over legacy Hadoop-on-IaaS implementations

Cloudera

MAY 18, 2021

Prior to the introduction of CDP Public Cloud, many organizations that wanted to leverage CDH, HDP, or any other on-prem Hadoop runtime in the public cloud had to deploy the platform in a lift-and-shift fashion, commonly known as "Hadoop-on-IaaS" or simply the IaaS model.

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

Why should I read the definitive guide to embedded analytics? Every application provider has the same goals: to help their users work more efficiently and to drive user adoption. But many companies fail to achieve these goals because they struggle to provide the reporting and analytics users have come to expect.
