Spark on Kubernetes – Gang Scheduling with YuniKorn

Cloudera

Apache YuniKorn (Incubating) has just released 0.10.0 (release announcement). As part of this release, a new feature called Gang Scheduling has become available. With Gang Scheduling, Spark job scheduling on Kubernetes becomes more efficient. What is Gang Scheduling?

Optimizing Cloudera Data Engineering Autoscaling Performance

Cloudera

That’s why turning to traditional resource scheduling is not sufficient. When building CDE, we integrated with Apache YuniKorn, which offers rich scheduling capabilities on Kubernetes. ... fixed sized clusters).

Cloudera Data Engineering 2021 Year End Review

Cloudera

Since the release of Cloudera Data Engineering (CDE) more than a year ago, our number one goal was operationalizing Spark pipelines at scale with first-class tooling designed to streamline automation and observability. We also introduced Apache Airflow on Kubernetes as the next-generation orchestration service.

Introducing Amazon EMR on EKS job submission with Spark Operator and spark-submit

AWS Big Data

Amazon EMR on EKS provides a deployment option for Amazon EMR that allows organizations to run open-source big data frameworks on Amazon Elastic Kubernetes Service (Amazon EKS). With EMR on EKS, Spark applications run on the Amazon EMR runtime for Apache Spark. The EMR runtime provides up to 5.37 times better performance and 76.8%

Apache Spark on Kubernetes: How Apache YuniKorn (Incubating) helps

Cloudera

Why choose K8s for Apache Spark? Apache Spark unifies batch processing, real-time processing, stream analytics, machine learning, and interactive query in one platform. Support for multiple Spark versions, Python versions, and version-controlled containers on shared K8s clusters for both faster iteration and stable production.