Proof of Concept

At the time of writing, Spark does not have an official Docker image on Docker Hub, so we use the gradiant/spark:2.4.0 image.
This image is built from a modified version of the Dockerfile provided in the Apache Spark distribution (spark-2.4.0/kubernetes/dockerfiles/spark/). The modified version adds the Hadoop native libraries for Alpine Linux and the Kafka libraries.

The following is a recipe for trying Spark with Kubernetes as the job scheduler. For testing we use a local Kubernetes deployment (minikube).

minikube start --memory=4096 --cpus=3

Set up a Kubernetes service account with permissions to create pods and services, then start a local proxy to the Kubernetes API:

kubectl create serviceaccount spark
kubectl create rolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
# local proxy for Kubernetes API
kubectl proxy
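
Since kubectl proxy stays in the foreground, the role binding can be checked from a second terminal. The sketch below wraps kubectl auth can-i in a small helper; the function name can_spark_create is hypothetical and not part of the original recipe, and it assumes the default namespace and the spark service account created above.

```shell
# Hypothetical helper (not part of the original recipe): asks the API server
# whether the "spark" service account in the "default" namespace is allowed
# to create the given resource kind.
can_spark_create() {
  # $1 = resource kind, for example: pods or services
  kubectl auth can-i create "$1" \
    --as=system:serviceaccount:default:spark \
    --namespace=default
}
```

Calling can_spark_create pods and can_spark_create services should each print yes if the role binding is in place; a no suggests the rolebinding command did not apply to the intended namespace or account.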

Now kubernetes API is accessible at

We run a container as the Spark client and set Kubernetes as the master:

docker run --rm -ti --net host gradiant/spark:2.4.0 spark-submit \
--master k8s:// \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=gradiant/spark:2.4.0-k8s \
--conf spark.kubernetes.executor.request.cores=0.2 \
--executor-memory 500M \
$SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar 100
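
In cluster mode the driver runs as a pod, so the result of the job ends up in the driver pod's log rather than in the client's terminal. A minimal sketch for finding it follows. It assumes that Spark labels driver pods with spark-role=driver and that the job runs in the default namespace; the helper names list_driver_pods and pi_result are hypothetical.

```shell
# Hypothetical helper: lists driver pods in the "default" namespace.
# Assumes Spark labels its driver pods with spark-role=driver.
list_driver_pods() {
  kubectl get pods --namespace=default -l spark-role=driver -o name
}

# Hypothetical helper: prints the line of the driver log that holds the
# SparkPi result. $1 is a pod reference as printed by list_driver_pods.
pi_result() {
  kubectl logs --namespace=default "$1" | grep "Pi is roughly"
}
```

For example, run list_driver_pods first, then pass one of the printed names to pi_result once that pod has completed. Executor pods are created and removed by the driver, so watching kubectl get pods while the job runs shows them appearing and disappearing.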

Integration with Jupyter Notebooks

To use Spark on K8s inside a jupyter notebook we need to:
