This page lists Helm charts and/or Docker images that are potential candidates for inclusion in a cloud-native PNDA.

Helm Charts


This set of Helm charts is authored by the developers of the Kubernetes contributions to Apache Spark.


  • Note from the repo: "Note that the HDFS charts are currently in pre-alpha quality. They are also being heavily revised and are subject to change."
  • The HDFS DataNode does not use Kubernetes Persistent Volumes.

Gradiant developed Alpine-based containers for HDFS:

  • Not tested in Kubernetes.
  • Helm charts on the roadmap.



  • Deploys Spark version 1.5.1.

Gradiant developed Alpine-based containers for Spark 2.x Standalone Deployment.


  • The Spark UI is not well integrated on Kubernetes.
  • Private Gradiant Helm charts are available (making them public is under internal discussion).

JupyterHub

Zero to JupyterHub is an official set of Jupyter images and Helm charts. The main drawback is that the pyspark-notebook image is larger than 5 GB.

The Jupyter notebook Docker image published by Jupyter is 2.1 GB with 61 layers (6 GB uncompressed):

Gradiant developed Alpine-based containers for Jupyter with data science libraries included. The image is 955 MB with 21 layers (2 GB uncompressed):

The Dockerfile for further customization is public at:


  • No user management (no JupyterHub).
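
To put the two Jupyter image footprints quoted above side by side, the following is a minimal sketch that computes how much smaller the Gradiant Alpine-based image is than the Jupyter one. The figures are the ones given on this page. The dictionary keys, the function names, and the use of decimal units (1 GB = 1000 MB) are assumptions made for illustration only.

```python
# Image figures as quoted on this page.
# Assumption: decimal units, i.e. 2.1 GB is treated as 2100 MB.
IMAGES = {
    "jupyter": {"compressed_mb": 2100, "uncompressed_mb": 6000, "layers": 61},
    "gradiant-alpine": {"compressed_mb": 955, "uncompressed_mb": 2000, "layers": 21},
}


def reduction_pct(baseline: float, candidate: float) -> float:
    """Return the percentage by which candidate is smaller than baseline."""
    return round(100.0 * (baseline - candidate) / baseline, 1)


def compare(baseline: dict, candidate: dict) -> dict:
    """Compute the reduction for every metric present in the baseline."""
    return {key: reduction_pct(baseline[key], candidate[key]) for key in baseline}


if __name__ == "__main__":
    result = compare(IMAGES["jupyter"], IMAGES["gradiant-alpine"])
    for metric, pct in result.items():
        print(f"{metric}: {pct}% smaller")
```

Under these assumptions the Alpine-based image is roughly half the compressed size and about a third of the uncompressed size and layer count. The trade-off noted above still applies: it comes without JupyterHub user management.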

Technology Alternatives

Airflow vs Oozie

Need to explore the pros and cons of using Airflow instead of Oozie as the PNDA workflow manager.
