This page lists Helm charts and/or Docker images that are potential candidates for inclusion in a cloud-native PNDA.

Helm Charts

HDFS

This set of Helm charts is authored by the developers of the Kubernetes contributions to Apache Spark.

https://github.com/apache-spark-on-k8s/kubernetes-HDFS/tree/master/charts

CONS:

  • Note from the repo: "Note that the HDFS charts are currently in pre-alpha quality. They are also being heavily revised and are subject to change."
  • The HDFS DataNode does not use Kubernetes Persistent Volumes.

Gradiant developed Alpine-based containers for HDFS:

https://hub.docker.com/r/gradiant/hdfs-datanode

  • Not tested in Kubernetes.
  • Helm charts on the roadmap.

Spark

https://github.com/helm/charts/tree/master/stable/spark

CONS:

  • Deploys Spark version 1.5.1.

Gradiant developed Alpine-based containers for a Spark 2.x standalone deployment:

https://hub.docker.com/r/gradiant/spark

CONS:

  • The Spark UI is not well integrated with Kubernetes.
  • Helm charts exist but are private to Gradiant (making them public is under internal discussion).

JupyterHub

Zero to JupyterHub is an official set of Jupyter images and Helm charts. The main drawback is that the pyspark-notebook image is larger than 5 GB.

https://zero-to-jupyterhub.readthedocs.io/en/latest/index.html

The Jupyter project's datascience-notebook Docker image is 2.1 GB in 61 layers (6 GB uncompressed):

https://hub.docker.com/r/jupyter/datascience-notebook

Gradiant developed Alpine-based containers for Jupyter with data science libraries included. The image is 955 MB in 21 layers (2 GB uncompressed):

https://hub.docker.com/r/gradiant/jupyter/

The Dockerfile is public, so the image can be customized further:

https://github.com/Gradiant/dockerized-jupyter

CONS:

  • No user management (no JupyterHub).
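
To put the image figures quoted in this section side by side, the following is a minimal Python sketch that computes how much smaller the Gradiant image is than the jupyter/datascience-notebook image. The numbers are the ones given on this page; treating 2.1 GB as 2100 MB (decimal units) is an assumption, and the helper names `reduction_pct` and `compare` are hypothetical, introduced only for illustration.

```python
# Image figures as quoted on this page.
# Assumption: 2.1 GB is treated as 2100 MB (decimal units).
IMAGES = {
    "jupyter/datascience-notebook": {
        "compressed_mb": 2100,
        "layers": 61,
        "uncompressed_gb": 6,
    },
    "gradiant/jupyter": {
        "compressed_mb": 955,
        "layers": 21,
        "uncompressed_gb": 2,
    },
}


def reduction_pct(baseline, candidate):
    """Return the percentage by which candidate is smaller than baseline."""
    return round(100 * (baseline - candidate) / baseline, 1)


def compare(baseline_name, candidate_name, images=IMAGES):
    """Return the per-metric reduction of the candidate image vs. the baseline."""
    base = images[baseline_name]
    cand = images[candidate_name]
    return {metric: reduction_pct(base[metric], cand[metric]) for metric in base}


if __name__ == "__main__":
    result = compare("jupyter/datascience-notebook", "gradiant/jupyter")
    for metric, pct in result.items():
        print(f"{metric}: {pct}% smaller")
```

Under these assumptions, the Gradiant image comes out roughly 55% smaller compressed, with about two thirds fewer layers and about two thirds less uncompressed data. The figures should be rechecked against the current tags on Docker Hub before being used in a decision.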

Technology Alternatives

Airflow vs Oozie

The pros and cons of using https://airflow.apache.org/ instead of http://oozie.apache.org/ as the PNDA workflow manager still need to be explored.
