- Define a common set of rules for writing the Dockerfile of each PNDA component.
- Set uniform environment variable names for container configuration (e.g., the HDFS NameNode URI must be configured with the same environment variable name in all components).
- Move the Dockerfile for each deployment unit into that unit's own repo. Each Dockerfile must be modified to take the source code directly from its own repo (COPY instruction) and to derive the version, possibly with git describe.
- Integrate Apache Spark with Kubernetes (supported since the Spark 2.3 release).
- Dataset persistence in the cloud. Study cloud storage alternatives to HDFS (S3, or MinIO for on-prem). Improve Data-mgmt to work with cold/warm storage paradigms (S3 vs Glacier ...).
- Separate infrastructure orchestration from container orchestration, i.e., use a DevOps tool such as SaltStack or Ansible to deploy infrastructure resources (AWS EC2, MaaS) and configure the Kubernetes master and workers, then use Kubernetes to deploy the PNDA deployment units.
- Infrastructure orchestration can be further divided into infrastructure provisioning and Kubernetes deployment. Candidates for provisioning are Terraform, Ansible, and SaltStack; candidates for Kubernetes deployment are kops and kubeadm.
- Candidate tools to automate the orchestration of PNDA's multiple components/microservices over Kubernetes are Ansible or Kubernetes-specific tools such as Kubernetes Operators or Helm Charts.
- Platform monitoring. Study the integration of Prometheus.
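
As an illustration of the uniform environment variable item above, a component could read its settings through one small shared helper, so that every container resolves the same variable name in the same way. This is only a minimal sketch in Python: the variable name `HDFS_NAMENODE_URI` and the helper `read_setting` are hypothetical and chosen for illustration; they are not names defined by PNDA.

```python
import os
from typing import Mapping, Optional

# Hypothetical shared variable name (an assumption, not defined by PNDA).
# The point is that every component would import and use the same constant.
HDFS_NAMENODE_URI = "HDFS_NAMENODE_URI"


def read_setting(
    name: str,
    env: Optional[Mapping[str, str]] = None,
    default: Optional[str] = None,
) -> str:
    """Return the value of a configuration variable.

    Looks the name up in `env` (the process environment when `env` is None),
    falls back to `default`, and raises KeyError when the variable is unset
    or empty and no default was given.
    """
    source = os.environ if env is None else env
    value = source.get(name) or default
    if not value:
        raise KeyError(f"required environment variable {name} is not set")
    return value
```

A component would then call `read_setting(HDFS_NAMENODE_URI)` at start-up; passing an explicit mapping as `env` makes the lookup easy to exercise in tests without touching the real environment.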