- Define a common set of rules for writing the Dockerfile of each PNDA component.
- Set uniform environment variable names for container configuration (e.g., the HDFS NameNode URI must be configured through the same environment variable name in all components).
- Add each deployment unit's Dockerfile to that unit's own repository. The Dockerfile must be modified to take the source code directly from the repository (with a COPY instruction) and to derive the version, possibly with `git describe`.
- Integrate Apache Spark with Kubernetes (native Kubernetes support, initially experimental, was added in the Spark 2.3 release).
- Dataset Persistence in the Cloud. Study cloud storage alternatives to HDFS (S3, or MinIO for on-premises deployments). Improve Data-mgmt to support cold/warm storage tiers (e.g., S3 vs. Glacier).
- Separate infrastructure orchestration from container orchestration: use a DevOps tool such as SaltStack or Ansible to provision infrastructure resources (AWS EC2, MaaS) and configure the Kubernetes master and worker nodes, then use Kubernetes to deploy the PNDA deployment units.
- Platform Monitoring. Study the integration of Prometheus.
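The uniform environment variable convention could be consumed by each component's entrypoint roughly as below. This is a minimal sketch, not an agreed PNDA interface: the variable name `PNDA_HDFS_NAMENODE_URI` and the `load_config` helper are hypothetical, and only the HDFS NameNode URI from the example above is included.

```python
import os
from typing import Dict, Mapping, Optional, Sequence

# Hypothetical shared name: the roadmap only asks that every component use the
# same variable name for the same setting, not this particular spelling.
HDFS_NAMENODE_URI = "PNDA_HDFS_NAMENODE_URI"
REQUIRED_VARS = (HDFS_NAMENODE_URI,)


def load_config(
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = REQUIRED_VARS,
) -> Dict[str, str]:
    """Return the shared settings, failing fast if any required one is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Report every missing variable at once so a misconfigured container is easy to fix.
        raise KeyError("missing required environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

For example, `load_config({"PNDA_HDFS_NAMENODE_URI": "hdfs://namenode:8020"})` returns that single entry, while an empty environment raises `KeyError`. Failing at startup rather than on first use makes a wrongly named variable visible as soon as the container starts.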
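Deriving a component's version with `git describe` could look like the sketch below. It assumes tags name releases (optionally prefixed with `v`) and uses `git describe --tags --long --dirty`. The helper names and the `+<commits>.g<sha>` suffix scheme are illustrative choices, not an existing PNDA convention. Because the Dockerfile COPYs the source, git would need the `.git` directory in the build context. Alternatively, the version can be computed before the build and passed in with `docker build --build-arg`.

```python
import re
import subprocess

# Matches `git describe --tags --long --dirty` output such as "v1.2.0-3-gabc1234"
# or "1.2.0-0-gabc1234-dirty"; the tag naming scheme is an assumption.
_DESCRIBE_RE = re.compile(
    r"^v?(?P<tag>.+)-(?P<commits>\d+)-g(?P<sha>[0-9a-f]+)(?P<dirty>-dirty)?$"
)


def version_from_describe(describe: str) -> str:
    """Turn `git describe --tags --long --dirty` output into a version string."""
    m = _DESCRIBE_RE.match(describe.strip())
    if m is None:
        raise ValueError(f"unrecognised git describe output: {describe!r}")
    version = m.group("tag")
    if int(m.group("commits")) > 0:
        # Commits after the tag: append a local suffix so such builds stay distinguishable.
        version += f"+{m.group('commits')}.g{m.group('sha')}"
    if m.group("dirty"):
        # Uncommitted changes in the working tree.
        version += ".dirty" if "+" in version else "+dirty"
    return version


def git_version(repo_dir: str = ".") -> str:
    """Run git describe in repo_dir (needs git and the repository's .git directory)."""
    out = subprocess.run(
        ["git", "describe", "--tags", "--long", "--dirty"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return version_from_describe(out)
```

With this scheme, a build exactly on tag `v1.2.0` reports `1.2.0`, and a build three commits later reports `1.2.0+3.gabc1234`.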