Apache Druid is an open-source data store designed for sub-second queries on real-time and historical data. It is primarily used for business intelligence (OLAP) queries on event data.
Druid provides low latency (real-time) data ingestion, flexible data exploration, and fast data aggregation. Existing Druid deployments have scaled to trillions of events and petabytes of data.
Druid is most commonly used to power user-facing analytic applications. It can load both streaming and batch data and integrates with Samza, Kafka, Storm, Spark, Flink and Hadoop.
Druid can be considered an alternative OLAP option to Kylin, the Hadoop-based OLAP tool previously proposed in PDP-4.
The following sections discuss the changes required to each PNDA component.
Druid resources and any other dependencies will be hosted on the PNDA mirror. The mirror build script will need to include these in the appropriate mirror section.
For the Druid cluster, new nodes will be launched for the Druid Broker, Historical, MiddleManager, Coordinator and Overlord processes, while the existing PNDA cluster nodes will be reused for Kafka, ZooKeeper and the MySQL instance used for Druid metadata storage.
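As a rough illustration of how the Druid processes would be pointed at the existing PNDA services, the sketch below renders the shared common.runtime.properties. The hostnames, credentials and HDFS deep-storage path are placeholder assumptions; the real deployment would template this through the salt configuration.

```python
"""Minimal sketch (not the final salt state) of the common.runtime.properties
shared by every Druid process, reusing the existing PNDA ZooKeeper quorum and
MySQL instance. All hostnames and credentials are placeholders."""

DRUID_COMMON_PROPERTIES = {
    # Reuse the existing PNDA ZooKeeper quorum (placeholder hostnames).
    "druid.zk.service.host": "zk-1.pnda.local:2181,zk-2.pnda.local:2181,zk-3.pnda.local:2181",
    # Reuse the existing PNDA MySQL instance for Druid metadata storage.
    "druid.metadata.storage.type": "mysql",
    "druid.metadata.storage.connector.connectURI": "jdbc:mysql://mysql.pnda.local:3306/druid",
    "druid.metadata.storage.connector.user": "druid",
    "druid.metadata.storage.connector.password": "CHANGEME",
    # Deep storage on the existing PNDA HDFS (assumption).
    "druid.storage.type": "hdfs",
    "druid.storage.storageDirectory": "/user/druid/segments",
    # Extensions needed for the settings above plus Kafka ingestion.
    "druid.extensions.loadList": '["mysql-metadata-storage", "druid-hdfs-storage", "druid-kafka-indexing-service"]',
}

def write_common_properties(path="common.runtime.properties"):
    """Render the property map in Java-properties format."""
    with open(path, "w") as f:
        for key, value in DRUID_COMMON_PROPERTIES.items():
            f.write("{}={}\n".format(key, value))

if __name__ == "__main__":
    write_common_properties()
```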
Support will be added for deploying and configuring Druid components in heat templates and salt configuration files respectively.
A Druid component plugin will be created to run Druid applications. A supervisor will be set up on the PNDA edge node to call the Druid CLI and process Druid query operations.
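As an illustration of the query path such an application would use, the sketch below submits a native timeseries query to the Druid Broker over REST. The broker hostname and the pnda_events datasource are assumptions; port 8082 is the Broker default.

```python
"""Minimal sketch of how an application (or the edge-node supervisor) could
submit an OLAP query to the Druid Broker over REST."""
import json
import requests  # third-party; pip install requests

BROKER_URL = "http://druid-broker.pnda.local:8082/druid/v2/?pretty"

# Native Druid timeseries query: hourly event counts over one day.
query = {
    "queryType": "timeseries",
    "dataSource": "pnda_events",          # placeholder datasource
    "granularity": "hour",
    "aggregations": [{"type": "count", "name": "events"}],
    "intervals": ["2018-06-01/2018-06-02"],
}

response = requests.post(BROKER_URL,
                         data=json.dumps(query),
                         headers={"Content-Type": "application/json"})
response.raise_for_status()
print(json.dumps(response.json(), indent=2))
```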
The PNDA console dashboard page will be modified to add Druid blocks under the data storage section.
Each Druid component will have its own log file for debugging purposes.
Community example applications will be created to demonstrate the use of Druid.
Sections of the guide will need to be created or updated to reference Druid.
(Refer to http://druid.io/docs/0.12.1/tutorials/quickstart.html for single-node Druid deployment.)
Along with the changes to the six components above and the corresponding documentation effort, the following tasks will be fulfilled:
Data ingestion through Kafka/Tranquility (see the ingestion sketch after this list)
Data ingestion status displayed in the Druid console, accessible from the PNDA console
Sample OLAP queries issued from a REST client
(stretch goal) The above can be verified on an AWS pico deployment with help from the community.
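For the ingestion task, a minimal sketch of what the Kafka-based path could look like is shown below: it submits a Kafka indexing-service supervisor spec to the Druid Overlord and then polls the supervisor's status, which is the information the console work above would surface. The topic, datasource, hostnames and schema are assumptions; port 8090 is the Overlord default.

```python
"""Minimal sketch of Kafka ingestion into Druid: create a Kafka
indexing-service supervisor on the Overlord, then poll its status."""
import json
import requests  # third-party; pip install requests

OVERLORD = "http://druid-overlord.pnda.local:8090"

# Supervisor spec: ingest JSON events from the existing PNDA Kafka cluster
# into a 'pnda_events' datasource (topic, schema and hosts are placeholders).
supervisor_spec = {
    "type": "kafka",
    "dataSchema": {
        "dataSource": "pnda_events",
        "parser": {
            "type": "string",
            "parseSpec": {
                "format": "json",
                "timestampSpec": {"column": "timestamp", "format": "auto"},
                "dimensionsSpec": {"dimensions": ["source", "host"]},
            },
        },
        "metricsSpec": [{"type": "count", "name": "events"}],
        "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "HOUR",
            "queryGranularity": "MINUTE",
        },
    },
    "tuningConfig": {"type": "kafka"},
    "ioConfig": {
        "topic": "pnda.events",  # placeholder PNDA Kafka topic
        "consumerProperties": {"bootstrap.servers": "kafka-1.pnda.local:9092"},
        "taskCount": 1,
    },
}

# Create (or update) the supervisor on the Overlord.
resp = requests.post(OVERLORD + "/druid/indexer/v1/supervisor",
                     data=json.dumps(supervisor_spec),
                     headers={"Content-Type": "application/json"})
resp.raise_for_status()
supervisor_id = resp.json()["id"]

# Poll the ingestion status; this is what the PNDA console would surface.
status = requests.get(OVERLORD + "/druid/indexer/v1/supervisor/{}/status".format(supervisor_id))
print(json.dumps(status.json(), indent=2))
```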
Start Druid, or set up a connection to an existing Druid cluster, at PNDA creation time.
OLAP queries against Druid data from the PNDA console.
Tranquility could be installed along with Druid as the real-time event data ingestion mechanism, consuming data from the data/message bus.
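A minimal sketch of the Tranquility path is given below, assuming Tranquility Server is deployed with its default HTTP endpoint (port 8200) and the same placeholder pnda_events datasource; in practice events would be consumed from the PNDA data/message bus rather than generated in a script.

```python
"""Minimal sketch of pushing one event to Tranquility Server over HTTP."""
from datetime import datetime
import json
import requests  # third-party; pip install requests

TRANQUILITY_URL = "http://tranquility.pnda.local:8200/v1/post/pnda_events"

# A single example event; its timestamp must fall inside Tranquility's
# configured window or the event will be rejected.
event = {
    "timestamp": datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ"),
    "source": "example-app",
    "host": "edge-node",
}

resp = requests.post(TRANQUILITY_URL,
                     data=json.dumps(event),
                     headers={"Content-Type": "application/json"})
resp.raise_for_status()
print(resp.json())  # reports counts of received/sent events
```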