Apache Spark (standalone)
=========================

Using Apache Spark in standalone mode is quite simple. We need one Spark
master node and multiple Spark worker nodes. The Spark job is a separate node
that is related to both the master and the worker nodes: the master submits
the application, while the relationships to the workers exist only for
ordering, because we do not want to submit a Spark job to a partially
prepared cluster.

.. code-block:: yaml

   node_templates:
     ${SPARK}_master_firewall:
       type: dice.firewall_rules.spark.Master

     ${SPARK}_master_vm:
       type: dice.hosts.ubuntu.${HOST_SIZE_MASTER}
       relationships:
         - type: dice.relationships.ProtectedBy
           target: ${SPARK}_master_firewall

     ${SPARK}_master:
       type: dice.components.spark.Master
       relationships:
         - type: dice.relationships.ContainedIn
           target: ${SPARK}_master_vm

     ${SPARK}_worker_firewall:
       type: dice.firewall_rules.spark.Worker

     ${SPARK}_worker_vm:
       type: dice.hosts.ubuntu.${HOST_SIZE_WORKER}
       instances:
         deploy: ${SPARK_WORKER_COUNT}
       relationships:
         - type: dice.relationships.ProtectedBy
           target: ${SPARK}_worker_firewall

     ${SPARK}_worker:
       type: dice.components.spark.Worker
       relationships:
         - type: dice.relationships.ContainedIn
           target: ${SPARK}_worker_vm
         - type: dice.relationships.spark.ConnectedToMaster
           target: ${SPARK}_master

     ${SPARK_JOB}:
       type: dice.components.spark.Application
       properties:
         jar: ${SPARK_JOB_JAR_LOCATION}
         class: ${SPARK_JOB_CLASS}
         name: ${SPARK_JOB_NAME}
         args: ${SPARK_JOB_ARGUMENTS}
       relationships:
         - type: dice.relationships.spark.SubmittedBy
           target: ${SPARK}_master
         - type: dice.relationships.Needs
           target: ${SPARK}_worker

Template variables
------------------

SPARK
  The name of the Spark cluster. This is usually set to *spark*, which gives
  us *spark_master* and *spark_worker* nodes.

SPARK_WORKER_COUNT
  The number of Spark worker instances that should be created when deploying
  the cluster.

SPARK_JOB
  The name of the Spark job that we wish to submit.

SPARK_JOB_JAR_LOCATION
  The location of the Spark job jar. This can be either a URL or a relative
  path, in which case the jar needs to be bundled with the blueprint.

SPARK_JOB_CLASS
  The name of the class that should be executed when submitting the Spark job.

SPARK_JOB_NAME
  The name that should be used for the application when the jar is submitted.
  This name is visible in the Spark UI.

SPARK_JOB_ARGUMENTS
  An array of arguments that should be passed to the jar when it is
  submitted. If the application takes no additional arguments, set this to
  ``[]``.

HOST_SIZE_MASTER, HOST_SIZE_WORKER
  The sizes of the master and worker virtual machines. Available values are
  *Small*, *Medium* and *Large*.
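
As an illustration, a job node with the template variables filled in might
look like the sketch below. The jar name, class, application name and
arguments are hypothetical placeholders, and the example assumes *SPARK* is
set to *spark* (giving *spark_master* and *spark_worker* nodes) and that the
jar is bundled with the blueprint.

.. code-block:: yaml

   node_templates:
     # Hypothetical Spark job node; names and values below are placeholders.
     pi_estimation:
       type: dice.components.spark.Application
       properties:
         jar: pi-estimation.jar            # relative path, jar bundled with the blueprint
         class: org.example.PiEstimation   # placeholder main class
         name: pi-estimation               # application name shown in the Spark UI
         args: [ "1000" ]                  # placeholder argument list; use [] for no arguments
       relationships:
         - type: dice.relationships.spark.SubmittedBy
           target: spark_master
         - type: dice.relationships.Needs
           target: spark_worker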