metalus

This project aims to make writing Spark applications easier by abstracting the effort to assemble the driver into reusable steps and pipelines.

View project on GitHub

Documentation Home

Executions

An execution is a body of work within a Metalus application. An execution defines which pipelines are executed and the proper context.

Execution Summary

A single execution may run one or more pipelines in the order they are listed. As each pipeline completes, the status is evaluated and a determination is made as to whether the next pipeline should be run.

In addition to executing pipelines, an execution may be dependent on zero, one or more other executions. Executions may run in parallel or be dependent on other executions. When a dependency exists, the execution will wait until all parent executions complete with a favorable status before executing.

Pipeline Execution Plan Example

Execution Results Syntax

When one execution has a dependency on one or more executions, the globals and parameters objects will be taken from the final PipelineContext and injected into the globals object of the child executions PipelineContext. Values access is available using the following mapping syntax:

Access the primary return of a step:

!<executionId>.pipelineParameters.<pipelineId>.<stepId>.primaryReturn

Access the secondary return of a step:

!<executionId>.pipelineParameters.<pipelineId>.<stepId>.namedReturn

Access the secondary return named value of a step:

!<executionId>.pipelineParameters.<pipelineId>.<stepId>.namedReturn.<valueName>

Access a global:

!<executionId>.globals.<globalName>

In the event that the result of an execution plan results in an exception or one of the pipelines being paused or errored, then downstream executions will not run.

Global Links can be created to provide a shortened name that points to an object from a different pipeline or execution to prevent the pipeline designer from having to type in the long name in multiple places in their application. These abbreviated names are stored in the pipelineContext.globals under the name “GlobalLinks”. It contains key/value pairs where the key is the shortened name and the value is the fully qualified parameter name (with executionIds, pipelineIds, etc… as stated above). These are accessed as typical globals in the step values using the !shortenedName syntax.

"globals": {
  "GlobalLinks": {
    "myPrimaryReturn": "!<executionId>.pipelineParameters.<pipelineId>.<stepId>.primaryReturn",
    "mySecondaryReturn": "!<executionId>.pipelineParameters.<pipelineId>.<stepId>.namedReturn.<valueName>"
  }
}

To access the value of myPrimaryReturn in a future step, the user would use !myPrimaryReturn and would get the value returned from the parameter in the value.

Note: in the case of global name collision, the latest value (child over parent) for a shortened name will be used.

Forks (MVP)

An execution may process a list of values in parallel by changing the executionType to fork and providing the forkByValue attribute. The behavior is similar to fork steps within pipelines with the exception that the fork and join executions will run the provided pipelines. The forkByValue is a mapping string will be applied to the execution globals in an effort to locate the list which is used to spin up parallel processes. Within the fork execution, the individual fork value will be assigned to a global named executionForkValue. A second global named executionForkValueIndex will be set which contains the index of the value in the original list. All child executions of the fork will process in parallel until a join execution (executionType will be join) is reached. The join execution will be executed once and the output (pipelineParameters and globals) of the parallel executions will be merged into a list. A join execution is required.

  • executionType - An optional type. Default is pipeline. fork and join are also options.
  • forkByValues - An optional mapping that will be applied to globals to identify a list of values that should be processed in parallel. This attribute is only used when executionType is set to fork.

    Evaluations

    An execution can now provide pipelines that will run prior to the main pipelines and determine whether the execution should RUN, STOP or SKIP. Prior to this feature, executions default behavior was RUN and STOP. The SKIP action provides a new behavior that allows an execution to skip running the pipelines and run the child executions.

  • evaluationPipelines - An optional array of pipelines to run to determine if the execution should run, stop or skip. These pipelines will be executed before the pipelines. When the pipelines result in a SKIP, then the main pipelines will be skipped, but children will be executed. A new exception SkipExecutionPipelineStepException and step throwSkipExecutionException have been created to make it easier to control this behavior.
  • evaluationPipelineIds - An optional array of pipelines ids to run prior to executing the main pipelines. This is an alternate to the evaluationPipelines array.

Execution Flow

Execution Flow