metalus

This project aims to make writing Spark applications easier by abstracting the effort to assemble the driver into reusable steps and pipelines.

Pipeline Context

When a pipeline is executed, state is maintained using the PipelineContext class. The PipelineContext is a shared object that contains the current state of the pipeline execution. Each step that executes may access this context simply by adding a parameter to the step function; the presence of this parameter informs Metalus that the step wants the context injected.
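The injection mechanism can be sketched as follows. This is a minimal, self-contained illustration, not the real Metalus classes: the stand-in PipelineContext, the ExampleSteps object, and the step name are all hypothetical, and in a real pipeline the framework supplies the context rather than the caller.

```scala
// Stand-in for the real Metalus PipelineContext (illustrative only).
case class PipelineContext(globals: Map[String, Any], stepPackages: List[String])

object ExampleSteps {
  // Declaring a PipelineContext parameter is what signals to the framework
  // that the current execution context should be injected into this step.
  def readGlobal(name: String, pipelineContext: PipelineContext): Option[Any] =
    pipelineContext.globals.get(name)
}

// Simulated injection: at runtime the framework would pass the context itself.
val ctx = PipelineContext(Map("appName" -> "demo"), List("com.acme.steps"))
println(ExampleSteps.readGlobal("appName", ctx)) // prints Some(demo)
```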

The PipelineContext contains several fields that may be used by a step to access the current state of the pipeline execution:

  • sparkConf - The Spark configuration.
  • sparkSession - The current Spark session that the application is executing within.
  • globals - Values that are available within the execution running the current pipeline and step.
  • security - A PipelineSecurityManager that is used to secure values that are being mapped.
  • parameters - The pipeline parameters being used. Contains initial parameters as well as the result of steps that have been processed.
  • stepPackages - A list of packages where Metalus will look for steps being executed.
  • parameterMapper - The PipelineStepMapper being used to map parameters to pipeline steps.
  • pipelineListener - The PipelineListener registered with the execution.
  • stepMessages - A Spark accumulator used for registering messages from code that executes remotely.
  • rootAudit - The base ExecutionAudit used for tracking pipeline and step audits.
  • pipelineManager - The PipelineManager used to load pipelines during the execution and for step groups.
  • json4sFormats - The json4s Formats used when deserializing globals.
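Because the context is shared across steps, updates to fields such as globals are usually expressed by returning a new copy rather than mutating in place. The sketch below illustrates that pattern with a hypothetical stand-in class; the setGlobal/getGlobal names and the field shapes here are assumptions for illustration, not the verbatim Metalus API.

```scala
// Illustrative stand-in for PipelineContext showing copy-on-update semantics.
case class PipelineContext(globals: Map[String, Any], stepPackages: List[String]) {
  // "Updating" a global returns a new context; the original is left untouched.
  def setGlobal(name: String, value: Any): PipelineContext =
    copy(globals = globals + (name -> value))

  def getGlobal(name: String): Option[Any] = globals.get(name)
}

val ctx = PipelineContext(Map.empty, List("com.acme.steps"))
val updated = ctx.setGlobal("runDate", "2023-01-01")
println(updated.getGlobal("runDate")) // prints Some(2023-01-01)
println(ctx.getGlobal("runDate"))     // prints None: original context unchanged
```

Treating the context as an immutable value keeps step results predictable even when execution state is passed through many steps.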