metalus

This project aims to make writing Spark applications easier by abstracting the effort to assemble the driver into reusable steps and pipelines.

Pipelines

Metalus pipelines represent a unit of work within a Metalus execution. A pipeline may run on its own or be chained together with other pipelines within the same execution. The basic structure of a pipeline consists of an id, a name, a category, and a list of steps to execute, plus optional tags:

  • id - A unique id for the pipeline. A GUID is recommended.
  • name - A display name used in log messages.
  • category - The type of pipeline:
    • pipeline - The base pipeline type, which is executed directly.
    • step-group - Marks a pipeline that is designed to be included within a Step Group.
  • steps - A list of pipeline steps to be executed.
  • tags - An optional list of strings to apply to the pipeline.
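The fields above can be illustrated with a small Python sketch. It assumes a pipeline definition is held as a JSON-like dictionary keyed by the field names from the list (`id`, `name`, `category`, `steps`, `tags`). The helper `validate_pipeline` and the sample values are hypothetical and are not part of the Metalus API.

```python
import uuid
from typing import List

# The two categories described in the list above.
VALID_CATEGORIES = {"pipeline", "step-group"}


def validate_pipeline(pipeline: dict) -> List[str]:
    """Return the problems found in a pipeline definition (empty if none).

    Hypothetical helper for illustration only; not part of Metalus.
    """
    problems = []
    # id, name, category and steps make up the basic structure.
    for field in ("id", "name", "category", "steps"):
        if field not in pipeline:
            problems.append(f"missing required field: {field}")
    if "category" in pipeline and pipeline["category"] not in VALID_CATEGORIES:
        problems.append(f"unknown category: {pipeline['category']}")
    if "steps" in pipeline and not isinstance(pipeline["steps"], list):
        problems.append("steps must be a list")
    # tags are optional, but when present they must be a list of strings.
    tags = pipeline.get("tags")
    if tags is not None and not (
        isinstance(tags, list) and all(isinstance(t, str) for t in tags)
    ):
        problems.append("tags must be a list of strings")
    return problems


example = {
    "id": str(uuid.uuid4()),  # a GUID, as recommended
    "name": "Load Customer Data",
    "category": "pipeline",
    "steps": [],  # step definitions omitted in this sketch
    "tags": ["ingest"],
}

print(validate_pipeline(example))  # prints []
```

Generating the id with `uuid.uuid4()` follows the recommendation that ids be GUIDs. Any stable unique string would satisfy the "unique id" requirement equally well.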

The steps list contains the pipeline steps that will be executed as part of the pipeline. Below is an overview of a basic pipeline:

[Figure: Pipeline Overview]

Pipeline Execution Flow

[Figure: Pipeline Flow]
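The execution flow is shown only as a figure. The following Python sketch is therefore a simplified model under explicit assumptions. Each step is modeled as a hypothetical `run` callable, steps execute in list order, and pipelines in one execution are chained by passing a shared state dictionary from one pipeline to the next. Real Metalus execution runs on Spark and may differ from this model. The names `Step`, `Pipeline`, `run_pipeline`, and `run_execution` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical step model: an id plus a function that reads the shared
# state and returns a value, which is stored under the step's id.
@dataclass
class Step:
    id: str
    run: Callable[[Dict[str, Any]], Any]


# Mirrors the pipeline fields described earlier (id, name, category, steps, tags).
@dataclass
class Pipeline:
    id: str
    name: str
    category: str
    steps: List[Step]
    tags: List[str] = field(default_factory=list)


def run_pipeline(pipeline: Pipeline, state: Dict[str, Any]) -> Dict[str, Any]:
    """Run each step in list order, recording its result in the shared state."""
    for step in pipeline.steps:
        # The pipeline name is what appears in log output.
        print(f"[{pipeline.name}] running step {step.id}")
        state[step.id] = step.run(state)
    return state


def run_execution(pipelines: List[Pipeline]) -> Dict[str, Any]:
    """Chain pipelines: each one sees the state left by the previous ones."""
    state: Dict[str, Any] = {}
    for pipeline in pipelines:
        state = run_pipeline(pipeline, state)
    return state


load = Pipeline("p1", "Load", "pipeline", [Step("read", lambda s: [3, 1, 2])])
transform = Pipeline(
    "p2", "Transform", "pipeline", [Step("sort", lambda s: sorted(s["read"]))]
)

result = run_execution([load, transform])
print(result["sort"])  # prints [1, 2, 3]
```

In this model, chaining works only because both pipelines share the same state dictionary. The second pipeline reads the `read` result produced by the first.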