metalus

This project aims to make writing Spark applications easier by abstracting the effort to assemble the driver into reusable steps and pipelines.

Step Libraries

An application developer requires two things to be able to construct Spark applications:

  • Metalus Application - A self-contained jar containing metalus-core components.
  • Step Library - One or more jars containing step functions.

A step library consists of three main components:

[Figure: Step Library Structure]

Step Classes

The step library contains the steps, which are functions defined on Scala objects. Each function that should be made available to pipelines should be annotated. Running the metadata extractor against a step library scans the annotated objects and functions and generates the step templates that developers use to create new JSON-based pipelines.
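As a rough illustration of what an annotated step object might look like, the sketch below defines two stand-in annotations locally so that it compiles on its own. The annotation names, their parameters, and the `StringSteps` object are assumptions made for this example and are not taken from this page; the real annotations ship with the Metalus libraries and may take different arguments.

```scala
import scala.annotation.StaticAnnotation

// Hypothetical stand-ins for the library's annotations, defined here only so
// the sketch is self-contained. The real annotation names and fields may differ.
class StepObject extends StaticAnnotation
class StepFunction(id: String, displayName: String, description: String) extends StaticAnnotation

// The object is annotated so that a metadata scan would inspect it.
@StepObject
object StringSteps {
  // Annotated: intended to be exposed to pipelines as a step.
  @StepFunction("example-upper-case", "Upper Case", "Converts a string to upper case")
  def toUpperCase(value: String): String = value.toUpperCase

  // Not annotated: a plain helper that a metadata scan would be expected to skip.
  def trimValue(value: String): String = value.trim
}
```

Calling `StringSteps.toUpperCase("abc")` directly behaves like any other Scala function; the annotations only add metadata for tooling to read.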

Extensions

A step library may contain additional classes that extend the base Metalus functionality. This topic will be covered in other documentation.

Pipelines

JSON-based pipelines may also be included within a step library. The default PipelineManager first attempts to locate a pipeline in the application configuration file; if the pipeline is not found there, it scans the metadata/pipelines path of the step libraries. The pipeline must be in a JSON file with the following naming convention: <pipeline.id>.json
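The lookup order described above can be sketched as follows. This is not the PipelineManager implementation: `PipelineLookup`, `findPipelineJson`, and the `configured` map (standing in for pipelines defined in the application configuration) are hypothetical names, and the sketch assumes the step library jars are on the classpath so that metadata/pipelines/<pipeline.id>.json can be read as a classpath resource.

```scala
import scala.io.Source

object PipelineLookup {
  // Returns the raw pipeline JSON for the given id, if it can be found.
  // `configured` is a hypothetical stand-in for pipelines declared in the
  // application configuration, keyed by pipeline id.
  def findPipelineJson(pipelineId: String,
                       configured: Map[String, String],
                       loader: ClassLoader = getClass.getClassLoader): Option[String] =
    // 1. Prefer a pipeline defined in the application configuration.
    configured.get(pipelineId).orElse {
      // 2. Otherwise look for metadata/pipelines/<pipeline.id>.json on the classpath.
      //    getResourceAsStream returns null when the resource does not exist.
      Option(loader.getResourceAsStream(s"metadata/pipelines/$pipelineId.json")).map { stream =>
        val source = Source.fromInputStream(stream, "UTF-8")
        try source.mkString finally source.close()
      }
    }
}
```

With this sketch, a pipeline present in `configured` is returned without touching the classpath, and an id that appears in neither place yields `None`.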

Dependencies

When preparing to use a step library, it is useful to know exactly which dependencies will be needed in addition to the library jar. A dependencies.json file may be included that contains information that may be used to resolve dependent jars at run time. More information may be found here.