metalus

This project aims to make writing Spark applications easier by abstracting the effort to assemble the driver into reusable steps and pipelines.



Metalus Core

The Metalus Core project provides the base functionality needed to build Spark applications: the building blocks for creating applications, step libraries and pipelines. Developers are advised to use the application framework and the provided step libraries when building Spark applications. However, several extension points are provided so that developers can construct custom step libraries and applications:

  • Pipeline Driver - The entry point into a Metalus application. A default implementation is provided and should meet all batch needs.
  • Driver Setup - Called by the Pipeline Driver to build the execution plan that will be run. This is the most common extension point for adding custom behavior that the application framework does not provide.
  • File Manager - The FileManager class gives steps access to the underlying file system. The core library provides implementations for the local file system (used for development) and HDFS.
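
To illustrate why a file system abstraction is useful to steps, here is a minimal Scala sketch. It is not the Metalus FileManager API: the trait SimpleFileManager, the class LocalSimpleFileManager, the object FileSteps and all method names are hypothetical, chosen only to show how a step can be written against an interface while the concrete file system (local or HDFS) is supplied at runtime.

```scala
import java.io.{File, FileInputStream, FileOutputStream, InputStream, OutputStream}

// Hypothetical abstraction for illustration only; NOT the Metalus FileManager API.
trait SimpleFileManager {
  def exists(path: String): Boolean
  def getInputStream(path: String): InputStream
  def getOutputStream(path: String, append: Boolean): OutputStream
  def deleteFile(path: String): Boolean
}

// Local file system implementation, the kind of thing used during development.
class LocalSimpleFileManager extends SimpleFileManager {
  override def exists(path: String): Boolean = new File(path).exists()
  override def getInputStream(path: String): InputStream = new FileInputStream(path)
  override def getOutputStream(path: String, append: Boolean): OutputStream =
    new FileOutputStream(path, append)
  override def deleteFile(path: String): Boolean = new File(path).delete()
}

// A step-like function that depends only on the trait, not on a concrete file system.
object FileSteps {
  def writeText(fm: SimpleFileManager, path: String, text: String): Unit = {
    val out = fm.getOutputStream(path, append = false)
    try out.write(text.getBytes("UTF-8")) finally out.close()
  }
}
```

Because writeText only sees the trait, an HDFS-backed implementation could be substituted without changing the step.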

There are several utility classes provided to make working with Metalus easier:

  • Driver Utils - Provides functions for building Spark Conf objects, extracting command line parameters, validating required parameters and loading/parsing JSON into executable objects.
  • Execution Audits - Provides a way to audit executions at runtime.
  • Http Rest Client - Provides a basic REST client for communicating with REST APIs. Custom authorization implementations may be provided.
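
The parameter handling described for Driver Utils can be sketched as follows. This is an assumption-laden illustration, not the Driver Utils implementation: the object ParameterSketch, its function names, and the "--name value" argument convention are all hypothetical.

```scala
// Hypothetical helper for illustration only; NOT the Metalus Driver Utils API.
object ParameterSketch {
  // Parse "--name value" pairs into a map (assumed convention).
  // A flag with no value following it is recorded as "true".
  def extractParameters(args: Array[String]): Map[String, String] = {
    @scala.annotation.tailrec
    def loop(rest: List[String], acc: Map[String, String]): Map[String, String] = rest match {
      case key :: value :: tail if key.startsWith("--") && !value.startsWith("--") =>
        loop(tail, acc + (key.drop(2) -> value))
      case key :: tail if key.startsWith("--") =>
        loop(tail, acc + (key.drop(2) -> "true"))
      case _ :: tail => loop(tail, acc) // skip tokens that are not parameter names
      case Nil => acc
    }
    loop(args.toList, Map.empty)
  }

  // Fail fast with IllegalArgumentException if any required parameter is missing.
  def validateRequired(params: Map[String, String], required: Seq[String]): Unit = {
    val missing = required.filterNot(params.contains)
    require(missing.isEmpty, s"Missing required parameters: ${missing.mkString(", ")}")
  }
}
```

For example, ParameterSketch.extractParameters(Array("--inputPath", "/data/in", "--verbose")) yields a map with inputPath set to /data/in and verbose set to true; calling validateRequired on that map with Seq("outputPath") throws an IllegalArgumentException.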

Step Classes