Introduction
The Metalus library was started as a way to build Spark applications at runtime without the need to write or compile code. The library is written in Scala and provides binaries for different versions of Spark and Scala. Developers build applications by providing a JSON configuration file, which is loaded and executed by the Metalus core library.
Application Configuration
There are several ways to start a Metalus application; the easiest is the application framework, which is driven by an application configuration. The application configuration JSON file provides the information needed to run the application. Any application parameters passed to the spark-submit command are automatically added to the globals and made available at runtime.
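The idea of command line parameters becoming globals can be illustrated with a small sketch. This is not the Metalus implementation: the object name `GlobalsSketch`, the method `parseParameters`, and the `--name value` parameter convention are all assumptions made for illustration.

```scala
// Hypothetical sketch; the names and the "--name value" convention are assumptions,
// not taken from the Metalus code base.
object GlobalsSketch {
  // Walks the arguments two at a time and keeps each pair whose first
  // element starts with "--", using the flag name (without "--") as the key.
  // A trailing argument without a partner is ignored.
  def parseParameters(args: Seq[String]): Map[String, String] =
    args.sliding(2, 2).collect {
      case Seq(name, value) if name.startsWith("--") => name.drop(2) -> value
    }.toMap
}
```

Under these assumptions, `GlobalsSketch.parseParameters(Seq("--input-path", "/data/in", "--output-format", "parquet"))` yields a map with the keys `input-path` and `output-format`, which is roughly how a lookup of globals by name could behave at runtime.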
Step Libraries
Step libraries contain the Scala functions, pipeline configurations, and custom driver classes that may be called at runtime using the provided configuration.
Steps
Steps are Scala object functions that are executed at runtime by Metalus core. Each function should be a small, reusable unit of work that stands alone and defines its requirements through parameters. Developers may annotate the functions or separately publish step templates, which can be used when building applications.
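As a rough illustration of the shape described here, a step can be sketched as a function on a plain Scala object. The object `StringSteps` and the function `normalizeColumnName` are hypothetical, and the Metalus step annotations are deliberately left out so the example compiles without any library on the classpath.

```scala
import java.util.regex.Matcher

// Hypothetical step object; the names are illustrative and the Metalus
// annotations are omitted to keep the sketch dependency-free.
object StringSteps {
  // A small unit of work: everything it needs arrives through its parameters,
  // and it returns a value instead of touching shared state.
  def normalizeColumnName(name: String, separator: String = "_"): String =
    name.trim.toLowerCase
      // quoteReplacement keeps separators such as "$" from being read as regex group references
      .replaceAll("[^a-z0-9]+", Matcher.quoteReplacement(separator))
}
```

Calling `StringSteps.normalizeColumnName(" Customer ID ")` returns `customer_id`. Because the function depends only on its arguments, it can be reused in any pipeline that supplies those two values.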
Pipelines
Pipeline configurations written in JSON may be included in a step library. Developers need to provide a directory in the jar at the path /metadata/pipelines. Each pipeline must be a JSON file whose name is the pipeline id plus the .json extension.
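The naming convention can be made concrete with a short sketch that maps a pipeline id to its expected location on the classpath and reads the file if it is present. The object `PipelineResourceSketch` and both of its methods are hypothetical helpers; they only show one way such a lookup could work and are not how Metalus itself loads pipelines.

```scala
import scala.io.Source

// Hypothetical helper; only the /metadata/pipelines path and the
// "<pipeline id>.json" naming rule come from the text above.
object PipelineResourceSketch {
  private val PipelineDirectory = "/metadata/pipelines"

  // Builds the classpath location for a given pipeline id.
  def resourcePath(pipelineId: String): String =
    s"$PipelineDirectory/$pipelineId.json"

  // Returns the JSON text if the resource exists on the classpath, otherwise None.
  def loadPipelineJson(pipelineId: String): Option[String] =
    Option(getClass.getResourceAsStream(resourcePath(pipelineId))).map { stream =>
      val source = Source.fromInputStream(stream, "UTF-8")
      try source.mkString finally source.close()
    }
}
```

For a pipeline with the id `my-pipeline`, `resourcePath` returns `/metadata/pipelines/my-pipeline.json`, so the file would need to be packaged under that name inside the step library jar.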
Custom Classes
Developers have the option to extend provided functionality such as drivers, driver setup, file managers and other classes. These can be added to a step library and made available at runtime.
Self Contained Jar
Metalus provides a self-contained jar that may be used when calling the spark-submit command. This jar contains no step libraries, which gives the developer control over what is available at runtime through the --jars parameter.
Versions
Metalus Core
Metalus core provides the base library required to run Metalus applications and build new step libraries.
Spark Version | Scala Version | Library |
---|---|---|
2.4 | 2.11 | Scala 2.11 Spark 2.4 library |
2.4 | 2.12 | Scala 2.12 Spark 2.4 library |
3.0 | 2.12 | Scala 2.12 Spark 3.0 library |
3.1 | 2.12 | Scala 2.12 Spark 3.1 library |
Metalus Common
Metalus common provides a step library for building basic applications.
Spark Version | Scala Version | Library |
---|---|---|
2.4 | 2.11 | Scala 2.11 Spark 2.4 library |
2.4 | 2.12 | Scala 2.12 Spark 2.4 library |
3.0 | 2.12 | Scala 2.12 Spark 3.0 library |
3.1 | 2.12 | Scala 2.12 Spark 3.1 library |
Metalus AWS
This step library contains AWS-specific components. The Kinesis driver provides a basic implementation that gathers data and then initiates the Metalus Pipeline Core to process the incoming data.
Spark Version | Scala Version | Library |
---|---|---|
2.4 | 2.11 | Scala 2.11 Spark 2.4 library |
2.4 | 2.12 | Scala 2.12 Spark 2.4 library |
3.0 | 2.12 | Scala 2.12 Spark 3.0 library |
3.1 | 2.12 | Scala 2.12 Spark 3.1 library |
Metalus GCP
This step library contains GCP-specific components. The Pub/Sub driver provides a basic implementation that gathers data and then initiates the Metalus Pipeline Core to process the incoming data.
Spark Version | Scala Version | Library |
---|---|---|
2.4 | 2.11 | Scala 2.11 Spark 2.4 library |
2.4 | 2.12 | Scala 2.12 Spark 2.4 library |
3.0 | 2.12 | Scala 2.12 Spark 3.0 library |
3.1 | 2.12 | Scala 2.12 Spark 3.1 library |
Metalus Kafka
This step library contains Kafka-specific components. The Kafka driver provides a basic implementation that gathers data and then initiates the Metalus Pipeline Core to process the incoming data.
Spark Version | Scala Version | Library |
---|---|---|
2.4 | 2.11 | Scala 2.11 Spark 2.4 library |
2.4 | 2.12 | Scala 2.12 Spark 2.4 library |
3.0 | 2.12 | Scala 2.12 Spark 3.0 library |
3.1 | 2.12 | Scala 2.12 Spark 3.1 library |
Metalus Mongo
Metalus Mongo provides a step library for working with the Mongo data store.
Spark Version | Scala Version | Library |
---|---|---|
2.4 | 2.11 | Scala 2.11 Spark 2.4 library |
2.4 | 2.12 | Scala 2.12 Spark 2.4 library |
3.0 | 2.12 | Scala 2.12 Spark 3.0 library |
3.1 | 2.12 | Scala 2.12 Spark 3.1 library |
Metalus Utilities
This project provides utilities that help work with the project.
Spark Version | Scala Version | Library |
---|---|---|
2.4 | 2.11 | Scala 2.11 Spark 2.4 library |
2.4 | 2.12 | Scala 2.12 Spark 2.4 library |
3.0 | 2.12 | Scala 2.12 Spark 3.0 library |
3.1 | 2.12 | Scala 2.12 Spark 3.1 library |