metalus

This project aims to make writing Spark applications easier by abstracting the effort to assemble the driver into reusable steps and pipelines.


Load to Bronze

This pipeline loads source data into the bronze zone. During the load, the pipeline can optionally standardize column names, add a unique record id, and add a static file id column.
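Conceptually, the three optional transformations behave like the following plain-Python sketch. This is not the Metalus implementation; the function, the cleanup rule, and the `record_id`/`file_id` column names are illustrative assumptions, and the real pipeline performs these steps on a Spark DataFrame.

```python
# Conceptual sketch only -- not the Metalus step implementations.
import re

def standardize_column_name(name: str) -> str:
    # Hypothetical cleanup rule: replace awkward characters with
    # underscores and upper-case the result.
    return re.sub(r"[^A-Za-z0-9_]+", "_", name.strip()).upper()

def load_to_bronze(records, execute_column_cleanup=True, add_record_id=True,
                   add_file_id=True, file_id=None):
    """Apply the optional bronze transformations to a batch of records."""
    out = []
    for index, record in enumerate(records):
        row = dict(record)
        if execute_column_cleanup:
            row = {standardize_column_name(k): v for k, v in row.items()}
        if add_record_id:
            # The id is unique only within this batch of source data.
            row["record_id"] = index
        if add_file_id:
            # Static column carrying the provided file id.
            row["file_id"] = file_id
        out.append(row)
    return out
```

Toggling the three booleans off simply skips the corresponding step, mirroring the `executeColumnCleanup`, `addRecordId`, and `addFileId` globals described below.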

General Information

Id: a9f62840-2827-11ec-9c0c-cbf3549779e5

Name: LoadToBronze

Parameters (all parameters should be provided in the globals)

  • sourceBronzeConnector - The data connector to use as the source.
  • sourceBronzePath - The path to get the source data.
  • sourceBronzeReadOptions - The options that describe the source data and any settings to use during the read.
  • executeColumnCleanup - Optional boolean indicating that column names should be standardized. Defaults to true.
  • addRecordId - Optional boolean indicating that a unique record id should be added. The id is unique only within the current source data and does not consider data already in the destination. Defaults to true.
  • addFileId - Optional boolean indicating that a static column containing the provided fileId should be added to each record. Defaults to true.
  • destinationBronzeConnector - The data connector to use as the destination.
  • destinationBronzePath - The path to write the data.
  • destinationBronzeWriteOptions - The options to use during the write.
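A globals section supplying these parameters might look like the following sketch. The connector class names, paths, formats, and the `fileId` value are illustrative assumptions; the exact connector serialization depends on the Metalus version, so consult the data connector documentation for the precise shape.

```json
{
  "sourceBronzeConnector": {
    "className": "com.acxiom.pipeline.connectors.HDFSDataConnector",
    "object": { "name": "source" }
  },
  "sourceBronzePath": "/raw/orders",
  "sourceBronzeReadOptions": { "format": "csv", "options": { "header": "true" } },
  "executeColumnCleanup": true,
  "addRecordId": true,
  "addFileId": true,
  "fileId": "orders-2021-10-08",
  "destinationBronzeConnector": {
    "className": "com.acxiom.pipeline.connectors.HDFSDataConnector",
    "object": { "name": "destination" }
  },
  "destinationBronzePath": "/bronze/orders",
  "destinationBronzeWriteOptions": { "format": "parquet" }
}
```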

Streaming

This pipeline can be used with streaming connectors. By default, when a streaming connector is provided as the source connector, the job will run indefinitely. The Streaming Query Monitor provides additional options for writing partitioned data.