
Download File and Store in Bronze Zone

This pipeline parses incoming data, performs some basic maintenance (standardizes column names, adds a record id and the file id to each row), and stores the result in a Parquet datastore. The load step is a dynamic step group that uses the pipelineId defined by the global parameter loadDataFramePipelineId. The write step is a dynamic step group that uses the pipelineId defined by the global parameter writeDataFramePipelineId. The write step group is passed a global named inputDataFrame.
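For context, below is a minimal, hypothetical sketch of what a dynamic step-group declaration can look like in a Metalus pipeline JSON definition: the step group's pipelineId parameter is mapped from the loadDataFramePipelineId global using the ! global-mapping prefix. The step id and display name are illustrative placeholders and are not taken from this pipeline's actual definition.

```json
{
  "id": "LOAD_DATAFRAME",
  "displayName": "Load DataFrame (dynamic step group)",
  "type": "step-group",
  "params": [
    {
      "type": "text",
      "name": "pipelineId",
      "required": true,
      "value": "!loadDataFramePipelineId"
    }
  ]
}
```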

General Information

Id: dcff1d10-c2c3-11eb-928b-3dca5c59af1b

Name: DownloadToBronzeHdfs

Required Parameters

Required parameters indicated with a *:

  • loadDataFramePipelineId * - The step group pipeline to use when loading the data into a DataFrame.
  • writeDataFramePipelineId * - The step group pipeline to use when writing the data to the bronze zone.
  • bronzeZonePath * - The HDFS path for the root bronze zone folder.
  • fileId * - The unique id for the file being processed.
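These parameters are supplied as globals at runtime. As a hypothetical illustration, assuming the pipeline is launched through a Metalus application configuration, the globals section might resemble the excerpt below; all values are placeholders to be replaced with the actual step-group pipeline ids, bronze zone path, and file id.

```json
{
  "globals": {
    "loadDataFramePipelineId": "<id of the load step-group pipeline>",
    "writeDataFramePipelineId": "<id of the write step-group pipeline>",
    "bronzeZonePath": "hdfs:///data/bronze",
    "fileId": "<unique id for this file>"
  }
}
```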