metalus

This project aims to make writing Spark applications easier by abstracting the effort to assemble the driver into reusable steps and pipelines.


S3 to DataFrame

This step group pipeline loads data from an S3 location into a DataFrame. It is designed to work with the LoadToParquet pipeline.

General Information

Id: ff30fd80-c39b-11eb-b944-4f8822efc9f5

Name: LoadS3Data

Required Parameters

Required parameters indicated with a *:

  • credentialName * - The name of the Secrets Manager key containing the key/secret to use when reading the data.
  • landing_path * - The HDFS path where the file should be landed.
  • fileId * - The unique id for the file being processed.
  • readOptions * - The read options to use when loading this file.
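As an illustration, the parameters above might be supplied when invoking this step group from a parent pipeline. The fragment below is a sketch, assuming Metalus's step-group invocation format (a step of type "step-group" with "pipelineId" and "pipelineMappings" parameters); the step id, the global references (e.g. !fileId), and the landing path shown are hypothetical values, not part of this documentation.

```json
{
  "id": "LOAD_S3_DATA",
  "displayName": "Load S3 Data",
  "type": "step-group",
  "params": [
    {
      "type": "text",
      "name": "pipelineId",
      "required": true,
      "value": "ff30fd80-c39b-11eb-b944-4f8822efc9f5"
    },
    {
      "type": "object",
      "name": "pipelineMappings",
      "required": true,
      "value": {
        "credentialName": "my-s3-credential",
        "landing_path": "hdfs:///data/landing",
        "fileId": "!fileId",
        "readOptions": {
          "format": "csv",
          "options": {
            "header": "true",
            "inferSchema": "true"
          }
        }
      }
    }
  ]
}
```

The readOptions value shown follows the common Spark DataFrameReader pattern of a format plus key/value options; consult the Metalus DataFrameReaderOptions documentation for the exact shape expected by this step group.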