metalus

This project aims to make writing Spark applications easier by abstracting the effort to assemble the driver into reusable steps and pipelines.


SFTP Load to Bronze

This execution template loads data from an SFTP location into a DataFrame using the DownloadSFTPToHDFSWithDataFrame pipeline, calls the LoadToParquet pipeline, and then uses the WriteDataFrameToHDFS pipeline to store the data as parquet in the bronze zone (HDFS).
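The flow above amounts to three pipelines run in sequence over shared state, with each one consuming what the previous one produced. The Python sketch below is only a toy model of that ordering, built on stated assumptions. The function names, the context keys (`sftp_lines`, `hdfs_landing_dir`, `bronze_path`, and so on), and the list-of-dicts standing in for a DataFrame are all hypothetical and are not the Metalus API. Nothing is downloaded, and no parquet is written. It also does not attempt to reproduce what LoadToParquet does internally.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in: a "pipeline" is a function that takes the shared
# context and returns an updated copy. This is NOT the Metalus API.
Context = Dict[str, Any]
Pipeline = Callable[[Context], Context]


def download_sftp_to_hdfs(ctx: Context) -> Context:
    # Stand-in for DownloadSFTPToHDFSWithDataFrame: pretend the remote file was
    # copied to an HDFS landing path and parsed into rows (list of dicts).
    rows = [dict(zip(ctx["columns"], line.split(","))) for line in ctx["sftp_lines"]]
    landing = ctx["hdfs_landing_dir"] + "/" + ctx["file_name"]
    return {**ctx, "landing_path": landing, "dataframe": rows}


def load_to_parquet(ctx: Context) -> Context:
    # Stand-in for LoadToParquet: only records that the output format is parquet.
    return {**ctx, "output_format": "parquet"}


def write_dataframe_to_hdfs(ctx: Context) -> Context:
    # Stand-in for WriteDataFrameToHDFS: records where the bronze copy would go.
    written = {
        "path": ctx["bronze_path"],
        "format": ctx["output_format"],
        "row_count": len(ctx["dataframe"]),
    }
    return {**ctx, "written": written}


def run_execution(pipelines: List[Pipeline], ctx: Context) -> Context:
    # Run each pipeline in order, threading the context through.
    for pipeline in pipelines:
        ctx = pipeline(ctx)
    return ctx


# Example run with made-up input values.
result = run_execution(
    [download_sftp_to_hdfs, load_to_parquet, write_dataframe_to_hdfs],
    {
        "sftp_lines": ["1,alice", "2,bob"],
        "columns": ["id", "name"],
        "hdfs_landing_dir": "/landing",
        "file_name": "users.csv",
        "bronze_path": "/bronze/users",
    },
)
print(result["written"])  # {'path': '/bronze/users', 'format': 'parquet', 'row_count': 2}
```

Passing an immutable copy of the context from step to step keeps each stand-in pipeline independent, so any one of them could be swapped out without touching the others. That is the property the execution template relies on when it reuses existing pipelines.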

General Information

Id: load_data_bronze_sftp_hdfs

Name: Load SFTP to Bronze HDFS

Form

A custom form allows configuring the download and input parameters as well as controlling the pipeline behavior.