Computational workflows and pipelines are essential in large-scale data analysis and bioinformatics. Here, we present Pipelines.jl and JobSchedulers.jl, a reproducible and scalable pipeline builder and workload manager, respectively. Pipelines.jl provides simple but powerful methods to wrap Julia functions or external commands, and it provides data and dependency validation, retrying of failed jobs, and skipping of finished jobs. JobSchedulers.jl can allocate CPU and memory to jobs and defer a job until other jobs have finished.
Computational workflows and pipelines are essential in large-scale data analysis and bioinformatics. Julia is a fast and easy-to-learn language with built-in support for parallel computing, and it is well suited to gluing together multiple languages and data types. These features make Julia a promising workflow language. Here, we present two packages, Pipelines.jl and JobSchedulers.jl, as a Julia-based pipeline builder and workload manager, respectively. Pipelines.jl is a lightweight and powerful package for building reusable pipelines. Julia or external code can be wrapped in a Program type, and Pipelines.jl then provides that code with multiple features, including input and output validation, dependency checks, resuming interrupted tasks, retrying failed tasks, and skipping finished tasks. JobSchedulers.jl was inspired by Slurm and PBS; it supports allocating CPU and memory to a specific job and deferring a job until other jobs have finished. Pipelines.jl and JobSchedulers.jl have been used in Clasnip (www.clasnip.com), a web-based microorganism classification service with a Julia back end. In addition, BioPipelines.jl is built on Pipelines.jl. It integrates a collection of bioinformatics programs and is fully compatible with PackageCompiler.jl, and it addresses the relocation of Julia applications and the configuration of external dependencies used in workflows. Therefore, by combining JobSchedulers.jl and Pipelines.jl, developers and researchers can conveniently build reproducible and scalable workflows in Julia.