Question

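I have two bash scripts and one R script, and I am trying to parallelize an RNA-seq pipeline that is run through the bash scripts. The R script reads sample names and metadata from a CSV file and passes these variables as command-line arguments to my bash script.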

I am very confused about how to run BiocParallel and BatchJobs. Below is a simple example, but in reality there are about 200 fastq files. Any help is appreciated.

First bash script: running.sh

#!/bin/bash

#SBATCH --ntasks-per-node 10
#SBATCH --time=48:00:00
#SBATCH -J preprocessRNA
#SBATCH -n 10
#SBATCH -p long
module load Rstats/3.4.0
R CMD BATCH --no-save --no-restore test.R

R script: test.R

#!/usr/bin/env Rscript
library(BiocParallel)
library(BatchJobs)
len <- 3
width <- 5
area <- len * width

cat("Area of a rectangle:\n")
cat("length = ", len, "\n")
cat("width  = ", width, "\n")
cat("area   = ", area, "\n")

funs <- makeClusterFunctionsSLURM("slurm.tmpl")
param <-  BatchJobsParam(4,resources=list(ncpus=2),cluster.functions=funs)
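# Run pipeline.sh once per input value, passing the value as its first argument.
# bplapply() takes the inputs, then the function; the backend goes in BPPARAM.
bplapply(area, function(x) system2("bash", c("pipeline.sh", x)), BPPARAM = param)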

Second bash script: pipeline.sh

#!/bin/bash

test=$1
echo ${test}
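
The hand-off described above (sample names and metadata read from a CSV and passed to pipeline.sh as positional arguments) can also be sketched directly in bash. This is only an illustration: the file name samples.csv, its columns sample,condition, and the helper run_samples are assumptions, not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical CSV layout (assumed for illustration, not from the post):
#   sample,condition
#   S1,control
#   S2,treated

# run_samples CSV SCRIPT: call SCRIPT once per data row, passing the
# sample name and the condition as positional arguments $1 and $2.
run_samples() {
    local csv=$1 script=$2
    # tail -n +2 skips the header row; IFS=, splits each row on commas.
    # The "|| [ -n ... ]" part keeps a final row that lacks a trailing newline.
    tail -n +2 "$csv" | while IFS=, read -r sample condition || [ -n "$sample" ]; do
        [ -n "$sample" ] || continue          # skip blank lines
        # Redirect stdin so the script cannot consume the remaining CSV rows.
        bash "$script" "$sample" "$condition" < /dev/null
    done
}
```

Called as `run_samples samples.csv pipeline.sh`, this processes the samples one after another; it shows only the argument hand-off, not the parallelism.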

SLURM template: slurm.tmpl

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --job-name=myRtest

module purge
module load R

Rscript test.R
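
For comparison, one alternative to driving the fan-out from R is a SLURM job array, in which each array task runs pipeline.sh on a single sample. The following is a minimal sketch of that different technique, not of the BiocParallel approach; the file samples.csv (header row, sample name in the first column), the helper sample_for_task, and the array range of 200 tasks with at most 10 running at once are all assumptions.

```shell
#!/bin/bash
#SBATCH --job-name=preprocessRNA
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=2
# Assumed: 200 samples, with at most 10 array tasks running at the same time.
#SBATCH --array=1-200%10

# sample_for_task CSV N: print the first column of the N-th data row.
sample_for_task() {
    local csv=$1 n=$2
    # Line 1 of the file is the header, so data row N is file line N + 1.
    awk -F, -v n="$n" 'NR == n + 1 { print $1 }' "$csv"
}

# Dispatch only inside an array task, where SLURM sets SLURM_ARRAY_TASK_ID.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    sample=$(sample_for_task samples.csv "$SLURM_ARRAY_TASK_ID")
    bash pipeline.sh "$sample"
fi
```

Submitted with sbatch, each array task would look up its own row and pass that sample name to pipeline.sh as the first argument, so the scheduler handles the parallelism rather than R.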

Running a shell-script RNA-seq pipeline with BiocParallel on a SLURM cluster, with command-line arguments passed from an Rscript

0 answers: