Question

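I am trying to use PBS arrays to submit 5 jobs in parallel, each running the same program on a different file. PBS will start five copies of the script, each with a different integer in the PBS_ARRAYID variable. The script will be run using: qsub script.pbs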

My current code is below. It works as is, but it computes the file list multiple times in each batch process. Is there a more efficient way to do this?

#PBS -S /bin/bash
#PBS -t 1-5       #Makes the $PBS_ARRAYID have the integer values 1-5
#PBS -V

workdir="/user/test"

samtools sort `find ${workdir}/*.bam | sed ${PBS_ARRAYID}'!d'` > `find ${workdir}/*.bam | sed ${PBS_ARRAYID}'!d' | sed "s/.bam/.sorted.bam/"`

Answer 1

#PBS -S /bin/bash
#PBS -t 0-4       #Makes the $PBS_ARRAYID have the integer values 0-4
#PBS -V

workdir="/user/test"

files=( "$workdir"/*.bam )       # Expand the glob, store it in an array
infile="${files[$PBS_ARRAYID]}"  # Pick one item from that array

exec samtools sort "$infile" >"${infile%.bam}.sorted.bam"

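Notes:

files=( "$workdir"/*.bam ) performs the glob inside bash itself (no ls or find needed) and stores the result in an array for reuse.
Arrays are zero-indexed, so the job range is 0-4 rather than 1-5.
Command substitution (`...` or $(...)) forks a subshell, which carries a significant performance overhead, so it is best avoided where a built-in alternative exists. Here, the parameter expansion ${infile%.bam} replaces the sed call.
Using exec for the last command in the script tells the shell interpreter that it can replace itself with that command, rather than needing to remain in memory.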
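
The points above can be tried out without PBS or samtools. The following is a minimal sketch, not part of the original answer: the function name pick_bam, the temporary directory, and the placeholder files a.bam, b.bam and c.bam are all assumptions made for illustration. It shows the glob-into-array lookup and the suffix rewrite, plus a range check that a real job script may or may not need.

```shell
#!/bin/bash
# Sketch only: pick_bam is a hypothetical helper, not part of PBS or samtools.
shopt -s nullglob   # an unmatched glob expands to an empty array

# Sets the globals "infile" and "outfile" for the given directory and
# zero-based index, using only shell built-ins (no command substitution).
pick_bam() {
    local dir=$1 idx=$2
    local -a files
    files=( "$dir"/*.bam )              # expand the glob once
    if ! [[ $idx =~ ^[0-9]+$ ]]; then
        echo "index '$idx' is not a non-negative integer" >&2
        return 1
    fi
    idx=$((10#$idx))                    # force base 10, e.g. "08" -> 8
    if (( idx >= ${#files[@]} )); then
        echo "index $idx out of range (${#files[@]} files)" >&2
        return 1
    fi
    infile=${files[idx]}
    outfile=${infile%.bam}.sorted.bam   # strip ".bam", append ".sorted.bam"
}

# Demo with empty placeholder files; mktemp is used here only for setup.
demo_dir=$(mktemp -d)
touch "$demo_dir"/a.bam "$demo_dir"/b.bam "$demo_dir"/c.bam

pick_bam "$demo_dir" 1
echo "${infile##*/} -> ${outfile##*/}"
```

In a real job script the index would come from PBS_ARRAYID. If the range is kept as 1-5 instead of 0-4, passing "$((PBS_ARRAYID - 1))" as the index gives the same mapping.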
