How do I prevent Snakemake from deleting the output folder of a failed job?

Time: 2019-03-29 14:28:37

Tags: snakemake

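I have a rule that iterates over a file, pulls out FASTQ file paths, and runs trimGalore on each FASTQ file. However, some of the files are corrupted or truncated, so trimGalore cannot process them. It carries on with the remaining files, but the rule as a whole fails, and the output folder, which contains the successfully processed files, is deleted. How can I keep the output folder?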

I tried changing the shell command to ignore the exit status, but Snakemake seems to enforce set -euo pipefail inside the shell section it runs.

rule trimGalore:
    """
    This module takes in the temporary file created by parse sampleFile rule and determines if libraries are single end or paired end.
    The appropriate step for trimGalore is then ran and a summary of the runs is produced in summary_tg.txt
    """
    input:
        rules.parse_sampleFile.output[1]+"singleFile.txt", rules.parse_sampleFile.output[1]+"pairFile.txt"
    output:
        directory(projectDir+"/trimmed_reads/")
    log:
        projectDir+"/logs/"+stamp+"_trimGalore.log"
    params:
        p = trimGaloreParams
    shell:
        """
        (awk -F "," '{{print $2}}' {input[0]} |while read i; do echo $(date +"%Y-%m-%d %H:%M:%S") >>{log}; echo "$USER">>{log}; trim_galore {params.p} --gzip -o {output} $i; done
        awk -F "," '{{print $2" "$3}}' {input[1]} |while read i; do echo $(date +"%Y-%m-%d %H:%M:%S") >>{log}; echo "$USER">>{log}; trim_galore --paired {params.p} --gzip -o {output} $i; done) 2>>{log}
        """

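I'm happy for it to continue processing the remaining FASTQ files when one fails, but I want the rule's output folder to be kept when the job finishes with a failure. I want to carry on with the files that are not truncated.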

2 answers:

Answer 0: (score: 0)

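As it stands, your rule declares the whole directory as its output, so if any error occurs along the way, Snakemake treats the whole job as failed and discards the output (that is, the whole folder).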

The solution I can think of relates to this section of the Snakemake docs, just below "Functions as input".

def myfunc(wildcards):
    return [... a list of input files depending on given wildcards ...]

rule:
    input: myfunc
    output: "someoutput.{somewildcard}.txt"
    shell: "..."

This way you can iterate over the files and Snakemake will create one job per FASTQ file, so if a single job fails, only that job's output file is deleted.
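
As a rough sketch of the per-file idea, the input function could look up FASTQ paths in a table built from the sample sheet. This assumes the comma-separated layout implied by the awk calls in the question (FASTQ path in column 2, mate in column 3 for paired libraries). The names read_fastq_table and make_input_function, and the choice to derive the sample name from the file basename, are hypothetical.

```python
import csv
import os
from types import SimpleNamespace


def read_fastq_table(path, n_reads):
    """Map a sample name to its FASTQ path(s), taken from column 2 onwards of a CSV.

    n_reads is 1 for single-end sheets and 2 for paired-end sheets.
    """
    table = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 1 + n_reads:
                continue  # skip blank or short lines
            fastqs = [field.strip() for field in row[1:1 + n_reads]]
            # Assumption: the sample name is the first FASTQ's basename up to the first dot.
            sample = os.path.basename(fastqs[0]).split(".")[0]
            table[sample] = fastqs
    return table


def make_input_function(table):
    """Return a function with the signature Snakemake expects for `input:`."""
    def fastq_for(wildcards):
        # Snakemake passes an object whose attributes are the wildcard values.
        return table[wildcards.sample]
    return fastq_for


if __name__ == "__main__":
    # Stand-in for Snakemake's wildcards object, for illustration only.
    fastq_for = make_input_function({"a": ["/data/a.fastq.gz"]})
    print(fastq_for(SimpleNamespace(sample="a")))
```

In a Snakefile, the returned function would be given as the input of a rule whose output contains a {sample} wildcard, so each FASTQ becomes its own job and a truncated file only costs that job's output.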

Disclaimer: I have only just learned this and have not tried it yet, but it would be useful for me too!

Answer 1: (score: 0)

I had a similar problem. My approach was to create a dummy file as the output and move my/your output directory to params.

rule trimGalore:
    """
    This module takes in the temporary file created by parse sampleFile rule and determines if libraries are single end or paired end.
    The appropriate step for trimGalore is then ran and a summary of the runs is produced in summary_tg.txt
    """
    input:
        rules.parse_sampleFile.output[1]+"singleFile.txt", rules.parse_sampleFile.output[1]+"pairFile.txt"
    output:
        dummy = "dummy.txt",  # marker file; quoted so it is a string, not a Python name
    log:
        projectDir+"/logs/"+stamp+"_trimGalore.log"
    params:
        p = trimGaloreParams,
        dir = projectDir+"/trimmed_reads/"  # plain path; directory() is only meant for outputs
    shell:
        """
        (awk -F "," '{{print $2}}' {input[0]} |while read i; do echo $(date +"%Y-%m-%d %H:%M:%S") >>{log}; echo "$USER">>{log}; trim_galore {params.p} --gzip -o {params.dir} $i; done
        awk -F "," '{{print $2" "$3}}' {input[1]} |while read i; do echo $(date +"%Y-%m-%d %H:%M:%S") >>{log}; echo "$USER">>{log}; trim_galore --paired {params.p} --gzip -o {params.dir} $i; done) 2>>{log}  && touch {output.dummy}
        """

I could not test this, so you may need to tweak it a little... it may work.
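
The same marker-file idea can also be sketched in Python, for instance for a run: block instead of shell:. It runs one command per FASTQ, records the ones that fail instead of stopping, and only then creates the marker. The function name run_each and the failed-list file are hypothetical, and the commands passed in (for example the trim_galore calls) are left to the caller.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_each(commands, marker, failed_list):
    """Run every command, keep going on errors, and record which ones failed."""
    failed = []
    for cmd in commands:
        result = subprocess.run(cmd)  # no check=True, so a failure does not raise
        if result.returncode != 0:
            failed.append(" ".join(cmd))
    # One failed command per line; the file is empty when everything succeeded.
    Path(failed_list).write_text("".join(line + "\n" for line in failed))
    # The marker is created last, after all commands have been attempted.
    Path(marker).touch()
    return failed


if __name__ == "__main__":
    # Stand-in commands: one that succeeds and one that exits with status 1.
    ok = [sys.executable, "-c", "pass"]
    bad = [sys.executable, "-c", "raise SystemExit(1)"]
    with tempfile.TemporaryDirectory() as tmp:
        failed = run_each([ok, bad, ok], Path(tmp) / "done.txt", Path(tmp) / "failed.txt")
        print(len(failed), "command(s) failed")
```

Since the trimmed-reads folder is not a declared output in this setup, Snakemake has nothing to delete, and the failed list shows which files need another look.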