How to resolve "MissingOutputException" when submitting to a cluster with qsub?

Asked: 2019-06-11 08:25:45

Tags: snakemake

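I wrote a GATK Mutect2 create-PoN (panel of normals) pipeline with snakemake. When I submit it to the cluster with qsub I get a MissingOutputException, but when I run the same snakemake pipeline on the local machine everything works. This problem is driving me crazy; thanks for any help.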

This is a Rocks cluster environment and my login node runs CentOS 7. I did a dry run of the pipeline with snakemake -s s02.py -np, and the commands look fine.

[Tue Jun 11 10:48:30 2019] rule mutect2_pon1:
    input: call_region/chr15.region.bed, gatk_bqsr/S53.bqsr.bam
    output: gatk_mutect2/S53/chr15.vcf.gz
    log: log/gatk_mutect2/S53/chr15.vcf.gz.log
    jobid: 561
    wildcards: chrid=chr15, sample=S53

/home/my/anaconda2/bin/gatk --java-options "-Xmx12g" Mutect2 -R /database/Human_hg19/genome.fa -I gatk_bqsr/S53.bqsr.bam -tumor S53 --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter -L call_region/chr15.region.bed -O gatk_mutect2/S53/chr15.vcf.gz 1>log/gatk_mutect2/S53/chr15.vcf.gz.log 2>&1

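I tested the wildcards and the command involved in the error both through qsub and on the login node, and both work. The command used for cluster submission is snakemake -s s02.py --rerun-incomplete --cluster "qsub -cwd -q all.q -l vf=10g,p=10" --jobs 5 --latency-wait 30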

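I have now repeated the qsub submission four times:

  1. First time: the error occurred in mutect2_pon1 (S53, chr15); the output and log files were both fine.
  2. Second time: the error occurred in extract_tarbed (chr7); the output and log files were both fine.
  3. Third time: the error occurred in mutect2_pon1 (S55, chr1); the output and log files were both fine.
  4. Fourth time: the error occurred in mutect2_pon1 (S07, chr20); the output and log files were both fine.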

import os
import re

# Software and library paths, read from a "KEY = value" file
def read_config(cfgfl):
    mydict = {}
    with open(cfgfl,"r") as handle:
        for line in handle:
            line = line.strip()  # strip first so blank lines are skipped
            if not line or line.startswith("#"):
                continue
            field,value = re.split(r"\s*=\s*",line,maxsplit=1)
            mydict[field] = value
    return mydict

mydict = read_config("config.txt")
GATK = mydict["GATK"]
JAVA = mydict["JAVA"]
REFERENCE = mydict["REFERENCE"]
BED = mydict["BED"]

#samples files list
configfile: "paired.yaml"

with open(BED) as bedfh:
    chrlist1 = [re.split(r"\s+",i)[0] for i in bedfh]
chrlist = sorted(set(chrlist1), key=chrlist1.index)  # unique names, first-seen order

# #snakemake main program

#Wildcard constraints
samples = [config["samples"][i][2] for i in config["samples"]]

wildcard_constraints:
    chrid="|".join(chrlist),
    sample="|".join(samples)

rule all:
    input:
        expand("call_region/{chrid}.region.bed",chrid=chrlist),
        "gatk_mutect2/normalpon.vcf.gz"

rule extract_chrbed:
    input:
        BED
    output:
        "call_region/{chrid}.region.bed"
    shell:
        "grep -w {wildcards.chrid} {input} > {output}"

rule mutect2_pon1:
    input:
        normal_bam="gatk_bqsr/{sample}.bqsr.bam",
        tarbed="call_region/{chrid}.region.bed"
    output:
        "gatk_mutect2/{sample}/{chrid}.vcf.gz"
    log:
        "log/gatk_mutect2/{sample}/{chrid}.vcf.gz.log"
    params:
        "--disable-read-filter MateOnSameContigOrNoMappedMateReadFilter",
        "--java-options \"-Xmx12g\""
    shell:
        "{GATK} {params[1]} Mutect2 -R {REFERENCE} "
        "-I {input.normal_bam} -tumor {wildcards.sample} "
        "{params[0]} -L {input.tarbed} -O {output} "
        "1>{log} 2>&1"

rule mutect2_pon2:
    input:
        expand("gatk_mutect2/{{sample}}/{chrid}.vcf.gz",chrid=chrlist)
    output:
        "gatk_mutect2/{sample}.vcf.gz"
    log:
        "log/gatk_mutect2/{sample}.vcf.gz.log"
    run:
        inputfmt = list(map("-I {}".format,input))
        shell("{GATK} MergeVcfs {inputfmt} -O {output} >{log} 2>&1")

rule mutect2_pon3:
    input:
        expand("gatk_mutect2/{sample}.vcf.gz",sample=samples)
    output:
        "gatk_mutect2/normalpon.vcf.gz"
    log:
        "log/gatk_mutect2/ponlib.vcf.log"
    run:
        inputfmt = list(map("-vcfs {}".format,input))
        shell("{GATK} CreateSomaticPanelOfNormals {inputfmt} -O {output} >{log} 2>&1")

-------------- config.txt and paired.yaml

> cat config.txt

GATK      = /my/dir/GATK4  
JAVA      = /my/dir/JAVA8  
REFERENCE = /my/dir/hg19.fa  
BED       = /my/dir/target.bed
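
For reference, a minimal sketch of how lines in this format can be turned into a dictionary, mirroring what the read_config helper in the Snakefile does. The name parse_config_lines and the inline sample text are made up for illustration, and the paths are just the placeholder ones shown above.

```python
import re

def parse_config_lines(lines):
    """Parse 'KEY = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in lines:
        line = line.strip()  # also drops trailing spaces after the value
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, ignoring whitespace around it
        field, value = re.split(r"\s*=\s*", line, maxsplit=1)
        settings[field] = value
    return settings

# Hypothetical inline copy of config.txt, used only for this demonstration
sample_text = """\
# tool and reference paths
GATK      = /my/dir/GATK4
JAVA      = /my/dir/JAVA8

REFERENCE = /my/dir/hg19.fa
BED       = /my/dir/target.bed
"""

settings = parse_config_lines(sample_text.splitlines())
print(settings["GATK"])  # /my/dir/GATK4
```

Stripping each line before the blank-line check matters here: a line that contains only a newline is not empty until it has been stripped.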
  

> paired.yaml

     

Tumor-normal paired samples: paired_id = [tumorID, tumor_bam, normalID, normal_bam]

samples:
    S01: ["S01","gatk_bqsr/S01.bqsr.bam","S02","gatk_bqsr/S02.bqsr.bam"]
    S03: ["S03","gatk_bqsr/S03.bqsr.bam","S04","gatk_bqsr/S04.bqsr.bam"]
    S59: ["S59","gatk_bqsr/S59.bqsr.bam","S60","gatk_bqsr/S60.bqsr.bam"]
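
To make the sample handling concrete, here is a small sketch of what the `samples` list in the Snakefile evaluates to for the three entries above. A plain dictionary stands in for what `configfile: "paired.yaml"` loads, which is an assumption made only to keep the example self-contained.

```python
# Stand-in for the `config` object that Snakemake fills from paired.yaml
config = {
    "samples": {
        "S01": ["S01", "gatk_bqsr/S01.bqsr.bam", "S02", "gatk_bqsr/S02.bqsr.bam"],
        "S03": ["S03", "gatk_bqsr/S03.bqsr.bam", "S04", "gatk_bqsr/S04.bqsr.bam"],
        "S59": ["S59", "gatk_bqsr/S59.bqsr.bam", "S60", "gatk_bqsr/S60.bqsr.bam"],
    }
}

# Index 2 of each entry is the normalID, so only normal samples are collected
samples = [config["samples"][i][2] for i in config["samples"]]
print(samples)  # ['S02', 'S04', 'S60']

# The same list is joined into the regex used for the `sample` wildcard constraint
sample_pattern = "|".join(samples)
print(sample_pattern)  # S02|S04|S60
```

So the panel of normals is built only from the normal IDs, and the `sample` wildcard can match nothing else.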


[Mon Jun 10 21:40:02 2019]

    rule mutect2_pon1:
        input: call_region/chr15.region.bed, gatk_bqsr/S53.bqsr.bam
        output: gatk_mutect2/S53/chr15.vcf.gz
        log: log/gatk_mutect2/S53/chr15.vcf.gz.log
        jobid: 460
        wildcards: sample=S53, chrid=chr15

[Mon Jun 10 21:40:02 2019]
Finished job 218.
195 of 726 steps (27%) done
Submitted job 460 with external jobid 'Your job 3628 ("snakejob.mutect2_pon1.460.sh") has been submitted'.
Waiting at most 30 seconds for missing files.
MissingOutputException in line 67 of s02.py:
Missing files after 30 seconds:

This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
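
The waiting behavior that the message describes can be pictured with the sketch below: keep checking for the expected files until they all exist or the wait time runs out. This is only an illustration of the idea, not Snakemake's actual implementation; the function name wait_for_files and the poll_interval parameter are hypothetical.

```python
import os
import time

def wait_for_files(paths, latency_wait=30, poll_interval=1.0):
    """Poll until every path exists or latency_wait seconds pass.

    Returns the list of paths that are still missing (empty if all appeared).
    """
    deadline = time.monotonic() + latency_wait
    missing = [p for p in paths if not os.path.exists(p)]
    while missing and time.monotonic() < deadline:
        time.sleep(poll_interval)
        missing = [p for p in paths if not os.path.exists(p)]
    return missing

# Example: check the output of the failing job as seen from the submitting node
still_missing = wait_for_files(
    ["gatk_mutect2/S53/chr15.vcf.gz"], latency_wait=1, poll_interval=0.2
)
print("missing:", still_missing)
```

Running a check like this from the node where snakemake itself runs shows whether that node sees the file within the wait time, which can differ from what the compute node that wrote the file sees on a shared filesystem.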

0 answers:

No answers yet