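I wrote a GATK Mutect2 create-PoN workflow with Snakemake. When I submit it to the cluster with qsub I get a MissingOutputException, but when I run the same Snakemake workflow on my local machine everything works. This problem is driving me crazy; thanks for any help.
This is a Rocks cluster environment and my login node runs CentOS 7. I did a dry run with snakemake -s s02.py -np, and the commands look fine.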
[Tue Jun 11 10:48:30 2019] rule mutect2_pon1:
input: call_region/chr15.region.bed, gatk_bqsr/S53.bqsr.bam
output: gatk_mutect2/S53/chr15.vcf.gz
log: log/gatk_mutect2/S53/chr15.vcf.gz.log
jobid: 561
wildcards: chrid=chr15, sample=S53
/home/my/anaconda2/bin/gatk --java-options "-Xmx12g" Mutect2 -R /database/Human_hg19/genome.fa -I gatk_bqsr/S53.bqsr.bam -tumor S53 --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter -L call_region/chr15.region.bed -O gatk_mutect2/S53/chr15.vcf.gz 1>log/gatk_mutect2/S53/chr15.vcf.gz.log 2>&1
I tested the wildcards and the command from the failing job, both through qsub and on the login node, and they work in both cases.
The command I use for the cluster run is: snakemake -s s02.py --rerun-incomplete --cluster "qsub -cwd -q all.q -l vf=10g,p=10" --jobs 5 --latency-wait 30
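I have now repeated the cluster run four times:
1st run: the error occurred in mutect2_pon1 (S53, chr15); the output and log files are both fine
2nd run: the error occurred in extract_tarbed (chr7); the output and log files are both fine
3rd run: the error occurred in mutect2_pon1 (S55, chr1); the output and log files are both fine
4th run: the error occurred in mutect2_pon1 (S07, chr20); the output and log files are both fine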
import os
import re

# software and library related information
def read_config(cfgfl):
    # parse "KEY = value" lines into a dict, skipping blank lines and # comments
    mydict = {}
    with open(cfgfl, "r") as handle:
        for line in handle:
            line = line.strip("\n")
            if not line or line.startswith("#"):
                continue
            field, value = re.split(r"\s*=\s*", line, maxsplit=1)
            mydict[field] = value
    return mydict

mydict = read_config("config.txt")
GATK = mydict["GATK"]
JAVA = mydict["JAVA"]
REFERENCE = mydict["REFERENCE"]
BED = mydict["BED"]

# samples files list
configfile: "paired.yaml"

with open(BED) as bed_handle:
    chrlist1 = [re.split(r"\s+", line)[0] for line in bed_handle]
# keep each chromosome name once, in first-seen order
chrlist = []
for chrom in chrlist1:
    if chrom not in chrlist:
        chrlist.append(str(chrom))

# snakemake main program
# Wildcard constraints
samples = [config["samples"][i][2] for i in config["samples"]]
wildcard_constraints:
    chrid="|".join(chrlist),
    sample="|".join(samples)

rule all:
    input:
        expand("call_region/{chrid}.region.bed",chrid=chrlist),
        "gatk_mutect2/normalpon.vcf.gz"

rule extract_chrbed:
    input:
        BED
    output:
        "call_region/{chrid}.region.bed"
    shell:
        "grep -w {wildcards.chrid} {input} > {output}"

rule mutect2_pon1:
    input:
        normal_bam="gatk_bqsr/{sample}.bqsr.bam",
        tarbed="call_region/{chrid}.region.bed"
    output:
        "gatk_mutect2/{sample}/{chrid}.vcf.gz"
    log:
        "log/gatk_mutect2/{sample}/{chrid}.vcf.gz.log"
    params:
        "--disable-read-filter MateOnSameContigOrNoMappedMateReadFilter",
        "--java-options \"-Xmx12g\""
    shell:
        "{GATK} {params[1]} Mutect2 -R {REFERENCE} "
        "-I {input.normal_bam} -tumor {wildcards.sample} "
        "{params[0]} -L {input.tarbed} -O {output} "
        "1>{log} 2>&1"

rule mutect2_pon2:
    input:
        expand("gatk_mutect2/{{sample}}/{chrid}.vcf.gz",chrid=chrlist)
    output:
        "gatk_mutect2/{sample}.vcf.gz"
    log:
        "log/gatk_mutect2/{sample}.vcf.gz.log"
    run:
        inputfmt = list(map("-I {}".format,input))
        shell("{GATK} MergeVcfs {inputfmt} -O {output} >{log} 2>&1")

rule mutect2_pon3:
    input:
        expand("gatk_mutect2/{sample}.vcf.gz",sample=samples)
    output:
        "gatk_mutect2/normalpon.vcf.gz"
    log:
        "log/gatk_mutect2/ponlib.vcf.log"
    run:
        inputfmt = list(map("-vcfs {}".format,input))
        shell("{GATK} CreateSomaticPanelOfNormals {inputfmt} -O {output} >{log} 2>&1")
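
To make it easier to see which files each mutect2_pon2 job waits for, here is a minimal plain-Python sketch of how the chromosome list from the BED file turns into per-chromosome VCF paths and "-I" arguments. The BED lines and the helper names (unique_chromosomes, per_chromosome_vcfs, merge_arguments) are made up for illustration and are not part of s02.py. The explicit " ".join is my assumption about how the list ends up in the shell command.

```python
import re

# Hypothetical BED lines standing in for the real target file.
bed_lines = [
    "chr1\t100\t200\n",
    "chr1\t300\t400\n",
    "chr7\t50\t150\n",
    "chr15\t10\t90\n",
    "chr7\t500\t650\n",
]


def unique_chromosomes(lines):
    """Return first-column values in first-seen order, without duplicates."""
    first_columns = (re.split(r"\s+", line)[0] for line in lines)
    # dict.fromkeys keeps insertion order (Python 3.7+) and drops repeats.
    return list(dict.fromkeys(first_columns))


def per_chromosome_vcfs(sample, chrlist):
    """Build one gatk_mutect2/<sample>/<chrid>.vcf.gz path per chromosome."""
    return ["gatk_mutect2/{}/{}.vcf.gz".format(sample, chrid) for chrid in chrlist]


def merge_arguments(vcfs):
    """Join the per-file "-I <path>" arguments into one string."""
    return " ".join("-I {}".format(vcf) for vcf in vcfs)


chrlist = unique_chromosomes(bed_lines)
vcfs = per_chromosome_vcfs("S53", chrlist)
print(chrlist)
print(merge_arguments(vcfs))
```

Every path in vcfs is an output of a separate mutect2_pon1 job, so each of them has to become visible on the node running Snakemake before the merge step can start.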
#-------------- config.txt and paired.yaml
> cat config.txt
GATK = /my/dir/GATK4
JAVA = /my/dir/JAVA8
REFERENCE = /my/dir/hg19.fa
BED = /my/dir/target.bed
> paired.yaml
# tumor-normal paired samples, paired_id = [tumorID,tumor_bam,normalID,normal_bam]
samples:
    S01: ["S01","gatk_bqsr/S01.bqsr.bam","S02","gatk_bqsr/S02.bqsr.bam"]
    S03: ["S03","gatk_bqsr/S03.bqsr.bam","S04","gatk_bqsr/S04.bqsr.bam"]
    S59: ["S59","gatk_bqsr/S59.bqsr.bam","S60","gatk_bqsr/S60.bqsr.bam"]
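
For readers who want to check how these two files are consumed, the following is a small self-contained sketch, not the workflow itself. It parses "KEY = value" text the way read_config is meant to, and picks the third element of each paired entry (the normalID), which is what the samples list in the Snakefile selects. The inline config_text and the paired dict are hypothetical stand-ins for config.txt and for paired.yaml after it has been loaded. The names parse_config and normal_ids are mine.

```python
import re

# Hypothetical stand-in for the contents of config.txt.
config_text = """\
# software and library paths
GATK = /my/dir/GATK4
REFERENCE = /my/dir/hg19.fa

BED = /my/dir/target.bed
"""

# Hypothetical stand-in for paired.yaml once loaded into a dict.
paired = {
    "samples": {
        "S01": ["S01", "gatk_bqsr/S01.bqsr.bam", "S02", "gatk_bqsr/S02.bqsr.bam"],
        "S03": ["S03", "gatk_bqsr/S03.bqsr.bam", "S04", "gatk_bqsr/S04.bqsr.bam"],
    }
}


def parse_config(text):
    """Parse KEY = value lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        field, value = re.split(r"\s*=\s*", line, maxsplit=1)
        settings[field] = value
    return settings


def normal_ids(config):
    """Return the normalID (index 2) of every paired entry, in file order."""
    return [entry[2] for entry in config["samples"].values()]


print(parse_config(config_text))
print(normal_ids(paired))
```

With the excerpt shown here, only the normal sample IDs (S02, S04, ...) end up in the sample wildcard constraint, so the panel of normals is built from the normal BAMs.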
[Mon Jun 10 21:40:02 2019]
rule mutect2_pon1:
input: call_region/chr15.region.bed, gatk_bqsr/S53.bqsr.bam
output: gatk_mutect2/S53/chr15.vcf.gz
log: log/gatk_mutect2/S53/chr15.vcf.gz.log
jobid: 460
wildcards: sample=S53, chrid=chr15
[Mon Jun 10 21:40:02 2019]
Finished job 218.
195 of 726 steps (27%) done
Submitted job 460 with external jobid 'Your job 3628 ("snakejob.mutect2_pon1.460.sh") has been submitted'.
Waiting at most 30 seconds for missing files.
MissingOutputException in line 67 of s02.py:
Missing files after 30 seconds:
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
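
To illustrate what "waiting at most 30 seconds for missing files" amounts to, here is a rough sketch of polling for output files until a timeout. This is not Snakemake's actual implementation; wait_for_files and its parameters are hypothetical names. It may still be useful as a stand-alone check on the login node, to see how long a file written by a compute node takes to show up on the shared filesystem.

```python
import os
import time


def wait_for_files(paths, latency_wait=30, poll_interval=1.0):
    """Poll until every path exists or latency_wait seconds have passed.

    Returns the list of paths that are still missing (empty on success).
    """
    deadline = time.monotonic() + latency_wait
    missing = [p for p in paths if not os.path.exists(p)]
    while missing and time.monotonic() < deadline:
        time.sleep(poll_interval)
        # Re-check only the files that had not appeared yet.
        missing = [p for p in missing if not os.path.exists(p)]
    return missing


if __name__ == "__main__":
    # Path taken from the failing job above; adjust to the file being checked.
    still_missing = wait_for_files(["gatk_mutect2/S53/chr15.vcf.gz"], latency_wait=5)
    print("missing:", still_missing)
```

If a file only appears after more than 30 seconds, that would be consistent with the filesystem latency the message mentions, and a larger --latency-wait value would be the first thing to try.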