snakemake:管道失败,缺少MissingOutputException

时间:2019-11-06 13:24:58

标签: bioinformatics snakemake

为清楚起见,我必须修改帖子。情况是,一开始我在本地计算机上很好地运行了管道,但是在提交给集群时却失败了。发布问题后,我发现snakemake的版本是3.13.3,所以我更新到v5.7.3,然后我发现它在本地计算机和群集上均失败。因此,我现在正在努力弄清我的Snakefile或其他任何问题。 错误消息:

Waiting at most 5 seconds for missing files.
MissingOutputException in line 24 of /work/path/rna_seq_pipeline/Snakefile:
Missing files after 5 seconds:
bam/A2_Aligned.toTranscriptome.out.bam
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /work/path/rna_seq_pipeline/.snakemake/log/2019-11-07T153434.327966.snakemake.log

所以我的snakamake文件可能有问题。这是我的Snakefile

# config file 
configfile: "config.yaml"

shell.prefix("source ~/.bash_profile")

# determine which genome reference you would like to use
# here we are using GRCm38
# depending on the freeze, the appropriate references and data files will be chosen from the config
freeze = config['freeze']

# read list of samples, one per line
with open(config['samples']) as f:
    SAMPLES = f.read().splitlines()

rule all:
    input:
        starindex = config['reference']['stargenomedir'][freeze] + "/" + "SAindex",
        rsemindex = config['reference']['rsemgenomedir'][freeze] + ".n2g.idx.fa",
        fastqs = expand("fastq/{file}_{rep}_paired.fq.gz", file = SAMPLES, rep = ['1','2']),
        bams = expand("bam/{file}_Aligned.toTranscriptome.out.bam", file = SAMPLES),
        quant = expand("quant/{file}.genes.results", file = SAMPLES)

# align using STAR
rule star_align:
    input:
        f1 = "fastq/" + "{file}_1_paired.fq.gz",
        f2 = "fastq/" + "{file}_2_paired.fq.gz"
    output:
        out = "bam/" + "{file}_Aligned.toTranscriptome.out.bam"
    params:
        star = config['tools']['star'],
        genomedir = config['reference']['stargenomedir'][freeze],
        prefix = "bam/" + "{file}_"
    threads: 12
    shell:  
        """
        {params.star} \
        --runThreadN {threads} \
        --genomeDir {params.genomedir} \
        --readFilesIn {input.f1} {input.f2} \
        --readFilesCommand zcat \
        --outFileNamePrefix {params.prefix} \
        --outSAMtype BAM SortedByCoordinate \
        --outSAMunmapped Within \
        --quantMode TranscriptomeSAM \
        --outSAMattributes NH HI AS NM MD \
        --outFilterType BySJout \
        --outFilterMultimapNmax 20 \
        --outFilterMismatchNmax 999 \
        --outFilterMismatchNoverReadLmax 0.04 \
        --alignIntronMin 20 \
        --alignIntronMax 1000000 \
        --alignMatesGapMax 1000000 \
        --alignSJoverhangMin 8 \
        --alignSJDBoverhangMin 1 \
        --sjdbScore 1 \
        --limitBAMsortRAM 50000000000
        """

# quantify expression using RSEM
rule rsem_quant:
    input:
        bam = "bam/" + "{file}_Aligned.toTranscriptome.out.bam"
    output:
        quant = "quant/" + "{file}.genes.results"
    params:
        calcexp = config['tools']['rsem']['calcexp'],
        genomedir = config['reference']['rsemgenomedir'][freeze],
        prefix =  "quant/" + "{file}"
    threads: 12
    shell:
        """
        {params.calcexp} \
        --paired-end \
        --no-bam-output \
        --quiet \
        --no-qualities \
        -p {threads} \
        --forward-prob 0.5 \
        --seed-length 21 \
        --fragment-length-mean -1.0 \
        --bam {input.bam} {params.genomedir} {params.prefix}

还有我的config.yaml

freeze: grcm38

# samples file
samples:
    samples.txt

# software, binaries or tools
tools:
    fastqdump: fastq-dump
star: STAR
rsem: 
    calcexp: rsem-calculate-expression
    prepref: rsem-prepare-reference

# reference files, genome indices and data
reference:
    stargenomedir: 
        grch38: /work/path/reference/STAR/GRCh38
        grcm38: /work/path/reference/STAR/GRCm38
    rsemgenomedir: 
        grch38: /work/path/reference/RSEM/GRCh38/GRCh38
        grcm38: /work/path/reference/RSEM/GRCm38/GRCm38
    fasta: 
        grch38: /work/path/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa
        grcm38: /work/path/reference/GRCm38/Mus_musculus.GRCm38.dna.primary_assembly.fa
    gtf: 
        grch38: /work/path/reference/GRCh38/Homo_sapiens.GRCh38.98.gtf
        grcm38: /work/path/reference/GRCm38/Mus_musculus.GRCm38.98.gtf

最后,samples.txt

A1
A2

有什么建议吗?

ps:改编自管道https://github.com/komalsrathi/rnaseq-star-rsem-pipeline/blob/master/Snakefile

2 个答案:

答案 0 :(得分:0)

触摸碰到cannot touch: : No such file or directory时,通常意味着目录结构不存在。如果您尝试怎么办:

touch /work/path/rna_seq_pipeline/.snakemake/tmp.o_2ffebs/1.jobfailed 

缺少文件行表明您正在尝试将输出存储在.snakemake文件夹中。真的吗?如果将其移至例如当前工作目录,会发生什么?

答案 1 :(得分:0)

有关Biostars的同一帖子,并在此处回答。 https://www.biostars.org/p/406693/#406907