Snakemake and wget

Date: 2018-02-16 00:35:08

Tags: python wget snakemake

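I am trying to use snakemake to download files from a website and then concatenate them.

However, I always get the error: Waiting at most 5 seconds for missing files. MissingOutputException in line 24 of /path/to/Snakefile:

Why doesn't snakemake just wait for the files to download before trying to continue? It would be inconvenient to put all the reads in different directories, and I don't want to bother with a config file because this is a one-off Snakefile.

Thanks!

Here is my script: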

import os
rule all:
    input:
        "ONT/yeastONT_combined.fastq.gz",
        "trimmed/ERR1938684_1.trim.final.fastq.gz",
        "trimmed/ERR1938684_2.trim.final.fastq.gz",
        "trimmed/ERR1938684_1.trim.unpaired.fastq.gz",
        "trimmed/ERR1938684_2.trim.unpaired.fastq.gz"

rule getONTfwd:
    input:

    output:
        "ONT/ERR1883385_1.fastq.gz",
        "ONT/ERR1883386_1.fastq.gz",
        "ONT/ERR1883387_1.fastq.gz",
        "ONT/ERR1883393_1.fastq.gz",
        "ONT/ERR1883395_1.fastq.gz",
        "ONT/ERR1883396_1.fastq.gz"

    shell:
        """cd ONT \
        wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/005/ERR1883385/ERR1883385_1.fastq.gz' \
        wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/006/ERR1883386/ERR1883386_1.fastq.gz' \
        wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/007/ERR1883387/ERR1883387_1.fastq.gz' \
        wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/003/ERR1883393/ERR1883393_1.fastq.gz' \
        wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/005/ERR1883395/ERR1883395_1.fastq.gz' \
        wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/006/ERR1883396/ERR1883396_1.fastq.gz' \
        sleep 300 \
        cd .."""

rule combine_ONT:
    input:
        f1 = "ONT/ERR1883385_1.fastq.gz",
        f2 = "ONT/ERR1883386_1.fastq.gz",
        f3 = "ONT/ERR1883387_1.fastq.gz",
        f4 = "ONT/ERR1883393_1.fastq.gz",
        f5 = "ONT/ERR1883395_1.fastq.gz",
        f6 = "ONT/ERR1883396_1.fastq.gz"
    output:
        "ONT/yeastONT_combined.fastq.gz"
    shell:
        """cat {input.f1} {input.f2} {input.f3} {input.f4} {input.f5} {input.f6} > {output}"""

1 answer:

Answer 0: (score: 1)

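There is a syntax error in the shell command of rule getONTfwd: you escape every newline with \, which causes the complete shell command to be treated as one single command. Either remove the escape character \ or add a semicolon before it to separate the commands (i.e. ; \).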
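To see why the escaped newlines matter, here is a minimal sketch in plain Python (the shell: string in a Snakefile is an ordinary Python string). It is an illustration, not part of the Snakefile: it shortens the command to one wget call, and it uses shlex only to approximate how a POSIX shell would split the words.

```python
import shlex

# In a normal (non-raw) Python string, a backslash directly before a
# newline is a line continuation: the newline is dropped.
escaped = """cd ONT \
wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/005/ERR1883385/ERR1883385_1.fastq.gz' \
cd .."""

# Without the backslashes, each command stays on its own line.
separated = """cd ONT
wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/005/ERR1883385/ERR1883385_1.fastq.gz'
cd .."""

escaped_lines = escaped.splitlines()
separated_lines = separated.splitlines()

# Approximate the shell's word splitting of the escaped version.
tokens = shlex.split(escaped)

print(len(escaped_lines), "line(s) with backslashes")
print(len(separated_lines), "line(s) without backslashes")
print("command:", tokens[0], "arguments:", tokens[1:])
```

With the backslashes, the shell receives a single cd invocation whose remaining words, including wget and the URL, are just arguments, so nothing is downloaded and the declared outputs never appear.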

Also, sleep 300 is not needed if you are using it only to provide buffer time for downloading all the files: wget exits only after the file from the URL has been downloaded. And, as mentioned in Johannes's comment, the rules for the trimmed/*.fastq.gz files are missing from the example script.

Following is an edited version of your example that should work as expected:

import os
rule all:
    input:
        "ONT/yeastONT_combined.fastq.gz",
        # "trimmed/ERR1938684_1.trim.final.fastq.gz",
        # "trimmed/ERR1938684_2.trim.final.fastq.gz",
        # "trimmed/ERR1938684_1.trim.unpaired.fastq.gz",
        # "trimmed/ERR1938684_2.trim.unpaired.fastq.gz"

rule getONTfwd:
    output:
        "ONT/ERR1883385_1.fastq.gz",
        "ONT/ERR1883386_1.fastq.gz",
        "ONT/ERR1883387_1.fastq.gz",
        "ONT/ERR1883393_1.fastq.gz",
        "ONT/ERR1883395_1.fastq.gz",
        "ONT/ERR1883396_1.fastq.gz"
    shell:
        """cd ONT
        wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/005/ERR1883385/ERR1883385_1.fastq.gz'
        wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/006/ERR1883386/ERR1883386_1.fastq.gz'
        wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/007/ERR1883387/ERR1883387_1.fastq.gz'
        wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/003/ERR1883393/ERR1883393_1.fastq.gz'
        wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/005/ERR1883395/ERR1883395_1.fastq.gz'
        wget 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR188/006/ERR1883396/ERR1883396_1.fastq.gz'
        cd .."""

rule combine_ONT:
    input:
        f1 = "ONT/ERR1883385_1.fastq.gz",
        f2 = "ONT/ERR1883386_1.fastq.gz",
        f3 = "ONT/ERR1883387_1.fastq.gz",
        f4 = "ONT/ERR1883393_1.fastq.gz",
        f5 = "ONT/ERR1883395_1.fastq.gz",
        f6 = "ONT/ERR1883396_1.fastq.gz"
    output:
        "ONT/yeastONT_combined.fastq.gz"
    shell:
        """cat {input.f1} {input.f2} {input.f3} {input.f4} {input.f5} {input.f6} > {output}"""
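As an optional aside, the six download URLs in the question follow a regular pattern, so the lists could be generated at the top of the Snakefile instead of typed out, with no config file. The sketch below is only an assumption drawn from those six URLs: the helper name ena_fastq_url is hypothetical, and the path rule (first six characters, then "00" plus the last digit) is inferred from these accessions and may not hold for others.

```python
# Accessions copied from the question's Snakefile.
ACCESSIONS = [
    "ERR1883385",
    "ERR1883386",
    "ERR1883387",
    "ERR1883393",
    "ERR1883395",
    "ERR1883396",
]

def ena_fastq_url(accession: str, mate: int = 1) -> str:
    """Build a download URL using the pattern seen in the question.

    Assumption: the directory layout is <first 6 chars>/00<last digit>/,
    as in the six URLs above; this is not checked against the server.
    """
    prefix = accession[:6]
    subdir = "00" + accession[-1]
    return (
        "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/"
        f"{prefix}/{subdir}/{accession}/{accession}_{mate}.fastq.gz"
    )

# URLs to fetch and the matching output paths used by the rules.
urls = [ena_fastq_url(acc) for acc in ACCESSIONS]
outputs = [f"ONT/{acc}_1.fastq.gz" for acc in ACCESSIONS]

for url, out in zip(urls, outputs):
    print(out, "<-", url)
```

The outputs list matches the file names already declared in getONTfwd and combine_ONT, so it could be reused there if you prefer not to repeat the names.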