Snakemake: running a rule multiple times using wildcard outputs from previous rules

Time: 2018-06-12 04:04:39

Tags: snakemake

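I have several studies, and for each of the n studies I have to create two files (a .notsad file and a .txt file). Once these exist, I have to run a command once per chromosome, and every chromosome within a given study uses the same two input files (.notsad, .txt). So: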

mycommand.py study1.notsad study1_filter.txt chr1.bad.gz --out chr1_filter.bad.gz
mycommand.py study1.notsad study1_filter.txt chr2.bad.gz --out chr2_filter.bad.gz
...
mycommand.py study2.notsad study2_filter.txt chr1.bad.gz --out chr1_filter.bad.gz
...

But I can't get it to run. I get this error:

WildcardError in line 33 of /scripts/Snakefile:
Wildcards in input files cannot be determined from output files:
'ds_lower'

My rules so far:

import os
import glob

ROOT = "/rootdir/"
ORIGINAL_DATA_FOLDER="original/"
PROCESS_DATA_FOLDER="process/"

ORIGINAL_DATA_SOURCE=ROOT+ORIGINAL_DATA_FOLDER
PROCESS_DATA_SOURCE=ROOT+PROCESS_DATA_FOLDER

DATASETS = [name for name in os.listdir(ORIGINAL_DATA_SOURCE) if os.path.isdir(os.path.join(ORIGINAL_DATA_SOURCE, name))]
LOWERCASE_DATASETS = [dataset.lower() for dataset in DATASETS]
CHROMOSOME = list(range(1,23))

rule all:
    input:
        expand(PROCESS_DATA_SOURCE+"{ds}/chr{chr}_filtered.gen.gz", ds=DATASETS, chr=CHROMOSOME)

rule run_command:
    input:
        ORIGINAL_DATA_SOURCE+"{ds}/chr{chr}.bad.gz", # Matches 22 chroms
        PROCESS_DATA_SOURCE+"{ds}/{ds_lower}_filter.txt", # But this should be common to all chr runs for this study.
        PROCESS_DATA_SOURCE+"{ds}/{ds_lower}.notsad" # This one as well.
    output:
        PROCESS_DATA_SOURCE+"{ds}/chr{chr}_filtered.gen.gz"
    shell:
        # Run command that uses each of the previous files and runs per chromosome
        "mycommand.py {input[2]} {input[1]} {input[0]} --out {output}"

rule write_txt_file:
    input:
        ORIGINAL_DATA_SOURCE+"{ds}/{ds_lower}_info.txt"
    output:
        PROCESS_DATA_SOURCE+"{ds}/{ds_lower}_filter.txt"
    shell:
        "touch {output}"

rule write_notsad_file:
    input:
        ORIGINAL_DATA_SOURCE+"{ds}/_{ds_lower}.sad"
    output:
        PROCESS_DATA_SOURCE+"{ds}/{ds_lower}.notsad"
    shell:
        "touch {output}"

Update: changing the inputs of rule run_command to lambda functions does work.

rule run_command:
    input:
        ORIGINAL_DATA_SOURCE+"{ds}/chr{chr}.gen.gz",
        lambda wildcards: PROCESS_DATA_SOURCE + f"{wildcards.ds}/{wildcards.ds.lower()}_filter.txt",
        lambda wildcards: PROCESS_DATA_SOURCE + f"{wildcards.ds}/{wildcards.ds.lower()}.sample"
    output:
        PROCESS_DATA_SOURCE+"{ds}/chr{chr}_filtered.gen.gz"
    shell:
        # Run command that uses each of the previous files and runs per chromosome
        "mycommand.py {input[2]} {input[1]} {input[0]} --out {output}"

1 answer:

Answer 0 (score: 2)

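Every wildcard used in input must also appear in output. Snakemake determines wildcard values by matching the requested file against the rule's output pattern, so a wildcard that occurs only in input has no value to fill in. In rule run_command, the wildcard {ds_lower} appears only in input and not in output, which is what triggers the WildcardError. An input function, such as the lambdas in the update, avoids the problem by computing the lowercase name from {ds} instead of treating it as a separate wildcard.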
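
To see why this happens, here is a rough plain-Python sketch of the idea. It is not Snakemake's actual implementation: the helper infer_wildcards, the example paths, and the dict-based wildcards are all made up for illustration (in a real Snakefile the wildcards object is accessed by attribute, e.g. wildcards.ds). It shows that wildcard values can only come from the output pattern, and that anything else has to be derived from them.

```python
import re

def infer_wildcards(output_pattern, target):
    """Hypothetical helper: match a concrete target against an output pattern."""
    # re.split with one capture group alternates literal text and wildcard names.
    pieces = re.split(r"\{(\w+)\}", output_pattern)
    regex = ""
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            regex += re.escape(piece)        # literal part of the path
        else:
            regex += f"(?P<{piece}>.+)"      # wildcard becomes a named group
    match = re.fullmatch(regex, target)
    return match.groupdict() if match else None

# Wildcards are known only from the output pattern of the rule.
wildcards = infer_wildcards(
    "process/{ds}/chr{chr}_filtered.gen.gz",
    "process/Study1/chr7_filtered.gen.gz",
)

# An input pattern that needs {ds_lower} cannot be filled in from those values.
missing = None
try:
    "process/{ds}/{ds_lower}_filter.txt".format(**wildcards)
except KeyError as err:
    missing = err.args[0]

# An input function derives the lowercase name from {ds} instead.
def filter_txt(wildcards):
    ds = wildcards["ds"]
    return f"process/{ds}/{ds.lower()}_filter.txt"

print(wildcards)
print("cannot determine:", missing)
print(filter_txt(wildcards))
```

The same reasoning applies to write_txt_file and write_notsad_file as posted: they work only because {ds_lower} appears in their output patterns as well as their inputs.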