我有多项研究,我必须为n项研究中的每项研究制作两个文件(.notsad和.txt文件)。创建这些之后,我必须运行一个命令,该命令在每个染色体上运行,并为给定研究中的每个染色体使用相同的两个输入文件(.notsad,.txt)。所以:
mycommand.py study1.notsad study1_filter.txt chr1.bad.gz --out chr1_filter.bad.gz
mycommand.py study1.notsad study1_filter.txt chr2.bad.gz --out chr2_filter.bad.gz
...
mycommand.py study2.notsad study2_filter.txt chr1.bad.gz --out chr1_filter.bad.gz
...
但是我无法让它运行。我得到一个错误:
WildcardError in line 33 of /scripts/Snakefile:
Wildcards in input files cannot be determined from output files:
'ds_lower'
到目前为止我的规则:
import os
import glob
ROOT = "/rootdir/"
ORIGINAL_DATA_FOLDER="original/"
PROCESS_DATA_FOLDER="process/"
ORIGINAL_DATA_SOURCE=ROOT+ORIGINAL_DATA_FOLDER
PROCESS_DATA_SOURCE=ROOT+PROCESS_DATA_FOLDER
DATASETS = [name for name in os.listdir(ORIGINAL_DATA_SOURCE) if os.path.isdir(os.path.join(ORIGINAL_DATA_SOURCE, name))]
LOWERCASE_DATASETS = [dataset.lower() for dataset in DATASETS]
CHROMOSOME = list(range(1,23))
rule all:
input:
expand(PROCESS_DATA_SOURCE+"{ds}/chr{chr}_filtered.gen.gz", ds=DATASETS, chr=CHROMOSOME)
rule run_command:
input:
ORIGINAL_DATA_SOURCE+"{ds}/chr{chr}.bad.gz", # Matches 22 chroms
PROCESS_DATA_SOURCE+"{ds}/{ds_lower}_filter.txt", # But this should be common to all chr runs for this study.
PROCESS_DATA_SOURCE+"{ds}/{ds_lower}.notsad" # This one as well.
output:
PROCESS_DATA_SOURCE+"{ds}/chr{chr}_filtered.gen.gz"
shell:
# Run command that uses each of the previous files and runs per chromosome
"mycommand.py {input.2} {input.1} {input.0} --out {output}"
rule write_txt_file:
input:
ORIGINAL_DATA_SOURCE+"{ds}/{ds_lower}_info.txt"
output:
PROCESS_DATA_SOURCE+"{ds}/{ds_lower}_filter.txt"
shell:
"touch {output}"
rule write_notsad_file:
input:
ORIGINAL_DATA_SOURCE+"{ds}/_{ds_lower}.sad"
output:
PROCESS_DATA_SOURCE+"{ds}/{ds_lower}.notsad"
shell:
"touch {output}"
更新
将规则run_command
的输入更改为lambda函数确实有效。
rule run_command:
input:
ORIGINAL_DATA_SOURCE+"{ds}/chr{chr}.gen.gz",
lambda wildcards: PROCESS_DATA_SOURCE + f"{wildcards.ds}/{wildcards.ds.lower()}_filter.txt",
lambda wildcards: PROCESS_DATA_SOURCE + f"{wildcards.ds}/{wildcards.ds.lower()}.sample"
output:
PROCESS_DATA_SOURCE+"{ds}/chr{chr}_filtered.gen.gz"
run:
# Run command that uses each of the previous files and runs per chromosome
"mycommand.py {input.2} {input.1} {input.0} --out {output}"
答案 0 :(得分:2)
input
中使用的所有通配符都必须出现在output
中。在规则run_command
中,通配符{ds_lower}
仅存在于input
中,但不存在于output
中。