I'm new to Snakemake and am trying to figure out how nested config values work. I created the following config file...
# dummyconfig.json
{
    "fam1": {
        "numchr": 1,
        "chrlen": 2500000,
        "seeds": {
            "genome": 8013785666,
            "simtrio": 1776,
            "simseq": {
                "mother": 2053695854357871005,
                "father": 4517457392071889495,
                "proband": 2574020394472462046
            }
        },
        "ninherited": 100,
        "ndenovo": 5,
        "numreads": 375000
    }
}
...and the following rule (among others) in my Snakefile.
# Snakefile
rule simgenome:
    input:
        "human.order6.mm",
    output:
        "{family}-refr.fa.gz"
    shell:
        "nuclmm simulate --out - --order 6 --numseqs {config[wildcards.family][numchr]} --seqlen {config[wildcards.family][chrlen]} --seed {config[wildcards.family][seeds][genome]} {input} | gzip -c > {output}"
I then want to create fam1-refr.fa.gz by running
snakemake --configfile dummyconfig.json fam1-refr.fa.gz
When I do so, I get the following error message.
Building DAG of jobs...
rule simgenome:
input: human.order6.mm
output: fam1-refr.fa.gz
jobid: 0
wildcards: family=fam1
RuleException in line 1 of /Users/standage/Projects/noble/Snakefile:
NameError: The name 'wildcards.family' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}
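So fam1 is correctly recognized as the value of the family wildcard, but variable accesses such as {config[wildcards.family][numchr]} do not appear to be resolved.
Is it possible to traverse a nested config in this way, or does Snakemake only support access to top-level variables?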
Answer 0 (score: 1)
One way to solve this is to use params and resolve the variables outside the shell block.
rule simgenome:
    input:
        "human.order6.mm",
    output:
        "{family}-refr.fa.gz"
    params:
        seed=lambda w: config[w.family]['seeds']['genome'],
        numseqs=lambda w: config[w.family]['numchr'],
        seqlen=lambda w: config[w.family]['chrlen']
    shell:
        "nuclmm simulate --out - --order 6 --numseqs {params.numseqs} --seqlen {params.seqlen} --seed {params.seed} {input} | gzip -c > {output}"
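
A likely reason the original shell string fails can be sketched in plain Python. This assumes that Snakemake's shell formatting follows the rules of Python's str.format, which is an assumption here and not something the error message confirms. Under those rules, whatever sits inside the square brackets is taken as a literal string key and is never evaluated as an expression. The wildcards object below is a hypothetical stand-in built with SimpleNamespace, and the config dict is a trimmed copy of the one in the question.

```python
from types import SimpleNamespace

# Trimmed copy of the config from the question.
config = {"fam1": {"numchr": 1, "chrlen": 2500000, "seeds": {"genome": 8013785666}}}
# Hypothetical stand-in for the wildcards object of a running job.
wildcards = SimpleNamespace(family="fam1")

# Literal keys inside brackets are looked up as plain strings, so nesting works.
literal = "{config[fam1][seeds][genome]}".format(config=config)

# The bracket contents are not evaluated: the key requested here is the
# literal string "wildcards.family", which does not exist in config.
try:
    "{config[wildcards.family][numchr]}".format(config=config, wildcards=wildcards)
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]

# Resolving the key in ordinary Python first, as the params lambdas do, works.
resolved = config[wildcards.family]["numchr"]

print(literal, missing_key, resolved)
```

In this sketch, nested access with literal keys succeeds, while the wildcard-based key is reported as missing under the name wildcards.family, which matches the name quoted in the NameError above. That is why moving the lookup into params, where it runs as normal Python, avoids the problem.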