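Is there a way to define a Snakemake config string in a .yaml file so that it can contain {wildcard} and {param} values, and so that when the string is used in a shell command, each {<name>} value is replaced with the actual value of "<name>"?
For example, suppose you want a config string to define the format of a string that is passed as an argument to a program:
RG: "ID:{ID} REP:{REP}"
where the line above is in the .yaml file, ID and REP are wildcards, and the shell command passes the expanded string as an argument to the program.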
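Answer 0 (score: 7)
Let me try to give a short answer to this question:
In Snakemake, you can provide functions for params, and these functions take the wildcards as an argument. Inside them you can run any Python code, including a format statement that formats the config value, e.g.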
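configfile: "config.yaml"
rule:
    output:
        "plots/myplot.{mywildcard}.pdf"
    params:
        myparam=lambda wildcards: config["mykey"].format(**wildcards)
    shell:
        ...
As you can see, you can use the Python unpacking operator and the str.format method to replace the values in the config string. This assumes that config["mykey"] yields a string containing the same wildcards as the output above (here, {mywildcard}).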
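The substitution itself can be tried outside Snakemake. The following is only a sketch: a plain dict stands in for the wildcards object that Snakemake passes to the params function, and the config value and the wildcard value "chr1" are made up for illustration.

```python
# Stand-in for the parsed config.yaml; the value of "mykey" is hypothetical.
config = {"mykey": "sample-{mywildcard}-label"}

# Stand-in for the wildcards Snakemake would hand to the params function.
wildcards = {"mywildcard": "chr1"}

# Same expression as in the rule: unpack the wildcards into str.format.
myparam = lambda wildcards: config["mykey"].format(**wildcards)

expanded = myparam(wildcards)
print(expanded)
```

If the config string names a placeholder that is not among the wildcards, str.format raises a KeyError, so the string and the output pattern have to agree on the names.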
Answer 1 (score: 1)
Yes, use a params lambda function:
MACBOOK> cat paramsArgs.yaml
A: "Hello world"
B: "Message: {config[A]} ID: {wildcards.ID} REP: {wildcards.REP}"
MACBOOK> cat paramsArgs
configfile: "paramsArgs.yaml"
rule all:
    input: "ID2307_REP12.txt"
def paramFunc(key, wildcards, config):
    return config[key].format(wildcards=wildcards, config=config)
rule:
    output: "ID{ID}_REP{REP}.txt"
    params: A=config["A"], B=lambda wildcards: paramFunc("B", wildcards, config)
    shell:
        """
        echo 'A is {params.A}' > {output}
        echo 'B is {params.B}' >> {output}
        """
MACBOOK> snakemake -s paramsArgs
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 2
1 all
2
rule 2:
output: ID2307_REP12.txt
jobid: 1
wildcards: REP=12, ID=2307
Finished job 1.
1 of 2 steps (50%) done
localrule all:
input: ID2307_REP12.txt
jobid: 0
Finished job 0.
2 of 2 steps (100%) done
MACBOOK> cat ID2307_REP12.txt
A is Hello world
B is Message: Hello world ID: 2307 REP: 12
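The placeholders {config[A]} and {wildcards.ID} work because str.format supports item lookup and attribute lookup inside a replacement field. Here is a small sketch of the same paramFunc() run without Snakemake; types.SimpleNamespace is only a stand-in I chose to mimic attribute access on the wildcards object.

```python
from types import SimpleNamespace

# Same content as paramsArgs.yaml, written as a Python dict.
config = {
    "A": "Hello world",
    "B": "Message: {config[A]} ID: {wildcards.ID} REP: {wildcards.REP}",
}

def paramFunc(key, wildcards, config):
    # Pass whole objects by name so the string can index or dot into them.
    return config[key].format(wildcards=wildcards, config=config)

# Stand-in for Snakemake's wildcards object (assumption: attribute access only).
wildcards = SimpleNamespace(ID="2307", REP="12")

expanded = paramFunc("B", wildcards, config)
print(expanded)
```

The result matches the "B is ..." line in the file produced by the run above.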
Answer 2 (score: 0)
Here is a param function that lets you expand values from several different Snakemake sources in a config string:
def paramFunc(wildcards, input, output, threads, resources, config,
              global_cfg, this_cfg, S):
    return S.format(wildcards=wildcards, input=input, output=output,
                    threads=threads, resources=resources, config=config,
                    global_cfg=global_cfg, this_cfg=this_cfg)
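Here is an example of calling paramFunc() from a Snakemake params: section. It expands the value of the config parameter config["XYZ"], assigns the result to a param named "text", and then expands that "text" param in the shell command: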
    params:
        text=lambda wildcards, input, output, threads, resources:
            paramFunc(wildcards, input, output, threads, resources, config,
                      global_cfg, my_local_cfg, config["XYZ"])
    shell: "echo 'text is {params.text}'"
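Note that the last argument to paramFunc() is the parameter value you want to expand, in this case config["XYZ"]. The other arguments supply the values that the parameter string may refer to.
You might have defined config["XYZ"] like this, for example, in a .yaml file: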
ABC: "Hello world"
XYZ: "ABC is {config[ABC]}"
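However, the string XYZ is not limited to expanding values defined in the same file (ABC here); you can use other "{}" constructs to access values defined elsewhere: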
Defined in                                  Use this construct in param
----------                                  ---------------------------
"config" dictionary                         "{config[<name>]}"
wildcards used in the output filename       "{wildcards[<name>]}"
input filename(s)                           "{input}" or "{input[NAME]}" or "{input[#]}"
output filename(s)                          "{output}" or "{output[NAME]}" or "{output[#]}"
threads                                     "{threads}"
resources                                   "{resources[<name>]}"
"global_cfg" global config dictionary       "{global_cfg[<name>]}"
"my_local_cfg" module config dictionary     "{this_cfg[<name>]}"
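The values "global_cfg" and "my_local_cfg" are two special dictionaries that you can add to help modularize Snakefiles.
For "global_cfg", the idea is that you may want a dictionary of definitions that are global to the Snakefile. In your main Snakefile, do the following: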
include: "global_cfg.py"
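In the file global_cfg.py, put the global definitions:
global_cfg = {
    "DATA_DIR" : "ProjData",
    "PROJ_DESC" : "Mint Sequencing"
}
You can then reference these values in a param string, for example:
"{global_cfg[DATA_DIR]}"
(the string must be expanded by calling paramFunc() in the params: section)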
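To see how the constructs in the table resolve, here is a sketch that calls paramFunc() directly with plain Python stand-ins. Everything except paramFunc() and the global_cfg dictionary is hypothetical: the file names, thread count, mem_mb resource and the template string are invented, and lists and dicts only approximate the objects Snakemake actually passes.

```python
def paramFunc(wildcards, input, output, threads, resources, config,
              global_cfg, this_cfg, S):
    # Expose every source under a fixed name so the string S can refer to it.
    return S.format(wildcards=wildcards, input=input, output=output,
                    threads=threads, resources=resources, config=config,
                    global_cfg=global_cfg, this_cfg=this_cfg)

# Stand-ins for what Snakemake would provide (all values are made up).
wildcards = {"ID": "2307"}
input_files = ["reads.fq"]
output_files = ["aligned.bam"]
threads = 4
resources = {"mem_mb": 8000}
config = {"ABC": "Hello world"}
global_cfg = {"DATA_DIR": "ProjData", "PROJ_DESC": "Mint Sequencing"}
this_cfg = {"SPD": 125}

# One placeholder per row of the table; digits in brackets index by position.
template = ("{config[ABC]} | {wildcards[ID]} | {input[0]} -> {output[0]} | "
            "threads={threads} mem={resources[mem_mb]} | "
            "dir={global_cfg[DATA_DIR]} spd={this_cfg[SPD]}")

text = paramFunc(wildcards, input_files, output_files, threads, resources,
                 config, global_cfg, this_cfg, template)
print(text)
```

A key that does not exist in the dictionary (for instance a misspelled DATA_DIR) makes str.format raise a KeyError rather than leaving the placeholder in place.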
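For "my_local_cfg", the idea is that you may want to put each Snakefile rule in a separate file, with that rule's parameters also defined in a separate file, so that every rule has one rule file and one parameter file. In the main Snakefile:
# (the paramFunc() definition from above goes here)
include: "myrule.snake"
rule all:
    input: "myrule.txt"
In myrule.snake: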
include: "myrule.py"
In myrule.py, put the configuration settings for the myrule module:
myrule_cfg = {
    "SPD" : 125,
    "DIST" : 98,
    "MSG" : "Param settings: Speed={this_cfg[SPD]} Dist={this_cfg[DIST]}"
}
Then, back in myrule.snake:
include: "myrule.py"
rule myrule:
    params:
        SPD=myrule_cfg["SPD"],
        DIST=myrule_cfg["DIST"],
        # For MSG call paramFunc() to expand {name} constructs.
        MSG=lambda wildcards, input, output, threads, resources:
            paramFunc(wildcards, input, output, threads, resources, config,
                      global_cfg, myrule_cfg, myrule_cfg["MSG"])
    message: "{params.MSG}"
    output: "myrule.txt"
    shell: "echo '-speed {params.SPD} -dist {params.DIST}' >{output}"
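Note that the paramFunc() function maps the name "myrule_cfg" (which changes from one rule to the next) to the fixed name "this_cfg" (which is the same regardless of the rule).
Note also that I included .py files that define the global_cfg and this_cfg dictionaries. These could be defined in .yaml files instead, but the problem is that they would all end up in the single dictionary "config". It would be nice if the configfile directive allowed you to specify the dictionary, for example: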
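The name mapping can be illustrated in plain Python. In this sketch, expand_msg() is a hypothetical helper that keeps only the this_cfg part of paramFunc(), and other_cfg is an invented second rule dictionary added to show that the same placeholder name works for both.

```python
# Settings for the myrule module, as in myrule.py.
myrule_cfg = {
    "SPD": 125,
    "DIST": 98,
    "MSG": "Param settings: Speed={this_cfg[SPD]} Dist={this_cfg[DIST]}",
}

# Hypothetical settings for a second rule, using the same placeholder name.
other_cfg = {
    "SPD": 40,
    "DIST": 7,
    "MSG": "Other rule: Speed={this_cfg[SPD]} Dist={this_cfg[DIST]}",
}

def expand_msg(rule_cfg):
    # Whatever the dictionary is called by the caller, the format string
    # always sees it under the fixed name this_cfg.
    return rule_cfg["MSG"].format(this_cfg=rule_cfg)

myrule_msg = expand_msg(myrule_cfg)
other_msg = expand_msg(other_cfg)
print(myrule_msg)
print(other_msg)
```

Because the strings only ever mention this_cfg, a parameter file can be copied to another rule without editing its placeholders.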
configfile: global_cfg="global_cfg.yaml"
Perhaps this feature will be added to Snakemake someday.
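Answer 3 (score: 0)
I realized that, building on Johannes Köster's answer, additional **config and **globals() arguments to format() can be used to expand variables defined in the Snakefile's Python code, such as the variable "ABC" in the following example, and to let config parameters be expanded without writing "config" in the placeholder. Suppose config.yaml contains: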
X: "Hello"
MSG: "config X: {X} variable ABC: {ABC} wildcard WW: {wildcards.WW}"
and you have this Snakefile:
configfile: "config.yaml"
rule all:
    input: "test.Goodbye.txt"
rule A:
    output: "test.{WW}.txt"
    params: MSG=lambda wildcards: config["MSG"].format(wildcards=wildcards, **config, **globals())
    message: "{params.MSG}"
    shell: "echo '{params.MSG}' >{output}"
ABC = "This is the ABC variable"
The message and the file output will both be the following line:
config X: Hello variable ABC: This is the ABC variable wildcard WW: Goodbye
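One caveat with unpacking several dictionaries into format(): Python raises a TypeError if the same key arrives twice as a keyword argument. The sketch below shows the clash and one possible way around it by merging first. The module_vars dict is a stand-in I made up for globals(), the clashing "X" entry is hypothetical, and wildcards is a plain dict, so the placeholder uses {wildcards[WW]}.

```python
# Stand-in for config.yaml.
config = {
    "X": "Hello",
    "MSG": "config X: {X} variable ABC: {ABC} wildcard WW: {wildcards[WW]}",
}

# Stand-in for globals(); the extra "X" is a hypothetical name clash.
module_vars = {"ABC": "This is the ABC variable", "X": "shadowed"}

# Stand-in for the wildcards object.
wildcards = {"WW": "Goodbye"}

# Unpacking both dicts directly fails because "X" is passed twice.
try:
    config["MSG"].format(wildcards=wildcards, **config, **module_vars)
    clash_detected = False
except TypeError:
    clash_detected = True

# Merging first resolves the clash; later dicts win, so config takes priority.
merged = {**module_vars, **config}
message = config["MSG"].format(wildcards=wildcards, **merged)
print(message)
```

Which dictionary should win on a clash is a design choice; here the config file overrides the Python variables.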