Can a snakemake config parameter value be a string whose {<name>} values are interpolated/expanded?

Asked: 2017-08-08 20:09:44

Tags: snakemake parameter-expansion wildcard-expansion

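Is there a way to define a snakemake config string in a .yaml file so that it can contain {wildcard} and {param} values, and so that when the string is used in a shell command, each {<name>} value is replaced with the actual value of "<name>"?

For example, suppose you want a config string to define the format of a string that is passed as an argument to a program:

RG: "ID:{ID} REP:{REP}"

Here the line above is in the .yaml file, ID and REP are wildcards, and the shell command passes the expanded string as an argument to the program.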

4 answers:

Answer 0 (score: 7)

Let me try to answer this briefly:

In Snakemake, you can provide functions for params, which take the wildcards as an argument. Inside these functions you can run any Python code, including format statements that format your config values, e.g.

configfile: "config.yaml"

rule:
    output:
        "plots/myplot.{mywildcard}.pdf"
    params:
        myparam=lambda wildcards: config["mykey"].format(**wildcards)
    shell:
        ...

As you can see, you can use the Python unpacking operator and the str.format method to replace the values in the config file. This assumes that config["mykey"] yields a string that contains the same wildcard as above.
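
The expansion step itself is ordinary Python and can be tried outside Snakemake. The sketch below is only an illustration: the helper name expand_param, the value stored under "mykey", and the plain dict standing in for the wildcards object are all assumptions, not part of the answer.

```python
# Stand-in for the dictionary Snakemake builds from config.yaml.
# The key "mykey" comes from the answer; its value is a hypothetical example.
config = {"mykey": "sample-{mywildcard}-label"}

# Plain dict standing in for the wildcards of one job (assumption).
wildcards = {"mywildcard": "chr1"}


def expand_param(template, wildcards):
    """Fill the {name} fields of template from the wildcard values."""
    # ** unpacks the mapping into keyword arguments for str.format.
    return template.format(**wildcards)


myparam = expand_param(config["mykey"], wildcards)
print(myparam)
```

Because the braces are resolved by str.format inside the params function, the shell command only ever sees the finished string through {params.myparam}.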

Answer 1 (score: 1)

Yes, using a params lambda function:

MACBOOK> cat paramsArgs.yaml
A: "Hello world"
B: "Message: {config[A]}  ID: {wildcards.ID}   REP: {wildcards.REP}"

MACBOOK> cat paramsArgs
configfile: "paramsArgs.yaml"

rule all:
    input: "ID2307_REP12.txt"

def paramFunc(key, wildcards, config):
    return config[key].format(wildcards=wildcards, config=config)

rule:
    output: "ID{ID}_REP{REP}.txt"
    params: A=config["A"], B=lambda wildcards: paramFunc("B", wildcards, config)
    shell:
        """
        echo 'A is {params.A}' > {output}
        echo 'B is {params.B}' >> {output}
        """

MACBOOK> snakemake -s paramsArgs
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   2
    1   all
    2

rule 2:
    output: ID2307_REP12.txt
    jobid: 1
    wildcards: REP=12, ID=2307

Finished job 1.
1 of 2 steps (50%) done

localrule all:
    input: ID2307_REP12.txt
    jobid: 0

Finished job 0.
2 of 2 steps (100%) done

MACBOOK> cat ID2307_REP12.txt 
A is Hello world
B is Message: Hello world  ID: 2307   REP: 12
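
The config string above mixes two kinds of lookup: {config[A]} is an item lookup and {wildcards.ID} is an attribute lookup. A minimal sketch of the same call in plain Python follows; types.SimpleNamespace is used here as an assumed stand-in for Snakemake's wildcards object, and the function is renamed param_func only to follow Python naming conventions.

```python
from types import SimpleNamespace

# Same two entries as paramsArgs.yaml, written as a Python dict.
config = {
    "A": "Hello world",
    "B": "Message: {config[A]}  ID: {wildcards.ID}   REP: {wildcards.REP}",
}

# Assumed stand-in for the wildcards of the job ID2307_REP12.txt.
wildcards = SimpleNamespace(ID="2307", REP="12")


def param_func(key, wildcards, config):
    """Expand config[key], giving it access to wildcards and config by name."""
    return config[key].format(wildcards=wildcards, config=config)


expanded = param_func("B", wildcards, config)
print(expanded)
```

Passing the objects by keyword (wildcards=..., config=...) rather than unpacking them keeps the source of each value visible in the config string.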

Answer 2 (score: 0)

Here is a param function that lets you expand values from several different snakemake sources in a config string:

def paramFunc(wildcards, input, output, threads, resources, config,
  global_cfg, this_cfg, S):

    return S.format(wildcards=wildcards, input=input, output=output,
        threads=threads, resources=resources, config=config,
        global_cfg=global_cfg, this_cfg=this_cfg)

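Here is an example of how to call paramFunc() from the params: section of a Snakemake rule to expand the value of the config parameter config["XYZ"], assign it to a param named "text", and then expand that "text" param in the shell command: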

   params:
       text=lambda wildcards, input, output, threads, resources:
           paramFunc(wildcards, input, output, threads, resources, config,
                global_cfg, my_local_cfg, config["XYZ"])
   shell: "echo 'text is {params.text}'"

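Note that the last argument to paramFunc() is the parameter value you want to expand, in this case config["XYZ"]. The other arguments all hold values that the parameter value may refer to.

You might have defined config["XYZ"] like this in a .yaml file, for example: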

ABC: "Hello world"
XYZ: "ABC is {config[ABC]}"

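However, the string XYZ is not limited to expanding values defined in the same file (ABC is expanded here); you can use other "{}" constructs to access values defined elsewhere: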

Defined in                               Use this construct in param
----------                               ---------------------------
"config" dictionary                      "{config[<name>]}"
wildcards used in the output filename    "{wildcards[<name>]}"
input filename(s)                        "{input}" or "{input[NAME]}" or "{input[#]}"
output filename(s)                       "{output}" or "{output[NAME]}" or "{output[#]}"
threads                                  "{threads}"
resources                                "{resources[<name>]}"
"global_cfg" global config dictionary    "{global_cfg[<name>]}"
"my_local_cfg" module config dictionary  "{this_cfg[<name>]}"

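The bracket constructs in the table follow Python's format-string rules: keys are written without quotes, and an all-digit index is treated as an integer. The following sketch shows this with plain Python stand-ins; the file names, the resource name mem_mb, and the use of an ordinary list and dict in place of Snakemake's own objects are assumptions made for illustration.

```python
# Hypothetical stand-ins for the objects Snakemake passes to a params function.
inputs = ["reads_1.fq", "reads_2.fq"]   # positional input files
resources = {"mem_mb": 4000}            # named resources
threads = 4

# Keys inside [] are written without quotes; digits are treated as list indices.
template = "first={input[0]} threads={threads} mem={resources[mem_mb]}"

expanded = template.format(input=inputs, threads=threads, resources=resources)
print(expanded)
```

The stand-ins only demonstrate the lookup syntax; how a whole "{input}" or "{output}" renders depends on Snakemake's own objects and is not reproduced here.
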
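The values "global_cfg" and "my_local_cfg" are two special dictionaries that you can add to help modularize a snakefile.

For "global_cfg", the idea is that you might want a dictionary of snakefile-global definitions. In your main snakefile, do this: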

include: "global_cfg.py"

In the file global_cfg.py, put the global definitions:

global_cfg = {
    "DATA_DIR" : "ProjData",
    "PROJ_DESC" : "Mint Sequencing"
}

You can then refer to these values in a param string, for example:

"{global_cfg[DATA_DIR]}"

(The string must be expanded by calling paramFunc() in the params: section.)
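
The name inside the brackets has to match a key of the dictionary exactly, otherwise str.format raises KeyError at expansion time. A small sketch, assuming the global_cfg dictionary defined above; the "/reads" suffix and the misspelled key are made up for the demonstration.

```python
global_cfg = {
    "DATA_DIR": "ProjData",
    "PROJ_DESC": "Mint Sequencing",
}

# Correct key: expands to the configured directory.
good = "{global_cfg[DATA_DIR]}/reads".format(global_cfg=global_cfg)
print(good)

# Misspelled key (no underscore): str.format raises KeyError.
try:
    "{global_cfg[DATADIR]}".format(global_cfg=global_cfg)
    missing = None
except KeyError as err:
    missing = err.args[0]   # the key that could not be found
print("missing key:", missing)
```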

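For "my_local_cfg", the idea is that you might want to put each snakefile rule in a separate file and define that rule's parameters in a separate file too, so that each rule has one rule file and one parameter file. In the main snakefile: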

(include paramFunc() definition above)
include: "myrule.snake"
rule all:
    input: "myrule.txt"

In myrule.snake:

include: "myrule.py"

In myrule.py, put the configuration settings for the myrule module:

myrule_cfg = {
    "SPD" : 125,
    "DIST" : 98,
    "MSG" : "Param settings: Speed={this_cfg[SPD]}  Dist={this_cfg[DIST]}"
}

Then, back in myrule.snake:

include: "myrule.py"
rule myrule:
    params:
        SPD=myrule_cfg["SPD"],
        DIST=myrule_cfg["DIST"],
        # For MSG call paramFunc() to expand {name} constructs.
        MSG=lambda wildcards, input, output, threads, resources:
           paramFunc(wildcards, input, output, threads, resources, config,
               global_cfg, myrule_cfg, myrule_cfg["MSG"])
    message: "{params.MSG}"
    output: "myrule.txt"
    shell: "echo '-speed {params.SPD} -dist {params.DIST}' >{output}"

Note that the paramFunc() call maps the name "myrule_cfg" (which varies from one rule to the next) to the fixed name "this_cfg" (which is the same for every rule).
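
The renaming happens purely through the keyword used in the format() call. A reduced sketch in plain Python, assuming a simplified helper named expand that forwards arbitrary keyword arguments (it is not the paramFunc() defined above, which takes fixed positional arguments):

```python
def expand(template, **sources):
    """Expand {name} fields in template using the given keyword sources."""
    return template.format(**sources)


# Per-rule settings, as in myrule.py.
myrule_cfg = {
    "SPD": 125,
    "DIST": 98,
    "MSG": "Param settings: Speed={this_cfg[SPD]}  Dist={this_cfg[DIST]}",
}

# The rule-specific dict is passed under the fixed name this_cfg,
# so MSG never needs to know the name myrule_cfg.
msg = expand(myrule_cfg["MSG"], this_cfg=myrule_cfg)
print(msg)
```

With this arrangement the same parameter strings can be copied between rule files without editing the dictionary name inside them.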

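Note that I include .py files to define the global_cfg and this_cfg dictionaries. These could be defined in .yaml files instead, but the problem is that they would then all end up in the single dictionary "config". It would be nice if the configfile directive allowed you to specify the dictionary, e.g.: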

configfile: global_cfg="global_cfg.yaml"

Perhaps this feature will be added to snakemake someday.

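Answer 3 (score: 0)

I realized that, in Johannes Köster's answer, additional **config and **globals() arguments can be passed to format() to expand variables defined in the snakefile's Python code, such as the variable "ABC" in the following example, and to allow config parameters to be expanded without writing "config" in the expansion. Suppose config.yaml contains: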

X: "Hello"
MSG: "config X: {X}   variable ABC: {ABC}   wildcard WW: {wildcards.WW}"

and you have this snakefile:

configfile: "config.yaml"

rule all:
    input: "test.Goodbye.txt"

rule A:
    output: "test.{WW}.txt"
    params: MSG=lambda wildcards: config["MSG"].format(wildcards=wildcards, **config, **globals())
    message: "{params.MSG}"
    shell: "echo '{params.MSG}' >{output}"


ABC = "This is the ABC variable"

The message and the file output will both be the following line:

config X: Hello   variable ABC: This is the ABC variable   wildcard WW: Goodbye
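
One caveat with unpacking several dictionaries into one format() call is that Python raises TypeError if the same key appears in more than one of them. The sketch below shows the collision and one possible workaround (merging the dictionaries first, with config taking precedence); the dict module_vars is a hypothetical stand-in for globals(), and the shortened MSG template is an assumption.

```python
config = {
    "X": "Hello",
    "MSG": "config X: {X}   variable ABC: {ABC}",
}

# Hypothetical stand-in for globals(); note that it also defines "X".
module_vars = {"ABC": "This is the ABC variable", "X": "shadowed"}

# Two ** unpackings with a shared key cannot be combined in one call.
try:
    config["MSG"].format(**config, **module_vars)
    collided = False
except TypeError:
    collided = True

# Merging first avoids the error; later entries win, so config overrides.
merged = {**module_vars, **config}
msg = config["MSG"].format(**merged)
print(collided, msg)
```

Whether config or the module-level variables should win on a clash is a design choice; the order inside the merged dictionary decides it.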