Question

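Is it possible to define default settings for memory and resources in a cluster config file and then override them on a rule-specific basis when needed? Is the resources field in a rule directly tied to the cluster config file? Or is it just a fancy form of the params field, for readability purposes?

In the example below, how do I use the default cluster config for rule a, but apply a custom change (memory=40000, rusage=15000) in rule b?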

cluster.json:

{
    "__default__":
    {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}

Snakefile:

rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    shell:
        'touch {output}'

Run command:

 snakemake --cluster-config cluster.json \
           --cluster "bsub -M {cluster.memory} -R {cluster.resources} -o logs.txt" \
           -j 50

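I understand it is possible to define rule-specific resource requirements in the cluster config file, but I would prefer to define them directly in the Snakefile, if possible.

Alternatively, if there is a better way of implementing this, please let me know.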

Answer 1

You can add resources directly to each rule:

rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    resources:
        mem_mb=20000
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    resources:
        mem_mb=40000
    shell:
        'touch {output}'

Then you should remove the resource parameters from the .json file so that the command line does not override the Snakefile:

new.cluster.json:

{
    "__default__":
    {
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}
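
If the goal is to keep both the default and the per-rule overrides inside the Snakefile, one option is a small lookup table in plain Python at the top of the file. This is only a sketch: `DEFAULT_MEM_MB`, `MEM_MB_OVERRIDES` and `mem_mb_for` are hypothetical names, and the values are the ones from the question.

```python
# Hypothetical helper for the top of a Snakefile (Snakefiles accept plain Python).
DEFAULT_MEM_MB = 20000            # default taken from the question's cluster.json
MEM_MB_OVERRIDES = {"b": 40000}   # rule-specific exceptions, keyed by rule name


def mem_mb_for(rule_name):
    """Return the override for rule_name, or the default if none is set."""
    return MEM_MB_OVERRIDES.get(rule_name, DEFAULT_MEM_MB)


print(mem_mb_for("a"))  # rule a has no override, so it gets the default
print(mem_mb_for("b"))  # rule b gets its own value
```

A rule could then declare `resources: mem_mb=mem_mb_for("b")`, and the submit command could read the value as `{resources.mem_mb}` in place of `{cluster.memory}`. Check that this placeholder behaves as expected with your Snakemake version.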

Answer 2

Actually, you can define rule-specific resources in new.cluster.json. So in your case you could do the following:

{
    "__default__":
    {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    },
    "b":
    {
        "memory": 40000,
        "resources": "\"rusage[mem=15000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}

Then, in the Snakefile, you can refer to these resources by loading new.cluster.json and referencing it in the rule:

import json

with open('new.cluster.json') as fh:
    cluster_config = json.load(fh)

rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    resources:
        mem_mb=cluster_config["b"]["memory"]
    shell:
        'touch {output}'
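
One caveat with a direct lookup such as `cluster_config["a"]["memory"]` is that it raises a `KeyError` for any rule without its own entry. A small helper can fall back to `__default__` instead. The sketch below is illustrative only: `resolve_cluster` is a hypothetical name, and the JSON is inlined with `json.loads` purely to keep the example self-contained. With this approach a rule entry only needs to list the keys it changes.

```python
import json

# Inlined stand-in for new.cluster.json; rule "b" lists only what it overrides.
cluster_config = json.loads("""
{
    "__default__": {
        "memory": 20000,
        "output": "logs/cluster/{rule}.{wildcards}.out"
    },
    "b": {
        "memory": 40000
    }
}
""")


def resolve_cluster(config, rule_name):
    """Merge the rule's own entry (if any) over the __default__ entry."""
    merged = dict(config.get("__default__", {}))  # copy so defaults stay untouched
    merged.update(config.get(rule_name, {}))      # rule-specific keys win
    return merged


print(resolve_cluster(cluster_config, "a")["memory"])  # falls back to the default
print(resolve_cluster(cluster_config, "b")["memory"])  # uses the override
```

In a rule this would read `mem_mb=resolve_cluster(cluster_config, "b")["memory"]`. As far as I know, Snakemake applies a similar default-then-override merge when it fills in the `{cluster.*}` placeholders, but treat that as an assumption and check the documentation for your version.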

If you browse this repository, you will see how I use these cluster configs in the wild.

Snakemake - Override LSF (bsub) cluster config in a rule-specific manner
