Question

Suppose I have the following files, to which I want to apply some processing automatically using snakemake:

test_input_C_1.txt
test_input_B_2.txt
test_input_A_2.txt
test_input_A_1.txt

The following snakefile uses expand to determine all the potential final result files:

rule all:
    input: expand("test_output_{text}_{num}.txt", text=["A", "B", "C"], num=[1, 2])

rule make_output:
    input: "test_input_{text}_{num}.txt"
    output: "test_output_{text}_{num}.txt"
    shell:
        """
        md5sum {input} > {output}
        """

Executing the above snakefile results in the following error:

MissingInputException in line 4 of /tmp/Snakefile:
Missing input files for rule make_output:
test_input_B_1.txt

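The reason for this error is that expand uses itertools.product to generate the wildcard combinations, and some of those combinations correspond to input files that do not exist.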
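To make the cause concrete, here is a minimal plain-Python sketch (no snakemake needed) that enumerates the same Cartesian product and compares it with the files listed above. The variable names `requested`, `existing`, and `missing` are only illustrative.

```python
from itertools import product

texts = ["A", "B", "C"]
nums = [1, 2]

# Every (text, num) pairing: the same set of combinations that the
# expand call in `rule all` ends up requesting.
requested = [f"test_input_{text}_{num}.txt" for text, num in product(texts, nums)]

# The input files that actually exist, as listed in the question.
existing = {
    "test_input_C_1.txt",
    "test_input_B_2.txt",
    "test_input_A_2.txt",
    "test_input_A_1.txt",
}

# Combinations produced by the product that have no matching input file.
missing = [name for name in requested if name not in existing]
print(missing)  # ['test_input_B_1.txt', 'test_input_C_2.txt']
```

Six combinations are generated for only four existing files, so two of the requested outputs can never be built.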

How can I filter out the undesired wildcard combinations?

Answer 1

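The expand function accepts an optional second positional (non-keyword) argument: a function used to combine the wildcard values in place of the default one.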

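A filtered version of itertools.product can be created by wrapping it in a higher-order generator that checks that each wildcard combination it yields is not in a pre-established blacklist: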

from itertools import product

def filter_combinator(combinator, blacklist):
    def filtered_combinator(*args, **kwargs):
        for wc_comb in combinator(*args, **kwargs):
            # Use frozenset instead of tuple
            # in order to accommodate
            # unpredictable wildcard order
            if frozenset(wc_comb) not in blacklist:
                yield wc_comb
    return filtered_combinator

# "B_1" and "C_2" are undesired
forbidden = {
    frozenset({("text", "B"), ("num", 1)}),
    frozenset({("text", "C"), ("num", 2)})}

filtered_product = filter_combinator(product, forbidden)

rule all:
    input:
        # Override default combination generator
        expand("test_output_{text}_{num}.txt", filtered_product, text=["A", "B", "C"], num=[1, 2])

rule make_output:
    input: "test_input_{text}_{num}.txt"
    output: "test_output_{text}_{num}.txt"
    shell:
        """
        md5sum {input} > {output}
        """

The missing wildcard combinations can be read from a configuration file.

Here is an example in JSON format:

{
    "missing" :
    [
        {
            "text" : "B",
            "num" : 1
        },
        {
            "text" : "C",
            "num" : 2
        }
    ]
}

The forbidden set would then be read in the snakefile as follows:

forbidden = {frozenset(wc_comb.items()) for wc_comb in config["missing"]}
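As a quick check of that set comprehension, the sketch below parses the same JSON with the standard json module in place of snakemake's own config loading (in a real workflow, config would typically be filled through the configfile directive or the --configfile option). The `config_text` string is just a stand-in for the file contents.

```python
import json

# Stand-in for the contents of the JSON config file shown above.
config_text = """
{
    "missing": [
        {"text": "B", "num": 1},
        {"text": "C", "num": 2}
    ]
}
"""

config = json.loads(config_text)

# Each entry becomes a frozenset of (wildcard name, value) pairs.
forbidden = {frozenset(wc_comb.items()) for wc_comb in config["missing"]}

print(frozenset({("text", "B"), ("num", 1)}) in forbidden)    # True
print(frozenset({("text", "B"), ("num", "1")}) in forbidden)  # False
```

The second lookup fails because the string "1" is not equal to the integer 1, so the value types in the config file need to match the types passed to expand.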
