Question

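I am trying to write a rule in a Snakefile that runs a custom perl script. There are two input files and one output file. Both the input file and the output file contain wildcards, because I want to run the script for a variety of files. But when I use expand to generate the different input and output files, the perl script receives all possible input files at once, whereas I want them processed one by one. How can I make perl "eat" the input files one at a time? Here is my code: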

DOMAINS= ["Metallophos", "PP2C", "Y_phosphatase"]
SUPERGROUPS=["2supergroups","5supergroups"]

rule add_supergroups:
    input:
        newick=expand("data/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip",domain=DOMAINS, supergroup=SUPERGROUPS),
        sup="data/species.v3.1.1.supergroups.txt"
    output:
        expand("results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups", domain=DOMAINS, supergroup=SUPERGROUPS)
    shell:
        "perl scripts/change_newick.pl {input.sup} {input.newick} {output}"

Answer 1

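You can remove the expand() function and use a rule "all" to define your targets. The wildcard values in rule add_supergroups are then inferred automatically from these target files.

You can even use different wildcard names in rule "add_supergroups", because Snakemake will recognize and match the patterns.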

DOMAINS= ["Metallophos", "PP2C", "Y_phosphatase"]
SUPERGROUPS=["2supergroups","5supergroups"]

rule all:
    input: expand("results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups", domain=DOMAINS, supergroup=SUPERGROUPS)

rule add_supergroups:
    input:
        newick="data/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip",
        sup="data/species.v3.1.1.supergroups.txt"
    output:
        "results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups"
    shell:
        "perl scripts/change_newick.pl {input.sup} {input.newick} {output}"

In theory, it should also work like this:

DOMAINS= ["Metallophos", "PP2C", "Y_phosphatase"]
SUPERGROUPS=["2supergroups","5supergroups"]

rule all:
    input: expand("results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups", domain=DOMAINS, supergroup=SUPERGROUPS)

rule add_supergroups:
    input:
        newick="data/{foo}",
        sup="data/species.v3.1.1.supergroups.txt"
    output:
        "results/{foo}.supergroups"
    shell:
        "perl scripts/change_newick.pl {input.sup} {input.newick} {output}"
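
To see why a single {foo} wildcard is enough, here is a minimal Python sketch of the matching step. It is an illustration only, not Snakemake's own code: the regular expression is a hand-written equivalent of "results/{foo}.supergroups", and the target path is one of the six files requested by rule all. It relies on the fact that a Snakemake wildcard matches any non-empty string by default, slashes included.

```python
import re

# One of the files requested by rule all.
target = (
    "results/PP2C/5supergroups/"
    "RAxML_bipartitionsBranchLabels.bbhlist.txt.PP2C.fa.aligned.rp.me-25.id.phylip.supergroups"
)

# Hand-written regex equivalent of "results/{foo}.supergroups" (assumption:
# the wildcard behaves like ".+", which is Snakemake's default).
match = re.fullmatch(r"results/(?P<foo>.+)\.supergroups", target)
foo = match.group("foo")

# The same value is substituted into the input pattern "data/{foo}".
input_newick = "data/{foo}".format(foo=foo)
print(input_newick)
```

Here {foo} absorbs the domain, the supergroup and the long file name in one go, so the input path is simply the output path with "results/" swapped for "data/" and the ".supergroups" suffix removed.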

Answer 2

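The reason your rule runs on all the files at once is simple: the expand() function.

As you seem to know, expand creates a Python list of strings, which is very useful for managing files in Snakemake.

But in your example, this means the rule runs the perl script once, with the whole list of files in {input.newick} and the single file in {input.sup}, to produce the whole list of files as output.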
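
A rough Python sketch of what happens in the original rule may make this concrete. The helper expand_sketch below is a hypothetical stand-in written for this illustration, not Snakemake's expand itself; it only mimics the default behaviour of producing one string per combination of values.

```python
from itertools import product

DOMAINS = ["Metallophos", "PP2C", "Y_phosphatase"]
SUPERGROUPS = ["2supergroups", "5supergroups"]

PATTERN = (
    "data/{domain}/{supergroup}/"
    "RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip"
)


def expand_sketch(pattern, **wildcards):
    """Hypothetical stand-in for expand(): one string per combination of values."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]


newick = expand_sketch(PATTERN, domain=DOMAINS, supergroup=SUPERGROUPS)
print(len(newick))  # 3 domains x 2 supergroups

# A list used in a shell command is rendered as its items joined by spaces,
# so the perl script is invoked once with every newick file as an argument.
command = (
    "perl scripts/change_newick.pl data/species.v3.1.1.supergroups.txt "
    + " ".join(newick)
)
print(command)
```

The single command line ends up carrying all six newick paths, which is exactly the "all files at once" behaviour described in the question.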

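You can easily fix this by not using the expand function on the input and output.

But how will Snakemake know that it has to make all the files? By creating a target rule before rule add_supergroups, with the expanded list of files as its input.

Let's write some code: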

DOMAINS= ["Metallophos", "PP2C", "Y_phosphatase"]
SUPERGROUPS=["2supergroups","5supergroups"]

rule target:
    input:
        expand("results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups", domain=DOMAINS, supergroup=SUPERGROUPS)

rule add_supergroups:
    input:
        newick="data/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip",
        sup="data/species.v3.1.1.supergroups.txt"
    output:
        "results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups"
    shell:
        "perl scripts/change_newick.pl {input.sup} {input.newick} {output}"

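Now it should work. Snakemake takes the list of files required by the target rule, then searches all the rules to find out whether one of them can produce these files.

In this case, it recognizes the filename pattern in the output of add_supergroups, so it fills in the wildcards for DOMAINS and SUPERGROUPS automatically. The rule add_supergroups is then run once per file.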
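
The lookup described above can be imitated in plain Python. This is only a sketch of the idea and not how Snakemake is implemented internally; pattern_to_regex and job_for are hypothetical helper names invented for the illustration. Each target is matched against the output pattern, the captured wildcard values are substituted into the input pattern, and one command per target comes out.

```python
import re

DOMAINS = ["Metallophos", "PP2C", "Y_phosphatase"]
SUPERGROUPS = ["2supergroups", "5supergroups"]

NAME = "RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip"
INPUT = "data/{domain}/{supergroup}/" + NAME
OUTPUT = "results/{domain}/{supergroup}/" + NAME + ".supergroups"
SUP = "data/species.v3.1.1.supergroups.txt"


def pattern_to_regex(pattern):
    """Turn a {wildcard} pattern into a regex (hypothetical helper).

    A wildcard that appears twice must match the same text both times,
    which is expressed here with a named backreference.
    """
    seen = set()
    parts = []
    for piece in re.split(r"(\{\w+\})", pattern):
        if re.fullmatch(r"\{\w+\}", piece):
            name = piece[1:-1]
            parts.append(f"(?P={name})" if name in seen else f"(?P<{name}>.+)")
            seen.add(name)
        else:
            parts.append(re.escape(piece))
    return re.compile("".join(parts))


OUTPUT_RE = pattern_to_regex(OUTPUT)


def job_for(target):
    """Return the command for one target, or None if the pattern does not match."""
    match = OUTPUT_RE.fullmatch(target)
    if match is None:
        return None
    newick = INPUT.format(**match.groupdict())
    return f"perl scripts/change_newick.pl {SUP} {newick} {target}"


# The six files requested by the target rule.
targets = [OUTPUT.format(domain=d, supergroup=s) for d in DOMAINS for s in SUPERGROUPS]
for target in targets:
    print(job_for(target))  # one command per target, each with a single newick file
```

Each printed command has exactly one newick input and one output, which corresponds to the rule being run once per file.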
