Question

My usual way of piping (partly based on this Biostars post) is as follows:

rule map:
    input: "{sample}.fq.gz",
    output: "sort/{sample}.bam"
    threads: 24
    shell:
        """
bwa mem reference.fa {input} \
-t {threads} | \
samtools sort - \
-@ {threads} \
-o {output}
        """

I am keen to try Snakemake's pipe() output, because I hope it can make workflows with several piped steps more readable.

rule map:
    input: "{sample}.fq.gz",
    output: pipe("{sample}.bam")
    threads: 24
    shell:
        """
    bwa mem reference.fa {input} \
    -t {threads} \
    > {output}
        """

rule sort:
    input: "{sample}.bam"
    output: "sort/{sample}.bam"
    threads: 24
    shell:
        """
samtools sort {input} -@ {threads} -o {output}
        """

However, this results in the following error:

WorkflowError: Job needs threads=48 but only threads=24 are available. This is likely because two jobs are connected via a pipe and have to run simultaneously. Consider providing more resources (e.g. via --cores).

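So I have to divide the threads between bwa and samtools, but every thread given to samtools is a thread taken away from bwa, which I would rather avoid. In workflows with more piped steps, this problem becomes more pronounced.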
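To make the trade-off concrete, here is a minimal sketch of splitting one thread budget across the two piped rules so that their combined request stays within the 24 available cores. The helper `split_threads`, the name `PIPE_THREADS`, and the 5:1 weighting are all hypothetical and only for illustration; nothing here is part of Snakemake itself. Since a Snakefile accepts plain Python, a helper like this could sit at the top of the file.

```python
def split_threads(total, weights):
    """Split `total` threads across piped jobs in proportion to `weights`.

    Hypothetical helper (not a Snakemake API). Every job gets at least
    one thread, and the shares always sum to exactly `total`.
    """
    if total < len(weights):
        raise ValueError("need at least one thread per piped job")
    weight_sum = sum(weights.values())
    # Reserve one thread per job, then distribute the rest by weight.
    spare = total - len(weights)
    shares = {name: 1 + (spare * w) // weight_sum for name, w in weights.items()}
    # Integer division may leave a few threads unassigned;
    # hand them to the most heavily weighted job.
    leftover = total - sum(shares.values())
    heaviest = max(weights, key=weights.get)
    shares[heaviest] += leftover
    return shares


# Assumed 5:1 weighting in favour of bwa; tune to your own data.
PIPE_THREADS = split_threads(24, {"map": 5, "sort": 1})
print(PIPE_THREADS)  # {'map': 20, 'sort': 4}
```

The two rules could then use `threads: PIPE_THREADS["map"]` and `threads: PIPE_THREADS["sort"]` in place of the hard-coded 24. This only keeps the sum within the budget; it does not avoid taking threads away from bwa, which is the actual concern raised here.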

I have not seen Snakemake pipes used much, but does anyone know a workaround? I am also considering raising this as an issue on Snakemake's GitHub page.

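I also have a more general question about piping. Is there a sound reason for Snakemake to allocate separate threads to each process in a pipe? Should I be worried about running bwa and samtools simultaneously with 24 threads each, as I do in the usual approach?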

Pipes in Snakemake: how to share resources between rules?

0 answers: