Snakemake runs the subworkflow but not the rest of my workflow (goes straight to rule all)

Time: 2020-03-12 17:02:44

Tags: snakemake

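I'm new to both Snakemake and StackOverflow. If anything is unclear or you need more details, please let me know. I wrote a workflow that converts Illumina .BCL base call files into demultiplexed .FASTQ files and generates QC reports (FastQC files). The workflow consists of:

  • The subworkflow "convert_bcl_to_fastq": it creates FASTQ files from the BCL files in a directory named Fastq. It must run before the main workflow, which is why I chose a subworkflow: my second rule depends on these FASTQ files, and I don't know their names in advance. A dummy file "convert_bcl_to_fastq.done" is created as output so I know when the subworkflow has run as expected.
  • The rule "generate_fastqc": it takes the FASTQ files produced by the subworkflow and creates FastQC files in a directory named FastQC.

Problem

When I run my workflow there are no errors, but it does not behave as expected. The subworkflow runs, then the main workflow, but only rule "all" is executed. My rule "generate_fastqc" is not executed at all. Where could I have gone wrong? Here is what I get: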

Building DAG of jobs...
Executing subworkflow convert_bcl_to_fastq.
Building DAG of jobs...
Job counts:
        count   jobs
        1       convert_bcl_to_fastq
        1
[...]
Processing completed with 0 errors and 1 warnings.
Touching output file convert_bcl_to_fastq.done.
Finished job 0.
1 of 1 steps (100%) done
Complete log: /path/to/my/working/directory/conversion/.snakemake/log/2020-03-12T171952.799414.snakemake.log
Executing main workflow.
Using shell: /usr/bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        1

localrule all:
    input: /path/to/my/working/directory/conversion/convert_bcl_to_fastq.done
    jobid: 0

Finished job 0.
1 of 1 steps (100%) done

Once all the FASTQ files have been generated, if I run my workflow again, this time it does execute rule "generate_fastqc":

Building DAG of jobs...
Executing subworkflow convert_bcl_to_fastq.
Building DAG of jobs...
Nothing to be done.
Complete log: /path/to/my/working/directory/conversion/.snakemake/log/2020-03-12T174337.605716.snakemake.log
Executing main workflow.
Using shell: /usr/bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        95      generate_fastqc
        96

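I expected the workflow to run to completion, executing rule "generate_fastqc" as soon as the subworkflow had finished, but in practice I have to run the workflow twice. I thought it would work, since the subworkflow generates all the files needed by the second part of the workflow. Do you have any idea where I went wrong?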


My code

Here is the Snakefile of my main workflow:

subworkflow convert_bcl_to_fastq:
    workdir: WDIR + "conversion/"
    snakefile: WDIR + "conversion/Snakefile"

SAMPLES, = glob_wildcards(FASTQ_DIR + "{sample}_R1_001.fastq.gz")

rule all:
    input:
        convert_bcl_to_fastq("convert_bcl_to_fastq.done"),
        expand(FASTQC_DIR + "{sample}_R1_001_fastqc.html", sample=SAMPLES),
        expand(FASTQC_DIR + "{sample}_R2_001_fastqc.html", sample=SAMPLES)

rule generate_fastqc:
    output:
        FASTQC_DIR + "{sample}_R1_001_fastqc.html",
        FASTQC_DIR + "{sample}_R2_001_fastqc.html",
        temp(FASTQC_DIR + "{sample}_R1_001_fastqc.zip"),
        temp(FASTQC_DIR + "{sample}_R2_001_fastqc.zip")
    shell:
        "mkdir -p "+ FASTQC_DIR +" | " #Creates a FastQC directory if it is missing
        "fastqc --outdir "+ FASTQC_DIR +" "+ FASTQ_DIR +"{wildcards.sample}_R1_001.fastq.gz "+ FASTQ_DIR + " {wildcards.sample}_R2_001.fastq.gz &" #Generates FASTQC files for each sample at a time

Here is the Snakefile of my subworkflow "convert_bcl_to_fastq":

rule all:
    input:
        "convert_bcl_to_fastq.done"

rule convert_bcl_to_fastq:
    output:
        touch("convert_bcl_to_fastq.done")
    shell:
        "mkdir -p "+ FASTQ_DIR +" | " #Creates a Fastq directory if it is missing
        "bcl2fastq --no-lane-splitting --runfolder-dir "+ INPUT_DIR +" --output-dir "+ FASTQ_DIR #Demultiplexes and Converts BCL files to FASTQ files

Thanks in advance for your help!

1 answer:

Answer 0 (score: 0)

The documentation on subworkflows currently states:

When executing, snakemake first tries to create (or update, if necessary) 
"test.txt" (and all other possibly mentioned dependencies) by executing the subworkflow. 
Then the current workflow is executed.

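In your case, the only declared dependency is "convert_bcl_to_fastq.done", which Snakemake happily produces on the first run.

Snakemake normally parses the Snakefile in a single pass, and the main workflow has not been told to look for sample files coming from the subworkflow. Because the sample files do not exist yet during the first execution, glob_wildcards() finds nothing, so the expand() statements in the main workflow produce no targets. No matches, no work to do :-)

The second time you run the main workflow, the sample files are already there, so the expand() calls in rule all: find matches and the corresponding FastQC files are produced.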
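The parse-time behaviour described above can be imitated in plain Python, without Snakemake. This is only a rough sketch: discover_samples and fastqc_targets are hypothetical stand-ins for glob_wildcards() and the two expand() calls in rule all, and the sample names A_S1 and B_S2 are made up for illustration.

```python
import os
import tempfile

SUFFIX = "_R1_001.fastq.gz"


def discover_samples(fastq_dir):
    """Hypothetical stand-in for glob_wildcards(FASTQ_DIR + "{sample}_R1_001.fastq.gz")."""
    return sorted(
        name[: -len(SUFFIX)]
        for name in os.listdir(fastq_dir)
        if name.endswith(SUFFIX)
    )


def fastqc_targets(samples, fastqc_dir="FastQC/"):
    """Hypothetical stand-in for the two expand() calls in rule all."""
    return [
        f"{fastqc_dir}{sample}_R{read}_001_fastqc.html"
        for read in (1, 2)
        for sample in samples
    ]


with tempfile.TemporaryDirectory() as fastq_dir:
    # First run: the target list is built before any FASTQ file exists.
    first_run = fastqc_targets(discover_samples(fastq_dir))

    # The subworkflow then writes the FASTQ files (simulated with empty files).
    for sample in ("A_S1", "B_S2"):
        for read in (1, 2):
            path = os.path.join(fastq_dir, f"{sample}_R{read}_001.fastq.gz")
            open(path, "w").close()

    # Second run: the files are present when the target list is built.
    second_run = fastqc_targets(discover_samples(fastq_dir))

print(first_run)   # no targets on the first run
print(second_run)  # four FastQC reports on the second run
```

The target list is computed once, before any job runs, so files that appear later in the same run are never seen.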

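Side note 1: Glad I noticed this. With your code, if you actually make a change that should force the subworkflow to rerun, Snakemake will find the old "convert_bcl_to_fastq.done" and will not re-execute the subworkflow.

Side note 2: If you want Snakemake to be less "one-pass", it has a rule keyword, checkpoint, that can be used to re-evaluate what needs to be done as a result of a rule's execution. In your case, the checkpoint would be rule convert_bcl_to_fastq. This requires the rules to be in the same logical Snakefile (although include allows it to be split across multiple files).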
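A minimal sketch of what the checkpoint approach might look like, assuming both rules are merged into a single Snakefile and that FASTQ_DIR, FASTQC_DIR and INPUT_DIR are defined as in the question. The function name fastqc_reports is hypothetical, and this is a Snakefile fragment (Snakemake syntax), not standalone Python:

```python
# Snakefile fragment (Snakemake DSL); not runnable as plain Python.

def fastqc_reports(wildcards):
    # Asking for the checkpoint's output defers evaluation of this function
    # until convert_bcl_to_fastq has finished; the DAG is then re-evaluated.
    checkpoints.convert_bcl_to_fastq.get()
    # The FASTQ files exist at this point, so globbing finds the samples.
    samples, = glob_wildcards(FASTQ_DIR + "{sample}_R1_001.fastq.gz")
    return expand(FASTQC_DIR + "{sample}_R{read}_001_fastqc.html",
                  sample=samples, read=[1, 2])

rule all:
    input:
        fastqc_reports

checkpoint convert_bcl_to_fastq:
    output:
        touch("convert_bcl_to_fastq.done")
    shell:
        "mkdir -p " + FASTQ_DIR + " && "
        "bcl2fastq --no-lane-splitting --runfolder-dir " + INPUT_DIR +
        " --output-dir " + FASTQ_DIR

# rule generate_fastqc follows here, as in the question.
```

With this layout the sample list is computed inside an input function rather than at the top of the Snakefile, so it is evaluated only after the conversion has run, and a single invocation should be enough.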