How to enforce Snakemake rule dependencies when submitting via qsub

Time: 2019-06-03 17:38:15

Tags: snakemake qsub

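I am using Snakemake to submit jobs to a cluster. I want to force one rule to run only after all other rules have finished, because the input files for that job (an R script) are not ready until then.

I came across this section of the Snakemake documentation, which says that rule execution order can be enforced with flag files: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#flag-files

I have several rules, but for simplicity I show my Snakefile and the last two rules (rsem_model and tximport_rsem) below. In my qsub cluster workflow I want tximport_rsem to execute only after rsem_model has finished. I tried the "touchfile" approach but could not get it to work.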

# Snakefile
rule all:
   input:
       expand("results/fastqc/{sample}_fastqc.zip",sample=samples),
       expand("results/bbduk/{sample}_trimmed.fastq",sample=samples),
       expand("results/bbduk/{sample}_trimmed_fastqc.zip",sample=samples),
       expand("results/bam/{sample}_Aligned.toTranscriptome.out.bam",sample=samples),
       expand("results/bam/{sample}_ReadsPerGene.out.tab",sample=samples),
       expand("results/quant/{sample}.genes.results",sample=samples),
       expand("results/quant/{sample}_diagnostic.pdf",sample=samples),
       expand("results/multiqc/project_QS_STAR_RSEM_trial.html"),
       expand("results/rsem_tximport/RSEM_GeneLevel_Summarization.csv"),
       expand("mytask.done")

rule clean:
     shell: "rm -rf .snakemake/"

include: 'rules/fastqc.smk'
include: 'rules/bbduk.smk'
include: 'rules/fastqc_after.smk'
include: 'rules/star_align.smk'
include: 'rules/rsem_norm.smk'
include: 'rules/rsem_model.smk'
include: 'rules/tximport_rsem.smk'
include: 'rules/multiqc.smk'

# rules/rsem_model.smk
rule rsem_model:
    input:
        'results/quant/{sample}.genes.results'
    output:
        'results/quant/{sample}_diagnostic.pdf'
    params:
        plotmodel = config['rsem_plot_model'],
        prefix = 'results/quant/{sample}',
        touchfile = 'mytask.done'
    threads: 16
    priority: 60
    shell:"""
          touch {params.touchfile}
          {params.plotmodel} {params.prefix} {output}
        """

# rules/tximport_rsem.smk
rule tximport_rsem:
    input: 'mytask.done'
    output:
        'results/rsem_tximport/RSEM_GeneLevel_Summarization.csv'
    priority: 50
    shell: "Rscript scripts/RSEM_tximport.R"

This is the error I get when I attempt a dry run:

snakemake -np
Building DAG of jobs...
MissingInputException in line 1 of /home/yh6314/rsem/tutorials/QS_Snakemake/rules/tximport_rsem.smk:
Missing input files for rule tximport_rsem:
mytask.done

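One thing to note: if I run the workflow on the head node, I do not need the touch file at all and everything works fine.

I would appreciate any suggestions for a workaround.

Thanks.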

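1 Answer:

Answer 0: (score: 1)

Rule tximport_rsem should be executed only after all jobs of rule rsem_model have completed (based on the comments). In that case the intermediate file mytask.done is not needed. It is enough to use the output files of rule rsem_model, for all samples, as the input of rule tximport_rsem.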

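rule rsem_model:
    input:
        'results/quant/{sample}.genes.results'
    output:
        # Named so that {output.pdf} resolves in the shell command
        pdf = 'results/quant/{sample}_diagnostic.pdf'
    params:
        # Same params as in the question's rule, minus the touch file
        plotmodel = config['rsem_plot_model'],
        prefix = 'results/quant/{sample}'
    shell:
        """
        {params.plotmodel} {params.prefix} {output.pdf}
        """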

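rule tximport_rsem:
    input:
        # Every per-sample PDF is an input, so this rule waits for all rsem_model jobs.
        # 'samples' is the sample list already used in rule all of the question's Snakefile.
        expand('results/quant/{sample}_diagnostic.pdf', sample=samples)
    output:
        'results/rsem_tximport/RSEM_GeneLevel_Summarization.csv'
    shell:
        "Rscript scripts/RSEM_tximport.R"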
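
If a flag file is still preferred, the following is a minimal sketch of how it could be declared, not a tested drop-in for this workflow. The dry run in the question fails because mytask.done appears only under params, so no rule declares it as an output and Snakemake cannot work out how to create it. Touching the file inside each per-sample job would also create it as soon as the first sample starts, not after all of them finish. The rule name rsem_model_done is hypothetical, and samples is assumed to be the same list used in rule all; touch() is the output flag described in the flag-files section of the Snakemake documentation linked in the question.

```snakemake
# Hypothetical aggregation rule: it becomes runnable only once every
# per-sample diagnostic PDF exists, and then creates the flag file.
rule rsem_model_done:
    input:
        # 'samples' is assumed to be the list defined for rule all
        expand('results/quant/{sample}_diagnostic.pdf', sample=samples)
    output:
        # touch() declares the flag as an output and creates it on success
        touch('mytask.done')

rule tximport_rsem:
    input:
        'mytask.done'
    output:
        'results/rsem_tximport/RSEM_GeneLevel_Summarization.csv'
    shell:
        "Rscript scripts/RSEM_tximport.R"
```

With this layout, the touch command and the touchfile entry under params in rule rsem_model would no longer be needed. Listing the PDFs directly as input, as in the answer above, gives the same ordering with one rule fewer.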