Executing a Snakemake rule as the last rule

Time: 2020-01-16 08:59:44

Tags: python-3.x snakemake

I am trying to create a Snakemake file to run the sortmeRNA pipeline:

SAMPLES = ['test']
READS=["R1", "R2"]

rule all:
    input: expand("Clean/4.Unmerge/{exp}.non_rRNA_{read}.fastq", exp = SAMPLES, read = READS)

rule unzip:
    input: 
        fq = "trimmed/{exp}.{read}.trimd.fastq.gz"
    output: 
        ofq = "Clean/1.Unzipped/{exp}.{read}.trimd.fastq"
    shell: "gzip -dkc < {input.fq} > {output.ofq}"

rule merge_paired:
    input: 
        read1 = "Clean/1.Unzipped/{exp}.R1.trimd.fastq",
        read2 = "Clean/1.Unzipped/{exp}.R2.trimd.fastq"
    output: 
        il = "Clean/2.interleaved/{exp}.il.trimd.fastq"
    shell: "merge-paired-reads.sh {input.read1} {input.read2} {output.il}"

rule sortmeRNA:
    input: 
        ilfq = "Clean/2.interleaved/{exp}.il.trimd.fastq"
    output:
        reads_rRNA = "Clean/3.sorted/{exp}_reads_rRNA",
        non_rRNA = "Clean/3.sorted/{exp}_reads_nonRNA"
    params:
        silvabac = "rRNA_databases/silva-bac-16s-id90.fasta,index/silva-bac-16s-db:rRNA_databases/silva-bac-23s-id98.fasta,index/silva-bac-23s-db",
        silvaarc = "rRNA_databases/silva-arc-16s-id95.fasta,index/silva-arc-16s-db:rRNA_databases/silva-arc-23s-id98.fasta,index/silva-arc-23s-db",
        silvaeuk = "rRNA_databases/silva-euk-18s-id95.fasta,index/silva-euk-18s-db:rRNA_databases/silva-euk-28s-id98.fasta,index/silva-euk-28s-db",
        rfam = "rRNA_databases/rfam-5s-database-id98.fasta,index/rfam-5s-db:rRNA_databases/rfam-5.8s-database-id98.fasta,index/rfam-5.8s-db",
        acc = "--num_alignments 1 --fastx --log -a 20 -m 64000 --paired_in -v"
    log:
        "Clean/sortmeRNAlogs/{exp}_sortmeRNA.log"
    shell:'''
        sortmerna --ref {params.silvabac}:{params.silvaarc}:{params.silvaeuk}:{params.rfam} --reads {input.ilfq} --aligned {output.reads_rRNA} --other {output.non_rRNA} {params.acc}
        '''

rule unmerge_paired:
    input:
        inun = "Clean/3.sorted/{exp}_reads_nonRNA.fastq"
    output:
        R1 = "Clean/4.Unmerge/{exp}.non_rRNA_R1.fastq",
        R2 = "Clean/4.Unmerge/{exp}.non_rRNA_R2.fastq"
    shell:"unmerge-paired-reads.sh {input.inun} {output.R1} {output.R2}"

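This works fine! But for 1 sample it produces about 53 GB of output. I have 90 samples to run and cannot afford that much disk space. I tried marking the outputs of the rules unzip, merge_paired, and sortmeRNA as temp(), but that raises a "Missing input files" exception when unmerge_paired is executed. I also tried adding a rule_remove to delete all of these intermediate directories, but it does not run as the last rule; it runs somewhere in the middle and raises errors again. Is there an efficient way to do this?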

The error that occurs is:

MissingInputException in line 45 of sortmeRNA_pipeline_memv2.0.snakefile:
Missing input files for rule unmerge_paired:
Clean/3.sorted/test_reads_nonRNA.fastq

Also note that rule sortmeRNA takes a string as its output and produces a string.fastq file, which is then the input to rule unmerge_paired! Thanks.

1 Answer:

Answer 0: (score: 2)

For Snakemake to connect the input of one rule to the output of another, the two must be identical. The way the output of sortmeRNA and the input of unmerge_paired are described is not going to work, whether or not you put temp() around it.

rule sortmeRNA:
    input:
        ilfq = "Clean/2.interleaved/{exp}.il.trimd.fastq"
    output:
        reads_rRNA = temp("Clean/3.sorted/{exp}_reads_rRNA.fastq"),
        non_rRNA = temp("Clean/3.sorted/{exp}_reads_nonRNA.fastq")
    params:
        reads_rRNA = "Clean/3.sorted/{exp}_reads_rRNA",
        non_rRNA = "Clean/3.sorted/{exp}_reads_nonRNA"
    shell:
        '''
        sortmerna --aligned {params.reads_rRNA} --other {params.non_rRNA} ...
        '''

rule unmerge_paired:
    input:
        inun = "Clean/3.sorted/{exp}_reads_nonRNA.fastq"  # or rules.sortmeRNA.output.non_rRNA
    output:
        R1 = "Clean/4.Unmerge/{exp}.non_rRNA_R1.fastq",
        R2 = "Clean/4.Unmerge/{exp}.non_rRNA_R2.fastq"
    shell:
        "unmerge-paired-reads.sh {input.inun} {output.R1} {output.R2}"

I removed everything unnecessary to make it clear what is going on; you will have to put those parts back. I changed the output of sortmeRNA to the file the rule actually produces (and made it temporary). I also added two params that are identical to the outputs but without the .fastq extension.
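To see why the original Snakefile fails, it can help to model how a requested file is matched against a rule's output pattern. The sketch below is a simplified stand-in written in plain Python, not Snakemake's actual implementation; `pattern_to_regex` and `can_produce` are hypothetical helper names, and it assumes each wildcard appears only once in a pattern.

```python
import re


def pattern_to_regex(pattern):
    """Turn an output pattern with {wildcards} into an anchored regex.

    Simplified model (assumption): every wildcard matches '.+' and
    each wildcard name is used at most once per pattern.
    """
    # re.split with a capturing group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # literal path fragment
        else:
            regex += f"(?P<{part}>.+)"      # wildcard becomes a named group
    return re.compile(regex + "$")          # the whole path must match


def can_produce(output_pattern, requested):
    """Return the wildcard values if the pattern can yield the file, else None."""
    match = pattern_to_regex(output_pattern).match(requested)
    return match.groupdict() if match else None


# The file that unmerge_paired asks for in the question.
requested = "Clean/3.sorted/test_reads_nonRNA.fastq"

# Output as declared in the question: a prefix without the extension.
print(can_produce("Clean/3.sorted/{exp}_reads_nonRNA", requested))        # None

# Output as declared in the answer: the file sortmerna actually writes.
print(can_produce("Clean/3.sorted/{exp}_reads_nonRNA.fastq", requested))  # {'exp': 'test'}
```

Under this model, no rule output matches the requested .fastq path while the extension is missing from the declaration, which is the situation reported by the MissingInputException. Once the output names the real file, the prefix that sortmerna expects has to be passed separately, which is what the two params in the answer are for.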
