Using a for loop in a rule, and an error from the shell split command

Posted: 2017-12-19 19:33:06

Tags: python snakemake

I'm using Snakemake for the first time, and I'm having trouble with some of my rule definitions.

Here is the Snakefile I'm running:

import os
import re

READS="r1 r2".split()
READ1=READS[0]
READ2=READS[1]
array=[]

#To get a list of filtered reads suffixes after running the rule split_reads
f_reads = os.listdir("./filtered_reads")
for i in range(len(f_reads)):
        mm = re.match(r'(.+-)([a-z][a-z])',f_reads[i])
        if mm:
                array.append(mm.group(2))

rule quality_filter:
        input:
                expand("data/{{sample}}_{read}.fq",read=READS)
        output:
               expand("filtered_reads/{{sample}}_{read}_q20p50.fq",read=READS)
        shell: """
                fastq_quality_filter -Q33 -q 20 -p 50 -i {input[0]} -o {output[0]}
                fastq_quality_filter -Q33 -q 20 -p 50 -i {input[1]} -o {output[1]}

        """



rule fastq_conversion:
        input:
                expand("filtered_reads/{{sample}}_{read}_q20p50.fq",read=READS)
        output:
                expand("filtered_reads/{{sample}}_{read}_q20p50.fasta",read=READS)
        shell: """
                fastq_to_fasta -Q33 -n -i {input[0]} -o {output[0]}
                fastq_to_fasta -Q33 -n -i {input[1]} -o {output[1]}
        """


rule fasta_temp:
        input:
                expand("filtered_reads/{{sample}}_{read}_q20p50.fasta",read=READS)
        output:
                expand("filtered_reads/{{sample}}_{read}_q20p50.fasta_temp",read=READS)

        shell: """
                awk '/>/{{temp=$0; getline; print temp "\\t" $0}}' {input[0]} | sed  's/\/1/ 1:N:0:8 /' | sort > {output[0]}
                awk '/>/{{temp=$0; getline; print temp "\\t" $0}}' {input[1]} | sed  's/\/2/ 2:N:0:8 /' | sort > {output[1]}
        """

rule filtered_join:
        input:
                expand("filtered_reads/{{sample}}_{read}_q20p50.fasta_temp",read=READS)
        output:
                expand("filtered_reads/{{sample}}_{read}_q20p50_filtered.fasta",read=READS)
        shell:"""
                join -j 1 -o 1.1 1.2 1.3 {input[0]} {input[1]} | awk '{{print $1 " " $2 "\\n" $3}}' > {output[0]}
                join -j 1 -o 2.1 2.2 2.3 {input[0]} {input[1]} | awk '{{print $1 " " $2 "\\n" $3}}' > {output[1]}

        """

rule split_reads:
        input:
                expand("filtered_reads/{{sample}}_{read}_q20p50_filtered.fasta",read=READS)
        output:
                expand("filtered_reads/{{sample}}_{read}_q20p50_filtered.fasta-",read=READS)

        shell:"""
                split -l 3000 {input[0]} {output[0]}
                split -l 3000 {input[1]} {output[1]}

        """


rule run_igblast:
        input:
                expand("filtered_reads/{{sample}}_{read}_q20p50_filtered.fasta-",read=READS)
        output:

        run: """
                for i in range(len(array)):
                        shell("perl vpairhumanalysis-igblast-isotype_BDmod.pl {input[0]+array[i]} {input[1]+array[i]} human_barcodes.txt")

        """
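Note on the suffix loop at the top of the Snakefile: it calls `os.listdir` while Snakemake parses the Snakefile, which happens before any rule (including split_reads) has run. On a fresh run `./filtered_reads` may not exist yet, and `os.listdir` then raises `FileNotFoundError`. Also, because the r1 and r2 chunks share the same suffixes, that loop appends every suffix twice. The sketch below is a hypothetical replacement for the loop. It assumes the chunk names follow the `<sample>_<r1|r2>_q20p50_filtered.fasta-<suffix>` pattern used in the rules, tolerates a missing directory, and returns each suffix once. It still only sees files that already exist at parse time. Snakemake's mechanisms for outputs whose number is not known in advance (`dynamic()` in older releases, checkpoints in newer ones) address that part.

```python
import os
import re

# Assumed chunk naming: <sample>_<r1|r2>_q20p50_filtered.fasta-<two letters>,
# i.e. what split writes for the prefix used in the split_reads rule.
CHUNK_RE = re.compile(r".+_r[12]_q20p50_filtered\.fasta-(?P<suffix>[a-z]{2})")


def chunk_suffixes(directory="filtered_reads"):
    """Hypothetical helper: each split suffix once, sorted; [] if no directory."""
    if not os.path.isdir(directory):
        # The Snakefile is parsed before split_reads runs, so on a fresh
        # run there may be nothing to list yet.
        return []
    suffixes = set()
    for name in os.listdir(directory):
        # fullmatch: the whole file name must fit the pattern.
        match = CHUNK_RE.fullmatch(name)
        if match:
            # r1 and r2 chunks share suffixes; the set keeps one copy.
            suffixes.add(match.group("suffix"))
    return sorted(suffixes)


array = chunk_suffixes("./filtered_reads")
```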

==========

Error:

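Error in job split_reads while creating output files filtered_reads/A_r1_q20p50_filtered.fasta-, filtered_reads/A_r2_q20p50_filtered.fasta-.
MissingOutputException in line 67 of /home/vagrant/snakemake-2/Snakefile-3:
Missing files after 30 seconds:
filtered_reads/A_r1_q20p50_filtered.fasta-
filtered_reads/A_r2_q20p50_filtered.fasta-
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message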

=============

The split command does split the files as expected, but Snakemake then reports the error above and does not run the next rule.
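The MissingOutputException follows from how `split` names its output. The last argument is only a prefix, and with alphabetic suffixes the chunks are written as `PREFIXaa`, `PREFIXab`, and so on. A file named exactly `..._filtered.fasta-` is therefore never created, so the outputs declared in split_reads never appear, and raising `--latency-wait` would not help. The sketch below is a hypothetical helper (not part of Snakemake or coreutils) that predicts the chunk names for `split -a 2 -l N`. Without `-a`, GNU split gives the same names unless the input yields several hundred chunks. The line count in the usage line is an assumed example value.

```python
import math
from itertools import product
from string import ascii_lowercase


def split_chunk_names(prefix, n_lines, lines_per_chunk=3000):
    """Predict the chunk files `split -a 2 -l LINES INPUT PREFIX` creates.

    Hypothetical helper for illustration; it does not call split itself.
    """
    if n_lines <= 0:
        raise ValueError("this sketch assumes a non-empty input file")
    # Every started block of `lines_per_chunk` lines becomes one file.
    n_chunks = math.ceil(n_lines / lines_per_chunk)
    # Two-letter alphabetic suffixes run aa, ab, ..., az, ba, ..., zz.
    suffixes = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]
    if n_chunks > len(suffixes):
        raise ValueError("more chunks than two-letter suffixes (aa..zz)")
    # The prefix is only a stem: no file named exactly `prefix` is written.
    return [prefix + s for s in suffixes[:n_chunks]]


# Assumed input of 7000 lines -> three chunks ending in -aa, -ab, -ac.
names = split_chunk_names("filtered_reads/A_r1_q20p50_filtered.fasta-", 7000)
print(names)
```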

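I also have a second question, about the rule 'run_igblast'. After splitting the files, I need to run a Perl script on the files produced by the split command. The Perl script writes its results to a different file whose name is similar to the input file's. How should I handle a rule whose input files do not correspond to its output files? I'm also not sure whether the loop over all the split files belongs in the input section or the run section of the run_igblast rule. I want run_igblast to execute commands like this: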

perl vpairhumanalysis-igblast-isotype_BDmod.pl A_r1_q20p50_filtered.fasta_aa A_r2_q20p50_filtered.fasta_aa human_barcodes.txt
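One way to drive the run_igblast step is to build the list of Perl invocations in plain Python, pairing each r1 chunk with the r2 chunk that has the same split suffix. The sketch below assumes the chunk names that the split_reads prefix produces (`..._filtered.fasta-aa`, `-ab`, ...). `igblast_commands` is a hypothetical helper; the script and barcode file names are taken from the question.

```python
import re
import shlex

# Assumed chunk naming: <sample>_<r1|r2>_q20p50_filtered.fasta-<two letters>.
CHUNK_RE = re.compile(
    r"(?P<sample>.+)_(?P<read>r[12])_q20p50_filtered\.fasta-(?P<suffix>[a-z]{2})"
)


def igblast_commands(chunk_names, barcodes="human_barcodes.txt"):
    """Hypothetical helper: one perl command line per (sample, suffix) pair."""
    pairs = {}
    for name in chunk_names:
        match = CHUNK_RE.fullmatch(name)
        if match is None:
            continue  # ignore files that are not split chunks
        key = (match.group("sample"), match.group("suffix"))
        pairs.setdefault(key, {})[match.group("read")] = name
    commands = []
    for key in sorted(pairs):
        reads = pairs[key]
        if "r1" not in reads or "r2" not in reads:
            raise ValueError(f"unpaired chunk for {key}: {reads}")
        args = ["perl", "vpairhumanalysis-igblast-isotype_BDmod.pl",
                reads["r1"], reads["r2"], barcodes]
        # shlex.quote keeps the command safe if a name contains spaces.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


for cmd in igblast_commands(["A_r1_q20p50_filtered.fasta-aa",
                             "A_r2_q20p50_filtered.fasta-aa"]):
    print(cmd)
```

Inside a Snakemake `run:` block, each returned string could be passed to `shell()` from an ordinary Python `for` loop. In the question's rule the loop sits inside a triple-quoted string, and a bare string in a `run:` block is never executed as code.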
