I'm a first-time Snakemake user and I'm having trouble with some of my rule definitions.
This is the Snakefile I'm running:
import os
import re
READS="r1 r2".split()
READ1=READS[0]
READ2=READS[1]
array=[]
#To get a list of filtered reads suffixes after running the rule split_reads
f_reads = os.listdir("./filtered_reads")
for i in range(len(f_reads)):
    mm = re.match(r'(.+-)([a-z][a-z])', f_reads[i])
    if mm:
        array.append(mm.group(2))
rule quality_filter:
    input:
        expand("data/{{sample}}_{read}.fq", read=READS)
    output:
        expand("filtered_reads/{{sample}}_{read}_q20p50.fq", read=READS)
    shell: """
        fastq_quality_filter -Q33 -q 20 -p 50 -i {input[0]} -o {output[0]}
        fastq_quality_filter -Q33 -q 20 -p 50 -i {input[1]} -o {output[1]}
    """
rule fastq_conversion:
    input:
        expand("filtered_reads/{{sample}}_{read}_q20p50.fq", read=READS)
    output:
        expand("filtered_reads/{{sample}}_{read}_q20p50.fasta", read=READS)
    shell: """
        fastq_to_fasta -Q33 -n -i {input[0]} -o {output[0]}
        fastq_to_fasta -Q33 -n -i {input[1]} -o {output[1]}
    """
rule fasta_temp:
    input:
        expand("filtered_reads/{{sample}}_{read}_q20p50.fasta", read=READS)
    output:
        expand("filtered_reads/{{sample}}_{read}_q20p50.fasta_temp", read=READS)
    shell: """
        awk '/>/{{temp=$0; getline; print temp "\\t" $0}}' {input[0]} | sed 's/\/1/ 1:N:0:8 /' | sort > {output[0]}
        awk '/>/{{temp=$0; getline; print temp "\\t" $0}}' {input[1]} | sed 's/\/2/ 2:N:0:8 /' | sort > {output[1]}
    """
rule filtered_join:
    input:
        expand("filtered_reads/{{sample}}_{read}_q20p50.fasta_temp", read=READS)
    output:
        expand("filtered_reads/{{sample}}_{read}_q20p50_filtered.fasta", read=READS)
    shell: """
        join -j 1 -o 1.1 1.2 1.3 {input[0]} {input[1]} | awk '{{print $1 " " $2 "\\n" $3}}' > {output[0]}
        join -j 1 -o 2.1 2.2 2.3 {input[0]} {input[1]} | awk '{{print $1 " " $2 "\\n" $3}}' > {output[1]}
    """
rule split_reads:
    input:
        expand("filtered_reads/{{sample}}_{read}_q20p50_filtered.fasta", read=READS)
    output:
        expand("filtered_reads/{{sample}}_{read}_q20p50_filtered.fasta-", read=READS)
    shell: """
        split -l 3000 {input[0]} {output[0]}
        split -l 3000 {input[1]} {output[1]}
    """
rule run_igblast:
    input:
        expand("filtered_reads/{{sample}}_{read}_q20p50_filtered.fasta-", read=READS)
    output:
    run: """
        for i in range(len(array)):
            shell("perl vpairhumanalysis-igblast-isotype_BDmod.pl {input[0]+array[i]} {input[1]+array[i]} human_barcodes.txt")
    """
==========
Error:
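Error in job split_reads while creating output files filtered_reads/A_r1_q20p50_filtered.fasta-, filtered_reads/A_r2_q20p50_filtered.fasta-.
MissingOutputException in line 67 of /home/vagrant/snakemake-2/Snakefile-3:
Missing files after 30 seconds:
filtered_reads/A_r1_q20p50_filtered.fasta-
filtered_reads/A_r2_q20p50_filtered.fasta-
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message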
=============
The files do get split by the split command, but the program returns the above error at the end and does not execute the next rule.
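If I understand `split` correctly, it treats its last argument as a prefix rather than as a file name, so the files it writes are `<prefix>aa`, `<prefix>ab`, and so on, and no file named exactly `<prefix>` is ever created. Below is a minimal Python sketch of that naming, assuming GNU `split` with its default two-letter alphabetic suffixes. The helper name `split_chunk_names` is hypothetical and exists only for illustration.

```python
from itertools import islice, product
from string import ascii_lowercase


def split_chunk_names(prefix, n_chunks):
    """Return the first n_chunks file names that `split ... PREFIX` would write,
    assuming the default two-letter suffixes (aa, ab, ac, ...)."""
    suffixes = ("".join(pair) for pair in product(ascii_lowercase, repeat=2))
    return [prefix + suffix for suffix in islice(suffixes, n_chunks)]


# The output declared in rule split_reads for sample A, read r1.
declared_output = "filtered_reads/A_r1_q20p50_filtered.fasta-"
written = split_chunk_names(declared_output, 3)

print(written)
# The declared output is only a prefix of the real files, never a file itself.
print(declared_output in written)
```

If that reading is right, Snakemake waits for a file called `A_r1_q20p50_filtered.fasta-` that `split` never produces, which would match the MissingOutputException above.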
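I also have another question, about the rule 'run_igblast'. After splitting the files, I need to run a perl script on the files produced by the split command. The output of the perl script is a different file whose name is similar to that of the input file. What should I do when the input file names do not match the output file names? I'm not sure whether the loop over all the split files should go in the input section or in the run section of the run_igblast rule. I would like the command in run_igblast to be executed in the following way: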
perl vpairhumanalysis-igblast-isotype_BDmod.pl A_r1_q20p50_filtered.fasta_aa A_r2_q20p50_filtered.fasta_aa human_barcodes.txt
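The pairing I am after can be sketched in plain Python. This is only an illustration of how the chunk files might be matched up, not a working Snakemake rule. The helper `build_igblast_commands` and the pattern `CHUNK_RE` are hypothetical names; the sketch assumes the chunks are named with the `fasta-` prefix from rule split_reads plus a two-letter suffix, and that the file names come from something like `os.listdir("./filtered_reads")`.

```python
import re

# Matches e.g. "A_r1_q20p50_filtered.fasta-aa":
# sample "A", read "r1", split suffix "aa".
CHUNK_RE = re.compile(
    r"^(?P<sample>.+)_(?P<read>r[12])_q20p50_filtered\.fasta-(?P<suffix>[a-z]{2})$"
)


def build_igblast_commands(filenames, directory="filtered_reads"):
    """Build one perl command per (sample, suffix) that has both an r1 and an r2 chunk."""
    chunks = {}
    for name in filenames:
        match = CHUNK_RE.match(name)
        if match:
            key = (match.group("sample"), match.group("suffix"))
            chunks.setdefault(key, {})[match.group("read")] = f"{directory}/{name}"

    commands = []
    for key in sorted(chunks):
        pair = chunks[key]
        # Skip chunks whose mate is missing.
        if "r1" in pair and "r2" in pair:
            commands.append(
                "perl vpairhumanalysis-igblast-isotype_BDmod.pl "
                f"{pair['r1']} {pair['r2']} human_barcodes.txt"
            )
    return commands


example_files = [
    "A_r1_q20p50_filtered.fasta-aa",
    "A_r2_q20p50_filtered.fasta-aa",
    "A_r1_q20p50_filtered.fasta-ab",  # no r2 mate, so it is skipped
    "A_r1_q20p50.fq",                 # not a split chunk, so it is ignored
]
for command in build_igblast_commands(example_files):
    print(command)
```

Grouping by suffix keeps each r1 chunk with the r2 chunk that has the same suffix, which is what the loop over `array` in run_igblast is trying to do.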