I am writing my RNA-seq pipeline with Snakemake. When I wrote the last part, rule fpkm, which calculates FPKM values from the BAM files, I got this error:
MissingInputException in line 3 of /root/s/r/snakemake/my_rnaseq_data/Snakefile:
Missing input files for rule all:
05_ft/wt2_transcript.gtf
05_ft/wt1_transcript.gtf
05_ft/wt2_gene.gtf
05_ft/epcr1_gene.gtf
05_ft/wt1_gene.gtf
05_ft/epcr2_transcript.gtf
05_ft/epcr1_transcript.gtf
05_ft/epcr2_gene.gtf
This is my Snakefile:
SBT=["wt1","wt2","epcr1","epcr2"]

rule all:
    input:
        expand("02_clean/{nico}_1.paired.fq", nico=SBT),
        expand("02_clean/{nico}_2.paired.fq", nico=SBT),
        expand("03_align/{nico}.bam", nico=SBT),
        expand("04_exp/{nico}_count.txt", nico=SBT),
        expand("05_ft/{nico}_gene.gtf", nico=SBT),
        expand("05_ft/{nico}_transcript.gtf", nico=SBT)

rule trim:
    input:
        "01_raw/{nico}_1.fastq",
        "01_raw/{nico}_2.fastq"
    output:
        "02_clean/{nico}_1.paired.fq.gz",
        "02_clean/{nico}_1.unpaired.fq.gz",
        "02_clean/{nico}_2.paired.fq.gz",
        "02_clean/{nico}_2.unpaired.fq.gz",
    shell:
        "java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 {input[0]} {input[1]} {output[0]} {output[1]} {output[2]} {output[3]} ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &"

rule gzip:
    input:
        "02_clean/{nico}_1.paired.fq.gz",
        "02_clean/{nico}_2.paired.fq.gz"
    output:
        "02_clean/{nico}_1.paired.fq",
        "02_clean/{nico}_2.paired.fq"
    run:
        shell("gzip -d {input[0]} > {output[0]}")
        shell("gzip -d {input[1]} > {output[1]}")

rule map:
    input:
        "02_clean/{nico}_1.paired.fq",
        "02_clean/{nico}_2.paired.fq"
    output:
        "03_align/{nico}.sam"
    log:
        "logs/map/{nico}.log"
    threads: 40
    shell:
        "hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 {input[0]} -2 {input[1]} -S {output} >{log} 2>&1 &"

rule sort2bam:
    input:
        "03_align/{nico}.sam"
    output:
        "03_align/{nico}.bam"
    threads:30
    shell:
        "samtools sort -@ 20 -m 20G -o {output} {input} "

rule count:
    input:
        "03_align/{nico}.bam"
    output:
        "04_exp/{nico}_count.txt"
    shell:
        "featureCounts -T 10 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o {output} {input}"

rule fpkm:
    input:
        "03_align/{nico}.bam"
    output:
        "05_ft/{nico}_gene.gtf"
        "05_ft/{nico}_transcript.gtf"
    shell:
        "stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -A {output[0]} -o {output[1]} {input}"
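The last two expand() calls in rule all explain why the error lists exactly eight 05_ft files: four samples times two file types. The Python sketch below mimics that expansion with itertools.product. expand_sketch is a hypothetical helper that only models the default "every combination" behavior; it is not Snakemake's actual expand() implementation and ignores its other options.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Hypothetical stand-in for Snakemake's expand(): fill the {name}
    placeholders in `pattern` with every combination of the given values."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[n] for n in names))
    ]


SBT = ["wt1", "wt2", "epcr1", "epcr2"]

# The two 05_ft lines of rule all request 4 samples x 2 file types = 8 files.
targets = (expand_sketch("05_ft/{nico}_gene.gtf", nico=SBT)
           + expand_sketch("05_ft/{nico}_transcript.gtf", nico=SBT))

for t in targets:
    print(t)
```

These are the same eight paths reported as missing, so Snakemake must find a rule whose output pattern matches each of them.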
This is my directory structure:
|-- 03_align
|   |-- epcr1.bam
|   |-- epcr1.sam
|   |-- epcr2.bam
|   |-- epcr2.sam
|   |-- wt1.bam
|   |-- wt1.sam
|   |-- wt2.bam
|   `-- wt2.sam
|-- 04_exp
The BAM files already existed from running the Snakefile before I added the rule fpkm section.
Answer 0 (score: 1)
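The error is caused by the missing comma between the two output files in rule fpkm. Without a comma, Python concatenates adjacent string literals, so the two paths are joined into one long string, 05_ft/{nico}_gene.gtf05_ft/{nico}_transcript.gtf. Rule fpkm therefore declares a single output pattern that never matches files like 05_ft/wt1_gene.gtf, no rule can produce the inputs of rule all, and Snakemake raises MissingInputException. Add the comma: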
rule fpkm:
    output:
        "05_ft/{nico}_gene.gtf",
        "05_ft/{nico}_transcript.gtf"
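The Python sketch below shows both halves of the problem: the missing comma collapses two output patterns into one, and that one pattern cannot match a requested file. The matching side is a simplified model and an assumption, not Snakemake's code: each {name} becomes a regex group matching one or more characters, a repeated wildcard must match the same text again, and the whole path must match. pattern_to_regex and can_produce are hypothetical helpers.

```python
import re

# Without a comma, Python joins adjacent string literals at compile time,
# so the original rule fpkm declared ONE output pattern, not two.
broken_outputs = [
    "05_ft/{nico}_gene.gtf"
    "05_ft/{nico}_transcript.gtf"
]
fixed_outputs = [
    "05_ft/{nico}_gene.gtf",
    "05_ft/{nico}_transcript.gtf",
]


def pattern_to_regex(pattern):
    """Simplified model of wildcard matching (assumption): the first {name}
    becomes a named group matching one or more characters; later uses of the
    same name must match the same text (a backreference)."""
    parts = re.split(r"\{(\w+)\}", pattern)
    regex, seen = "", set()
    for i, part in enumerate(parts):
        if i % 2 == 0:          # even indices: literal text
            regex += re.escape(part)
        elif part in seen:      # repeated wildcard: reuse the captured value
            regex += f"(?P={part})"
        else:                   # first occurrence: capture it
            seen.add(part)
            regex += f"(?P<{part}>.+)"
    return re.compile(regex)


def can_produce(outputs, target):
    """True if any output pattern matches the whole target path."""
    return any(pattern_to_regex(p).fullmatch(target) for p in outputs)


target = "05_ft/wt1_gene.gtf"
print(len(broken_outputs), broken_outputs[0])
print(can_produce(broken_outputs, target))  # False
print(can_produce(fixed_outputs, target))   # True
```

With the comma restored, each requested 05_ft file matches one of the two patterns, which is why the fix above is enough to resolve the MissingInputException.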