我有一个包含多个子文件夹的文件夹,每个子文件夹都包含我想要与基因组对齐的.fastq文件。我正在尝试为它创建一个snakemake工作流程。首先,我使用通配符访问每个子目录及其中的文件。然后我使用expand函数存储文件的所有路径,并编写规则将文件映射到基因组。代码如下:
from snakemake.io import glob_wildcards, expand
import sys
import os
directories, files = glob_wildcards("data/samples/{dir}/{file}.fastq")
print(directories, files)
rule all:
input:
expand("data/samples/{dir}/{file}.fastq", zip, dir=directories,
file=files)
rule bwa_map:
input:
G = "data/genome.fa",
r1 = expand("data/samples/{dir}/{file}.fastq", zip,
dir=directories, file=files)
output:
r2 = expand("data/results/{dir}/{file}.bam", zip, dir=directories,
file=files)
shell:
"./bwa mem {input.G} {input.r1} | ./samtools sort -o - > {output.r2}"
但是,当我执行此代码为" snakemake bwa_map"时,我收到以下错误:
Error in job bwa_map while creating output files data/results/SRR5923/A.bam, data/results/SRR5924/B.bam, data/results/SRR5925/C.bam.
RuleException:
CalledProcessError in line 19 of /Users/rewatitappu/PycharmProjects/RNA-seq_Snakemake/Snakefile:
Command './bwa mem data/genome.fa data/samples/SRR5923/A.fastq data/samples/SRR5924/B.fastq data/samples/SRR5925/C.fastq | ./samtools sort -o - > data/results/SRR5923/A.bam data/results/SRR5924/B.bam data/results/SRR5925/C.bam' returned non-zero exit status 1.
File "/Users/rewatitappu/PycharmProjects/RNA-seq_Snakemake/Snakefile", line 19, in __rule_bwa_map
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Removing output files of failed job bwa_map since they might be corrupted:
data/results/SRR5923/A.bam
Will exit after finishing currently running jobs.
我是否错误地执行了snakemake命令或者代码是否有问题?
答案 0 :(得分:0)
错误消息表明执行以下shell命令时发生错误:
./bwa mem data/genome.fa data/samples/SRR5923/A.fastq data/samples/SRR5924/B.fastq data/samples/SRR5925/C.fastq | ./samtools sort -o - > data/results/SRR5923/A.bam data/results/SRR5924/B.bam data/results/SRR5925/C.bam
问题可能是因为您有两个bam文件作为输出。
您可能不应在expand
规则中使用bwa_map
。展开已在all
规则中进行。