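I plan to move my bioinformatics pipeline to Snakemake, because my current pipeline is a collection of scripts that is getting harder and harder to follow. Judging from the tutorials and documentation, Snakemake looks like a very clear and interesting choice for pipeline management. However, I'm not familiar with Python, since I mainly use bash and R, so Snakemake seems hard for me to learn, and I'm running into the following problem.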
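I have two files, sampleA_L001_R1_001.fastq.gz and sampleA_L001_R2_001.fastq.gz, which are in the same directory, sampleA. I want to merge these files with the cat command. This is actually a test run: in the real case I will have eight separate FASTQ files per sample, which should be merged in a similar way. It's a very simple job, but something is wrong with my code.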
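Merging .fastq.gz files with plain cat, without decompressing first, is valid because the gzip format allows a file to consist of several compressed members back to back, and gzip/zcat decompress them in order. The Python sketch below illustrates this with two tiny, made-up FASTQ records standing in for two lane files; the record contents are hypothetical.

```python
import gzip

# Two hypothetical gzip-compressed FASTQ chunks standing in for two lane files.
lane1 = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
lane2 = gzip.compress(b"@read2\nTTGA\n+\nIIII\n")

# `cat lane1.gz lane2.gz > merged.gz` is plain byte concatenation.
merged = lane1 + lane2

# gzip permits several members in one file; decompression returns all records in order.
text = gzip.decompress(merged).decode()
print(text, end="")
```

Python's gzip.decompress handles multi-member data, so the merged bytes decode to both records in their original order.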
snakemake --latency-wait 20 --snakefile /home/users/me/bin/snakefile.txt
rule mergeFastq:
    input:
        reads1='sampleA/sampleA_L001_R1_001.fastq.gz',
        reads2='sampleA/sampleA_L001_R2_001.fastq.gz'
    output:
        reads1='sampleA/sampleA_R1.fastq.gz',
        reads2='sampleA/sampleA_R2.fastq.gz'
    message:
        'Merging FASTQ files...'
    shell:
        'cat {input.reads1} > {output.reads1}'
        'cat {input.reads2} > {output.reads2}'
-------------------------------------------------------------
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 mergeFastq
1
Job 0: Merging FASTQ files...
Waiting at most 20 seconds for missing files.
Error in job mergeFastq while creating output files sampleA_R1.fastq.gz, sampleA_R2.fastq.gz.
MissingOutputException in line 5 of /home/users/me/bin/snakefile.txt:
Missing files after 20 seconds:
sampleA_R1.fastq.gz
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Removing output files of failed job mergeFastq since they might be corrupted: sampleA_R2.fastq.gz
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message.
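As you can see, I have already tried the --latency-wait option, without success. Do you have any idea what could be causing the problem? The file paths are correct and the files themselves are not corrupted. I have also had similar problems with wildcards, so there must be something basic about Snakemake that I don't understand.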
Answer (score: 2)
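The problem is in the shell statement: the two strings are concatenated into a single command, which writes to a file named "sampleA/sampleA_R1.fastq.gzcat". In bash, that combined line redirects first to sampleA/sampleA_R1.fastq.gzcat (created empty) and then to sampleA/sampleA_R2.fastq.gz, so cat writes both inputs into the R2 file and sampleA/sampleA_R1.fastq.gz is never created. That is why Snakemake cannot find the expected output. You can use the following syntax instead: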
rule mergeFastq:
    input:
        reads1='sampleA/sampleA_L001_R1_001.fastq.gz',
        reads2='sampleA/sampleA_L001_R2_001.fastq.gz'
    output:
        reads1='sampleA/sampleA_R1.fastq.gz',
        reads2='sampleA/sampleA_R2.fastq.gz'
    message:
        'Merging FASTQ files...'
    shell:
        """
        cat {input.reads1} > {output.reads1}
        cat {input.reads2} > {output.reads2}
        """
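The --latency-wait option is not needed: the missing output was caused by the malformed command, not by filesystem latency.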
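For the real case with eight lane files per sample, one option is to give the rule a list of inputs and write the shell command as cat {input} > {output}: when {input} holds several files, Snakemake substitutes them separated by spaces. In the Snakefile the list would typically be built with Snakemake's expand() function. The plain-Python sketch below only mimics the resulting command string. The helper names lane_files and merge_command are hypothetical, and it assumes the lanes are numbered L001 to L008 following the naming of the test files.

```python
def lane_files(sample, read, n_lanes=8):
    """Hypothetical helper: per-lane FASTQ paths, assuming lanes L001..L00N."""
    return [f"{sample}/{sample}_L{lane:03d}_{read}_001.fastq.gz"
            for lane in range(1, n_lanes + 1)]


def merge_command(sample, read, n_lanes=8):
    """Build one cat command, joining the inputs with spaces as {input} would be."""
    inputs = " ".join(lane_files(sample, read, n_lanes))
    return f"cat {inputs} > {sample}/{sample}_{read}.fastq.gz"


# Two lanes keep the printed example short; the default is eight.
print(merge_command("sampleA", "R1", n_lanes=2))
```

This prints cat sampleA/sampleA_L001_R1_001.fastq.gz sampleA/sampleA_L002_R1_001.fastq.gz > sampleA/sampleA_R1.fastq.gz, i.e. a single command per read direction, so there is no need for one cat line per lane.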