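The overall problem I'm trying to solve is finding a way to count the number of reads present in each file at every step of a QC pipeline I'm building. I have a shell script I've used in the past that takes a directory and outputs the number of reads per file. Since I want to use a directory as input, I tried to follow the format laid out by Rasmus in this post: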
https://bitbucket.org/snakemake/snakemake/issues/961/rule-with-folder-as-input-and-output
Here is some example input, created earlier in the pipeline:
$ ls -1 cut_reads/
97_R1_cut.fastq.gz
97_R2_cut.fastq.gz
98_R1_cut.fastq.gz
98_R2_cut.fastq.gz
99_R1_cut.fastq.gz
99_R2_cut.fastq.gz
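And here is a simplified Snakefile that first aggregates all of the reads by creating symlinks in a new directory, then uses that directory as input for the read-counting shell script: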
import os

configfile: "config.yaml"

rule all:
    input:
        "read_counts/read_counts.txt"

rule agg_count:
    input:
        cut_reads = expand("cut_reads/{sample}_{rdir}_cut.fastq.gz", rdir=["R1", "R2"], sample=config["forward_reads"])
    output:
        cut_dir = directory("read_counts/cut_reads")
    run:
        os.makedir(output.cut_dir)
        for read in input.cut_reads:
            abspath = os.path.abspath(read)
            shell("ln -s {abspath} {output.cut_dir}")

rule count_reads:
    input:
        cut_reads = "read_counts/cut_reads"
    output:
        "read_counts/read_counts.txt"
    shell:
        '''
        readcounts.sh {input.cut_reads} >> {output}
        '''
Everything looks fine in a dry run, but when I try to actually execute it, I get a fairly cryptic error message:
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 agg_count
1 all
1 count_reads
3
[Tue Jun 18 11:31:22 2019]
rule agg_count:
input: cut_reads/99_R1_cut.fastq.gz, cut_reads/98_R1_cut.fastq.gz, cut_reads/97_R1_cut.fastq.gz, cut_reads/99_R2_cut.fastq.gz, cut_reads/98_R2_cut.fastq.gz, cut_reads/97_R2_cut.fastq.gz
output: read_counts/cut_reads
jobid: 2
Job counts:
count jobs
1 agg_count
1
[Tue Jun 18 11:31:22 2019]
Error in rule agg_count:
jobid: 0
output: read_counts/cut_reads
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/douglas/snakemake/scrap_directory/.snakemake/log/2019-06-18T113122.202962.snakemake.log
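read_counts/ is created, but there is no cut_reads/ directory inside it. There are no other error messages in the complete log. Does anyone know what is going wrong, or how to get a more descriptive error message?
(Obviously) I'm quite new to Snakemake, so there may be a better way to do this whole process. Any help is much appreciated!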
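For comparison with the directory-based approach, here is a minimal Python sketch of a per-file read count over a directory. The contents of readcounts.sh are not shown in this thread, so this is not that script. It assumes standard four-line FASTQ records (no wrapped sequence lines), and the function names `count_reads` and `count_reads_in_dir` are made up for illustration.

```python
import gzip
import os
import tempfile


def count_reads(fastq_gz):
    """Count reads in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(fastq_gz, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def count_reads_in_dir(directory):
    """Map each *.fastq.gz file name in `directory` to its read count."""
    return {
        name: count_reads(os.path.join(directory, name))
        for name in sorted(os.listdir(directory))
        if name.endswith(".fastq.gz")
    }


# Hypothetical demo: write a tiny two-read FASTQ file and count it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "97_R1_cut.fastq.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
    demo_counts = count_reads_in_dir(tmp)

print(demo_counts)
```

Because `gzip.open` follows symlinks, a function like this could be pointed at a directory of links as well as at the original files.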
Answer 0 (score: 1)
...it was a typo. Typical. os.makedir(output.cut_dir) should be:
os.makedirs(output.cut_dir)
I'm still very curious why Snakemake doesn't display the AttributeError that Python throws when you try to run this. Is it stored or accessible somewhere, to prevent future headaches?
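To illustrate the fix outside of Snakemake, here is a minimal plain-Python sketch of the aggregation step: create the target directory with `os.makedirs`, then symlink each file into it. The helper name `link_into_dir` and the demo paths are hypothetical, and `exist_ok=True` is an addition that the original rule does not use.

```python
import os
import tempfile


def link_into_dir(paths, target_dir):
    """Create target_dir (including parents) and symlink each path into it."""
    # os.makedirs exists; os.makedir does not, which is what raised AttributeError.
    os.makedirs(target_dir, exist_ok=True)
    links = []
    for path in paths:
        link = os.path.join(target_dir, os.path.basename(path))
        os.symlink(os.path.abspath(path), link)
        links.append(link)
    return links


# Hypothetical demo using a temporary directory instead of real reads.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "97_R1_cut.fastq.gz")
    open(src, "w").close()
    demo_links = link_into_dir([src], os.path.join(tmp, "read_counts", "cut_reads"))
    demo_is_link = os.path.islink(demo_links[0])
    demo_name = os.path.basename(demo_links[0])

print(demo_is_link, demo_name)
```

Using `os.symlink` rather than shelling out to `ln -s` keeps any failure as an ordinary Python exception, which may be easier to trace than a failed shell command.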
Answer 1 (score: 1)
Are you sure the error message is caused by the typo in os.makedir? In this test script, os.makedir does throw an AttributeError:
rule all:
    input:
        'tmp.done',

rule one:
    output:
        x= 'tmp.done',
        xdir= directory('tmp'),
    run:
        os.makedir(output.xdir)
When executed:
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1 one
2
[Wed Jun 19 09:05:57 2019]
rule one:
output: tmp.done, tmp
jobid: 1
Job counts:
count jobs
1 one
1
[Wed Jun 19 09:05:57 2019]
Error in rule one:
jobid: 0
output: tmp.done, tmp
RuleException:
AttributeError in line 10 of /home/dario/Tritume/Snakefile:
module 'os' has no attribute 'makedir'
File "/home/dario/Tritume/Snakefile", line 10, in __rule_one
File "/home/dario/miniconda3/envs/tritume/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/dario/Tritume/.snakemake/log/2019-06-19T090557.113876.snakemake.log
Answer 2 (score: 0)
Use an f-string to resolve local variables such as {abspath}:
for read in input.cut_reads:
    abspath = os.path.abspath(read)
    shell(f"ln -s {abspath} {output.cut_dir}")
Wrap the wildcards that Snakemake resolves automatically in double braces inside the f-string.
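The brace handling can be checked in plain Python. The sketch below only shows what the f-string itself produces; how Snakemake's `shell()` treats any remaining placeholders is not shown. The path values are hypothetical.

```python
# Hypothetical values standing in for what the run block would provide.
abspath = "/data/cut_reads/97_R1_cut.fastq.gz"
cut_dir = "read_counts/cut_reads"

# Single braces: Python substitutes the local variables immediately.
resolved = f"ln -s {abspath} {cut_dir}"

# Double braces: Python emits literal single braces, so the placeholder
# survives in the string for a later formatting step.
deferred = f"ln -s {abspath} {{output.cut_dir}}"

print(resolved)
print(deferred)
```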