I am trying to build a snakemake pipeline for biosynthetic gene cluster detection, but I am struggling with this error:
Missing input files for rule all:
antismash-output/Unmap_09/Unmap_09.txt
antismash-output/Unmap_12/Unmap_12.txt
antismash-output/Unmap_18/Unmap_18.txt
And so on for more files. As far as I can tell, the file generation in the snakefile should be working:
workdir: config["path_to_files"]

wildcard_constraints:
    separator = config["separator"],
    extension = config["file_extension"],
    sample = config["samples"]

rule all:
    input:
        expand("antismash-output/{sample}/{sample}.txt", sample = config["samples"])

# merging the paired end reads (either fasta or fastq) as prodigal only takes single end reads
rule pear:
    input:
        forward = "{sample}{separator}1.{extension}",
        reverse = "{sample}{separator}2.{extension}"
    output:
        "merged_reads/{sample}.{extension}"
    conda:
        "~/miniconda3/envs/antismash"
    shell:
        "pear -f {input.forward} -r {input.reverse} -o {output} -t 21"

# If single end then move them to merged_reads directory
rule move:
    input:
        "{sample}.{extension}"
    output:
        "merged_reads/{sample}.{extension}"
    shell:
        "cp {path}/{sample}.{extension} {path}/merged_reads/"

# Setting the rule order on the 2 above rules which should be treated equally and only one run.
ruleorder: pear > move

# annotating the metagenome with prodigal. Can be done inside antiSMASH but prefer to do it out
rule prodigal:
    input:
        "merged_reads/{sample}.{extension}"
    output:
        gbk_files = "annotated_reads/{sample}.gbk",
        protein_files = "protein_reads/{sample}.faa"
    conda:
        "~/miniconda3/envs/antismash"
    shell:
        "prodigal -i {input} -o {output.gbk_files} -a {output.protein_files} -p meta"

# running antiSMASH on the annotated metagenome
rule antiSMASH:
    input:
        "annotated_reads/{sample}.gbk"
    output:
        touch("antismash-output/{sample}/{sample}.txt")
    conda:
        "~/miniconda3/envs/antismash"
    shell:
        "antismash --knownclusterblast --subclusterblast --full-hmmer --smcog --outputfolder antismash-output/{wildcards.sample}/ {input}"
This is an example of what my config.yaml file looks like:
file_extension: fastq
path_to_files: /home/lamma/ABR/Each_reads
samples:
- Unmap_14
- Unmap_55
- Unmap_37
separator: _
I cannot see where I am going wrong in the snakefile to produce such an error. Apologies for the simple question.
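Answer 0 (score: 2)
The problem is that your global wildcard constraints are set up incorrectly. The sample constraint has to be a single regular expression, not a list of names: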
wildcard_constraints:
    separator = config["separator"],
    extension = config["file_extension"],
    sample = '|'.join(config["samples"]) # <-- this should fix the problem
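To see what the joined constraint looks like, here is a minimal sketch in plain Python. The sample names mirror the config.yaml in the question; the helper names `build_constraint` and `matches` are hypothetical, and `re.escape` is an added precaution that the answer itself does not use. Snakemake embeds the constraint in a larger pattern, so this only checks the constraint on its own.

```python
import re

# Sample names taken from the example config.yaml in the question.
samples = ["Unmap_14", "Unmap_55", "Unmap_37"]


def build_constraint(names):
    """Join sample names into one alternation regex (hypothetical helper)."""
    # re.escape is an extra safeguard in case a name contains regex metacharacters.
    return "|".join(re.escape(name) for name in names)


def matches(constraint, value):
    """Check whether a wildcard value satisfies the constraint as a whole."""
    return re.fullmatch(constraint, value) is not None


constraint = build_constraint(samples)
print(constraint)
print(matches(constraint, "Unmap_55"))
print(matches(constraint, "Unmap_99"))
```

A name from the list matches, while any other value is rejected, which is what restricts the `sample` wildcard to the configured samples.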
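Then you immediately run into another problem with the extension and separator wildcards. Snakemake can only infer what these should be from other file names; you cannot actually set them through wildcard constraints. We can use f-string syntax to fill in the values instead: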
rule pear:
    input:
        forward = f"{{sample}}{config['separator']}1.{{extension}}",
        reverse = f"{{sample}}{config['separator']}2.{{extension}}"
    ...
and:
rule prodigal:
    input:
        f"merged_reads/{{sample}}.{config['file_extension']}"
    ...
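If the wildcard constraints confuse you, look up snakemake regex. If you are confused about the f"" syntax, and about when to use a single { and when to use a double {{ to escape it, look up a blog post about f-strings.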
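As a quick illustration of the brace rules, the following sketch evaluates the same f-strings outside Snakemake. The `config` dictionary here is a stand-in with the values from the question's config.yaml; in a real Snakefile it is provided by Snakemake.

```python
# Stand-in for the config that Snakemake would load from config.yaml.
config = {"separator": "_", "file_extension": "fastq"}

# Single braces are evaluated by Python right away;
# double braces collapse to literal single braces, which Snakemake
# later treats as wildcards.
forward = f"{{sample}}{config['separator']}1.{{extension}}"
merged = f"merged_reads/{{sample}}.{config['file_extension']}"

print(forward)  # {sample}_1.{extension}
print(merged)   # merged_reads/{sample}.fastq
```

So after Python has processed the f-string, Snakemake only sees `{sample}` and `{extension}` as wildcards, with the separator and file extension already filled in.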
Hope that helps!
Answer 1 (score: 0)
(Since I can't comment...) You may have a problem with your relative paths, and we can't see where your files actually are.
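One way to debug this is to use config["path_to_files"] to build absolute paths in input:. That would give you better error messages about where Snakemake expects the files to be, since input/output files are resolved relative to the working directory.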
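A rough sketch of that debugging idea, run as plain Python outside Snakemake: build the absolute paths the pear rule would look for and list the ones that do not exist. The function names `expected_inputs` and `missing_files` are hypothetical, and the directory, separator, and extension are copied from the question's config.yaml as assumptions about the actual layout.

```python
import os


def expected_inputs(base_dir, samples, separator, extension):
    """Build absolute paths of the paired-end files for each sample."""
    paths = []
    for sample in samples:
        for mate in ("1", "2"):
            filename = f"{sample}{separator}{mate}.{extension}"
            paths.append(os.path.join(base_dir, filename))
    return paths


def missing_files(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


# Values assumed from the example config.yaml in the question.
paths = expected_inputs("/home/lamma/ABR/Each_reads", ["Unmap_14"], "_", "fastq")
for path in missing_files(paths):
    print("missing:", path)
```

If every expected file is reported missing, the working directory or the naming pattern (separator, extension) is probably not what the rules assume.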