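I'm having a problem where Snakemake is not running the intermediate rules that come after my checkpoint. After some troubleshooting, I believe the problem is in the `expand` call inside the `aggregate_input` function, but I can't figure out how it is behaving.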
```python
rule all:
    input:
        expand("string_tie_assembly/{sample}.gtf", sample=sample),
        expand("combined_fasta/{sample}.fa", sample=sample),
        "aggregated_fasta/all_fastas_combined.fa"

checkpoint clustering:
    input:
        "string_tie_assembly_merged/merged_{sample}.gtf"
    output:
        clusters = directory("split_gtf_file/{sample}")
    shell:
        """
        mkdir -p split_gtf_file/{wildcards.sample} ;
        collapse_gtf_file.py -gtf {input} -o split_gtf_file/{wildcards.sample}/{wildcards.sample}
        """

# `reference` must be defined elsewhere in the Snakefile (not shown in the question)
rule gtf_to_fasta:
    input:
        "split_gtf_file/{sample}/{sample}_{i}.gtf"
    output:
        "lncRNA_fasta/{sample}/canidate_{sample}_{i}.fa"
    shell:
        "gffread -w {output} -g {reference} {input}"

# Inside shell commands wildcards are accessed as {wildcards.i}; a bare {i} is undefined
rule rename_fasta_files:
    input:
        "lncRNA_fasta/{sample}/canidate_{sample}_{i}.fa"
    output:
        "lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa"
    shell:
        "seqtk rename {input} {wildcards.sample}_{wildcards.i} > {output}"

# Gather the N output files from the GTF split
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.clustering.get(**wildcards).output[0]
    x = expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
               sample=sample,
               i=glob_wildcards(os.path.join(checkpoint_output, "{i}.fa")).i)
    print(x)
    return x

# Combine the FASTA files from the split GTF files
rule combine_fasta_file:
    input:
        aggregate_input
    output:
        "combined_fasta/{sample}.fa"
    shell:
        "cat {input} > {output}"

# Aggregate the per-sample combined FASTA files
def gather_files(wildcards):
    files = expand("combined_fasta/{sample}.fa", sample=sample)
    return files

rule aggregate_fasta_files:
    input:
        gather_files
    output:
        "aggregated_fasta/all_fastas_combined.fa"
    shell:
        "cat {input} > {output}"
```
我一直遇到的问题是,在运行snakemake文件时,combine_fasta_file
规则不会运行。在花了更多的时间解决此错误之后,我意识到问题是aggregate_input
函数没有扩展,并返回了一个空列表[]
而不是我期望的空列表{目录已展开,即:lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa
。
这很奇怪,尤其是考虑到checkpoint clustering
确实运行正确并且下游输出文件位于rule all
有人知道为什么会这样吗?或有可能是这种情况。
Command used to run Snakemake: `snakemake -rs Assemble_regions.snake --configfile snake_config_files/annotated_group_config.yaml`
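To see how an empty list can come back, recall that `expand` formats the pattern once for every combination of the values it receives. The sketch below imitates that default product behavior in plain Python. `expand_like` is a hypothetical helper, not Snakemake's implementation, and the sample names `A`/`B` and cluster indices are made up for illustration. It shows that an empty list for `i` (what an unmatched `glob_wildcards` call yields) produces `[]` however many samples there are, and that passing the global `sample` list instead of `wildcards.sample` makes a single job request files for every sample.

```python
from itertools import product


def expand_like(pattern, **wildcards):
    """Hypothetical stand-in for Snakemake's expand() with the default
    product combinator: format `pattern` for every combination of values."""
    # A plain string counts as one value, not as a list of characters.
    values = {
        key: [val] if isinstance(val, str) else list(val)
        for key, val in wildcards.items()
    }
    keys = list(values)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(values[k] for k in keys))
    ]


pattern = "lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa"

# glob_wildcards() matched nothing, so `i` is empty and so is the product.
print(expand_like(pattern, sample=["A", "B"], i=[]))
# -> []

# Global sample list: one job would ask for four paths, two per sample.
print(expand_like(pattern, sample=["A", "B"], i=["1", "2"]))

# Only the job's own sample (what wildcards.sample provides).
print(expand_like(pattern, sample="A", i=["1", "2"]))
# -> ['lncRNA_fasta_renamed/A/A_1.fa', 'lncRNA_fasta_renamed/A/A_2.fa']
```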
Answer:
Just figured it out. The problem was that my `aggregate_input` function was targeting the wrong files. Previously I had written it as:
```python
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.clustering.get(**wildcards).output[0]
    x = expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
               sample=sample,
               i=glob_wildcards(os.path.join(checkpoint_output, "{i}.fa")).i)
    print(x)
    return x
```
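The issue is that this globbed the wrong files. The checkpoint's output directory, `split_gtf_file/{sample}`, contains the `{sample}_{i}.gtf` files that `gtf_to_fasta` consumes, not `{i}.fa` files, so `glob_wildcards` matched nothing and `i` came back empty. Instead of `{i}.fa`, the pattern should glob the files that `checkpoint clustering` actually produces. Changing the code to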
```python
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.clustering.get(**wildcards).output[0]
    print(checkpoint_output)
    x = expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
               sample=wildcards.sample,
               i=glob_wildcards(os.path.join(checkpoint_output, "{sample}_{i}.gtf")).i)
    print(x)
    return x
```
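solved the problem. Note that the new version also passes `sample=wildcards.sample` instead of the global `sample` list, so each `combine_fasta_file` job only requests the renamed FASTA files for its own sample.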
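The glob mismatch can be reproduced without running Snakemake. In this rough sketch, `pattern_to_regex` and `glob_wildcards_like` are hypothetical helpers that approximate how `glob_wildcards` turns each `{wildcard}` into a named regex group and collects the captured values. Unlike the real function, they check an in-memory list of paths instead of walking the file system and return a plain dict rather than an object with `.i` attributes. The file names in the simulated checkpoint directory are assumed from the `gtf_to_fasta` input pattern, not taken from a real run.

```python
import os
import re


def pattern_to_regex(pattern):
    """Hypothetical: turn "dir/{sample}_{i}.gtf" into a regex with one named
    group per wildcard (simplified, no wildcard constraints)."""
    seen, parts, last = set(), [], 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        name = m.group(1)
        parts.append(re.escape(pattern[last:m.start()]))
        if name in seen:
            parts.append(f"(?P={name})")  # a repeated wildcard must match the same text
        else:
            parts.append(f"(?P<{name}>.+)")
            seen.add(name)
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    return re.compile("".join(parts) + "$")


def glob_wildcards_like(pattern, paths):
    """Hypothetical in-memory stand-in for glob_wildcards(): returns
    {wildcard: [captured values]} for every path matching `pattern`."""
    regex = pattern_to_regex(pattern)
    names = list(regex.groupindex)
    found = {name: [] for name in names}
    for path in paths:
        m = regex.match(path)
        if m:
            for name in names:
                found[name].append(m.group(name))
    return found


# Assumed contents of the checkpoint output directory for sample "A".
checkpoint_output = "split_gtf_file/A"
checkpoint_files = [
    "split_gtf_file/A/A_1.gtf",
    "split_gtf_file/A/A_2.gtf",
    "split_gtf_file/A/A_3.gtf",
]

old = glob_wildcards_like(os.path.join(checkpoint_output, "{i}.fa"), checkpoint_files)
new = glob_wildcards_like(os.path.join(checkpoint_output, "{sample}_{i}.gtf"), checkpoint_files)
print(old["i"])  # -> []  (no .fa files in the directory)
print(new["i"])  # -> ['1', '2', '3']

sample = "A"  # plays the role of wildcards.sample
fixed_inputs = [f"lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa" for i in new["i"]]
print(fixed_inputs)
```

Because the old pattern ends in `.fa` while the directory only holds `.gtf` files, nothing matches, `i` is empty, and `expand` returns `[]`, which is exactly the symptom described in the question.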