Snakemake: using 2 outputs with different names as input for a rule
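I am building a Snakemake pipeline that uses Strelka to compare tumor and normal samples. In this case, I want to compare the first element of GERMLINE with the first element of TUMOR:
GERMLINE = ("PT1", "S6", "S1")
TUMOR = ("T5", "T7", "T20")
The pipeline works for the initial rules: folders, strelkaconfig and strelkarun. The problem is the post-processing of the Strelka output, because I want to do the same thing to both outputs:
somatic.snvs.vcf
somatic.indels.vcf
However, I do not know how to make Snakemake understand that it should do the same thing to both without duplicating the rule. I tried the following, but a dry run fails with an error: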
GERMLINE = ("PT1", "S6", "S1")
TUMOR = ("T5", "T7", "T20")
ANALYSIS = "OUTPUT_PATH"
TYPEVAR = ["snvs","indels"]
INDGATK = "ref"
rule all:
    input:
        [ANALYSIS + "/{}_vs_{}/Stelka/results/variants/somatic.snvs.vcf".format(sample_g, sample_t)
            for (sample_g, sample_t) in zip(GERMLINE, TUMOR)],
        [ANALYSIS + "/{}_vs_{}/Stelka/runWorkflow.py".format(sample_g, sample_t)
            for (sample_g, sample_t) in zip(GERMLINE, TUMOR)],
        [ANALYSIS + "/{}_vs_{}/Stelka/results/variants/somatic.{}_Filtered".format(sample_g, sample_t, typevar)
            for (sample_g, sample_t, typevar)
            in zip(GERMLINE*len(TUMOR), TUMOR*len(TUMOR), sorted(TYPEVAR*len(TUMOR)))]

# Make folders
rule folders:
    input:
        g = "{samples_g}.bam",
        t = "{samples_t}.bam"
    output:
        gen = "/{samples_g}_vs_{samples_t}",
        strelka = "/{samples_g}_vs_{samples_t}/Stelka/"
    run:
        '''mkdir {output.gen}
        mkdir {output.strelka}'''

# Strelka configuration
rule strelkaconfig:
    input:
        g = "{samples_g}.bam",
        t = "{samples_t}.bam",
        out_dir = ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/"
    output:
        wfs = ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/runWorkflow.py"
    params:
        ref = INDGATK
    shell:
        "python configureStrelkaSomaticWorkflow.py --normalBam {input.g} --tumorBam {input.t} --referenceFasta {params.ref} --runDir {input.out_dir} "

# Strelka run
rule strelkarun:
    input:
        wfs = ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/runWorkflow.py"
    output:
        outsnvs = ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/results/variants/somatic.snvs.vcf",
        outindels = ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/results/variants/somatic.indels.vcf"
    shell:
        "python {input.wfs}"

# POSTPROCESSING
rule vcfp:
    input: ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/results/variants/somatic.{typevar}.vcf"
    output: ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/results/variants/somatic.{typevar}_Filtered.vcf"
    shell:
        "java -jar StrelkaVCFParser -v {input} "
Answer 0 (score: 0)
The pipeline seems to infer the wildcards differently from what you expect.
You can try using wildcard constraints, as follows:
wildcard_constraints:
    samples_g = "|".join(GERMLINE),
    samples_t = "|".join(TUMOR)
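To see what such a constraint changes, here is a minimal sketch in plain Python. It only imitates the matching with the `re` module and is not Snakemake's own code; it relies on the documented default that an unconstrained wildcard behaves like the regex `.+`. The string `PT1_vs_T5_vs_T7` is a made-up name used purely for illustration.

```python
import re

GERMLINE = ("PT1", "S6", "S1")
TUMOR = ("T5", "T7", "T20")

# Unconstrained: each wildcard behaves like the regex ".+".
loose = re.compile(r"(?P<samples_g>.+)_vs_(?P<samples_t>.+)")

# Constrained: each wildcard may only be one of the known sample names,
# built the same way as in the wildcard_constraints block above.
strict = re.compile(
    "(?P<samples_g>{})_vs_(?P<samples_t>{})".format("|".join(GERMLINE), "|".join(TUMOR))
)

ambiguous = "PT1_vs_T5_vs_T7"  # hypothetical name, for illustration only

loose_match = loose.fullmatch(ambiguous)     # matches, with an unintended split
strict_bad = strict.fullmatch(ambiguous)     # rejected by the constraint
strict_ok = strict.fullmatch("PT1_vs_T5")    # a real germline/tumor pair

print(loose_match.group("samples_g"), loose_match.group("samples_t"))
print(strict_bad)
print(strict_ok.group("samples_g"), strict_ok.group("samples_t"))
```

With the loose pattern the greedy `.+` swallows `PT1_vs_T5` as the germline sample, while the constrained pattern only accepts names that are actually in GERMLINE and TUMOR.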
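This may not solve your problem, but the third input of the all rule does not look very clean to me. You can achieve the same result with two nested expand calls, the inner one using zip, as follows: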
expand(expand(
        ANALYSIS + "/{sample_g}_vs_{sample_t}/Stelka/results/variants/somatic.{{typevar}}_Filtered",
        zip, sample_g=GERMLINE, sample_t=TUMOR),
    typevar=TYPEVAR)
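Note the double curly braces around typevar, which keep it from being expanded by the first (inner) expand.
You can test this in a python3 interpreter after running from snakemake.io import expand.
Personally, I find this easier to understand.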
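If Snakemake is not at hand, the two-step expansion can be approximated in plain Python. This is only a sketch that imitates the nested expand with `str.format`, `zip` and `itertools.product`. It does not call `snakemake.io.expand`, and the order of the resulting list is not claimed to match Snakemake's, so the comparison with the list comprehension from the question is done on sets.

```python
from itertools import product

GERMLINE = ("PT1", "S6", "S1")
TUMOR = ("T5", "T7", "T20")
ANALYSIS = "OUTPUT_PATH"
TYPEVAR = ["snvs", "indels"]

TEMPLATE = ANALYSIS + "/{sample_g}_vs_{sample_t}/Stelka/results/variants/somatic.{{typevar}}_Filtered"

# Inner step: pair germline and tumor samples position by position.
# The doubled braces come out of str.format as a literal "{typevar}".
paired = [TEMPLATE.format(sample_g=g, sample_t=t) for g, t in zip(GERMLINE, TUMOR)]

# Outer step: combine every paired path with every variant type.
targets = [path.format(typevar=v) for path, v in product(paired, TYPEVAR)]

# The third input of the all rule, as written in the question.
original = [
    ANALYSIS + "/{}_vs_{}/Stelka/results/variants/somatic.{}_Filtered".format(g, t, v)
    for g, t, v in zip(GERMLINE * len(TUMOR), TUMOR * len(TUMOR), sorted(TYPEVAR * len(TUMOR)))
]

print(paired[0])
print(len(targets), set(targets) == set(original))
```

Both approaches produce the same six target paths for these sample lists; the two-step version makes the pairing and the variant types explicit instead of relying on list repetition and sorting.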