I want to run GATK recalibration on paired samples (tumor and normal). I need to parse the data with pandas. This is what I came up with:
expand("mapped_reads/merged_samples/{sample[1][tumor]}/{sample[1][tumor]}_{sample[1][normal]}.bam", sample=read_table(config["conditions"], ",").iterrows())
This is the conditions file:
432,433
434,435
I wrote this rule:
rule gatk_RealignerTargetCreator:
    input:
        "mapped_reads/merged_samples/{tumor}.sorted.dup.reca.bam",
        "mapped_reads/merged_samples/{normal}.sorted.dup.reca.bam",
    output:
        "mapped_reads/merged_samples/{tumor}/{tumor}_{normal}.realign.intervals"
    params:
        genome=config['reference']['genome_fasta'],
        mills= config['mills'],
        ph1_indels= config['know_phy'],
    log:
        "mapped_reads/merged_samples/logs/{tumor}_{normal}.realign_info.log"
    threads: 8
    shell:
        "gatk -T RealignerTargetCreator -R {params.genome} {params.custom} "
        "-nt {threads} "
        "-I {wildcard.tumor} -I {wildcard.normal} -known {params.ph1_indels} "
        "-o {output} >& {log}"
I get this error:
InputFunctionException in line 17 of /home/maurizio/Desktop/TEST_exome/rules/samfiles.rules:
KeyError: '432/432_433'
Wildcards:
sample=432/432_433
This is samfiles.rules:
rule samtools_merge_bam:
    """
    Merge bam files for multiple units into one for the given sample.
    If the sample has only one unit, files will be copied.
    """
    input:
        lambda wildcards: expand("mapped_reads/bam/{unit}_sorted.bam",unit=config["samples"][wildcards.sample])
    output:
        "mapped_reads/merged_samples/{sample}.bam"
    benchmark:
        "benchmarks/samtools/merge/{sample}.txt"
    run:
        if len(input) > 1:
            shell("/illumina/software/PROG2/samtools-1.3.1/samtools merge {output} {input}")
        else:
            shell("cp {input} {output} && touch -h {output}")
Answer 0 (score: 1)
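I can only guess, since you don't show all the relevant rules, but I would say the error occurs because the rule samtools_merge_bam is also applied to some later BAM file whose path follows the pattern {tumor}/{tumor}_{normal}. The {sample} wildcard then matches 432/432_433, which is not a key in config["samples"], hence the KeyError.
As a solution, you have to resolve this ambiguity (see the Snakemake tutorial). For example, you can constrain the sample wildcard of samtools_merge_bam so that it cannot contain a slash:
    wildcard_constraints:
        sample="[^/]+"
You can put the constraint either globally or inside the rule.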
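To illustrate why the constraint helps, here is a rough Python sketch that mimics how the output pattern of samtools_merge_bam is matched against a requested file. The helper match_sample is hypothetical and not part of Snakemake; it only assumes that an unconstrained wildcard behaves like the regular expression `.+`, which can span slashes.

```python
import re


def match_sample(path, constraint=".+"):
    """Return the value captured for {sample}, or None if the path does not match.

    Hypothetical helper: it imitates matching the output pattern
    "mapped_reads/merged_samples/{sample}.bam" with a given wildcard regex.
    """
    pattern = r"mapped_reads/merged_samples/(?P<sample>" + constraint + r")\.bam"
    m = re.fullmatch(pattern, path)
    return m.group("sample") if m else None


requested = "mapped_reads/merged_samples/432/432_433.bam"

# Unconstrained: the wildcard swallows the subdirectory, as in the error message.
unconstrained = match_sample(requested)

# Constrained to "no slashes": the rule no longer applies to the nested path.
constrained = match_sample(requested, "[^/]+")

# A plain per-sample file is still matched under the constraint.
plain = match_sample("mapped_reads/merged_samples/432.bam", "[^/]+")

print(unconstrained, constrained, plain)
```

With the constraint in place, the nested tumor/normal path is no longer claimed by samtools_merge_bam, so some other rule has to produce it.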