How to propagate snakemake wildcards through a lambda function

Time: 2018-11-15 22:37:16

Tags: python-3.x snakemake

I am trying to use snakemake to merge vcf files together, but I get this error message:

Building DAG of jobs...
MissingInputException in line 21 of 

Missing input files for rule all:
outputs/****.g.vcf.gz

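The goal is to call variants on individual chromosomes and then merge the results. It looks like the sample wildcard is not being propagated through my lambda function. I have tried several variations but can't crack it. I'm fairly sure the rest of the code is fine, because when I remove the merge step and just call variants on all chromosomes, the workflow runs correctly.

Any help would be much appreciated.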

import glob

configfile: "config.json"

chroms = [1, 2, 3, 4, 5]
str_chroms = ["chr{}".format(chr) for chr in chroms]


def get_fq1(wildcards):
    # code that returns a list of fastq files for read 1 based on
    # *wildcards.sample* e.g.
    return sorted(glob.glob(wildcards.sample + '*_R1_001.fastq.gz'))


def get_fq2(wildcards):
    # code that returns a list of fastq files for read 2 based
    # on *wildcards.sample*, e.g.
    return sorted(glob.glob(wildcards.sample + '*_R2_001.fastq.gz'))


rule all:
    input:
        "outputs/" + config['sample'] + "_picard_alignment_metrics_output.txt",
        "outputs/" + config['sample'] + "_fastqc",
        "outputs/" + config['sample'] + "_analyze_covariates.pdf",
        "outputs/" + config['sample'] + ".g.vcf.gz",
        "outputs/" + config['sample'] + ".coverage"

rule bwa_map:
    input:
        config['reference_file'],
        get_fq1,
        get_fq2
    output:
        "outputs/{sample}_sorted.bam"
    shell:
        "bwa mem -t 16 {input} |  samtools view -bS - | \
        samtools sort -@ 16 -m 7G - -o {output}"

#a bunch of intermediate steps that are not the issue

rule variant_calling:
    input:
        bam = "outputs/{sample}_recal_reads.bam",
        bai = "outputs/{sample}_recal_reads.bam.bai",
        reference_file = config['reference_file']
    output:
       "outputs/{sample}_{chr}.g.vcf.gz"
    shell:
        """gatk --java-options "-Xmx128g" HaplotypeCaller \
       -R {input.reference_file} -I {input.bam} -L {wildcards.chr}\
       -O {output} -ERC GVCF"""

rule merge_vcfs:
    input:
        lambda wildcards: expand("outputs/{sample}_{chr}.g.vcf.gz",
                                 chr=str_chroms,
                                 sample=wildcards.sample)
    output:
        "output/{sample}.g.vcf.gz"
    shell:
        "vcf-merge {input} | bgzip -c > {output}"

2 Answers:

Answer 0 (score: 3)

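There is a typo in the output of rule merge_vcfs: it should be outputs/{sample}.g.vcf.gz (that is, outputs rather than output), so that it matches the target requested in rule all.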
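To see why that one character triggers MissingInputException, it may help to model how a requested file is matched against rule outputs. The sketch below is a simplified stand-in written in plain Python, not Snakemake's actual implementation; the helpers pattern_to_regex and find_producers and the sample name S1 are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Convert an output pattern like 'outputs/{sample}.g.vcf.gz' to a regex.

    Simplified: every {name} becomes a named group matching one or more
    characters. Assumes each wildcard name appears only once per pattern.
    """
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)          # literal text
        else:
            regex += "(?P<%s>.+)" % part      # wildcard
    return re.compile(regex)


def find_producers(target, rules):
    """Return (rule name, wildcard values) for every rule whose output matches."""
    producers = []
    for name, output_pattern in rules.items():
        match = pattern_to_regex(output_pattern).fullmatch(target)
        if match:
            producers.append((name, match.groupdict()))
    return producers


# Hypothetical sample name; rule all asks for "outputs/" + sample + ".g.vcf.gz".
target = "outputs/S1.g.vcf.gz"

rules_with_typo = {
    "variant_calling": "outputs/{sample}_{chr}.g.vcf.gz",
    "merge_vcfs": "output/{sample}.g.vcf.gz",    # typo from the question
}
rules_fixed = dict(rules_with_typo, merge_vcfs="outputs/{sample}.g.vcf.gz")

print(find_producers(target, rules_with_typo))
print(find_producers(target, rules_fixed))
```

With the typo, no rule claims outputs/S1.g.vcf.gz, so the target in rule all is treated as a plain input file that does not exist. In this model merge_vcfs is never selected, which suggests the lambda is not the culprit: it is simply never reached.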

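Answer 1 (score: 2)

Yes, as @JeeYem mentioned, you have a typo in the output file of rule merge_vcfs.

I also don't see why a lambda is needed in rule merge_vcfs. Aren't you passing the same set of chromosomes regardless of the sample? str_chroms does not depend on the sample in your setup, so you could rewrite the rule without the lambda, escaping {sample} with double braces so that expand leaves it as a wildcard: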

rule merge_vcfs:
    input: expand("outputs/{{sample}}_{chr}.g.vcf.gz",chr=str_chroms)
    output: "outputs/{sample}.g.vcf.gz"
    shell: "vcf-merge {input} | bgzip -c > {output}"
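
To check that the two forms request the same files, here is a minimal sketch in plain Python. expand_sketch is a hypothetical stand-in that imitates expand with str.format (it is not Snakemake's own function), and S1 is an assumed sample name.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill a pattern with every combination of the given wildcard values.

    Hypothetical helper for illustration only. Doubled braces such as
    {{sample}} are str.format escapes, so they survive as {sample}.
    """
    keys = list(wildcards)
    value_lists = [
        v if isinstance(v, (list, tuple)) else [v] for v in wildcards.values()
    ]
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*value_lists)
    ]


chroms = [1, 2, 3, 4, 5]
str_chroms = ["chr{}".format(c) for c in chroms]

# Lambda-free form: {sample} stays a wildcard, chr is filled in.
templates = expand_sketch("outputs/{{sample}}_{chr}.g.vcf.gz", chr=str_chroms)

# Lambda form from the question, evaluated for the assumed sample "S1".
via_lambda = expand_sketch(
    "outputs/{sample}_{chr}.g.vcf.gz", chr=str_chroms, sample="S1"
)

# Substituting the sample afterwards gives the same concrete paths.
resolved = [t.format(sample="S1") for t in templates]

print(templates[0])
print(resolved == via_lambda)
```

Both forms end up naming the same five per-chromosome files; the lambda-free version is just shorter, since the sample wildcard is substituted later when the rule is matched against a target.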