Snakemake: merging BAM files after bwa alignment of two or more lanes

Date: 2017-08-01 16:20:42

Tags: snakemake

I am trying to use snakemake to map and merge data obtained from several lanes, and I have run into a problem. What I want to do is:


* fastq.gz > 432_L001.sam, 432_L002.sam > 432_L001.sorted.bam, 432_L002.sorted.bam > 432.bam

So, starting from the fastq files listed under units, I want to create a single BAM file per sample, named after the sample key.

config.yaml

samples:
    "432": ["432_L001", "432_L002"]
    "433": ["433_LOO1","433_L002"]


units:

  "432_L001": [ "RAW/432_CGATGT_L001_R1_001.fastq.gz", "RAW/432_CGATGT_L001_R2_001.fastq.gz"]
  "432_L002": ["RAW/432_CGATGT_L002_R1_001.fastq.gz","RAW/432_CGATGT_L002_R2_001.fastq.gz"]
  "433_L001": ["RAW/433_CAGATC_L001_R1_001.fastq.gz","RAW/433_CAGATC_L001_R2_001.fastq.gz"]
  "433_L002": ["RAW/433_CAGATC_L002_R1_001.fastq.gz","RAW/433_CAGATC_L002_R2_001.fastq.gz"]

snakemake

rule all:
    input: expand("mapped_reads/merged_samples/{A}.bam", A=config["samples"]),
           expand("mapped_reads/bam/{unit}_sorted.bam",unit=config['units'])


include_prefix="rules"


include:
    include_prefix + "/bwa_mem.rules"
include:
    include_prefix + "/samfiles.rules"
include:
    include_prefix + "/picard.rules"

rules

    from snakemake.exceptions import MissingInputException

    rule bwa_mem:
        input:
            lambda wildcards: config["units"][wildcards.unit]
        output:
            temp("mapped_reads/sam/{unit}.sam")
        params:
            #sample=lambda wildcards, UNIT_TO_SAMPLE[wildcards.unit]
            #sample=lambda wildcards: units[wildcards.unit],
            genome= config["reference"]['genome_fasta']
        log:
            "mapped_reads/log/{unit}_bwa_mem.log"
        benchmark:
            "benchmarks/bwa/mem/{unit}.txt"
        threads: 8
        shell:
            '/illumina/software/PROG2/bwa-0.7.15/bwa mem '\
                    '-t {threads} {params.genome} {input} 2> {log} > {output}'
rule picard_SortSam:
   input:
       "mapped_reads/sam/{unit}.sam"
   output:
       temp("mapped_reads/bam/{unit}_sorted.bam")
   benchmark:
       "benchmarks/picard/SortSam/{unit}.txt"
   shell:
       "picard  SortSam I={input} O={output} SO=coordinate"

rule samtools_merge_bam:
    """
    Merge bam files for multiple units into one for the given sample.
    If the sample has only one unit, files will be copied.
    """
    input:
        lambda wildcards: expand("mapped_reads/bam/{unit}_sorted.bam",unit=config["samples"][wildcards.sample])
    output:
        "mapped_reads/merged_samples/{sample}.bam"
    benchmark:
        "benchmarks/samtools/merge/{sample}.txt"
    run:
        if len(input) > 1:
            shell("samtools merge {output} {input}")
        else:
            shell("cp {input} {output} && touch -h {output}")

When I use this code, I always get this error:

InputFunctionException in line 50 of /home/maurizio/Desktop/TEST_exome/rules/bwa_mem.rules:
KeyError: '433_LOO1'
Wildcards:
unit=433_LOO1

How can I solve this?

What is wrong with this wildcard?

  

lambda wildcards: expand("mapped_reads/bam/{unit}_sorted.bam",unit=config["samples"][wildcards.sample])

1 answer:

Answer 0 (score: 1)

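You seem to have confused "0" (zero) and "O" (capital letter O) in your config. The samples section lists "433_LOO1", but the key defined under units is "433_L001", so the lookup config["units"][wildcards.unit] raises the KeyError: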

samples:
    "432": ["432_L001", "432_L002"]
    "433": ["433_LOO1","433_L002"]
------------------^^

"433_L001": ["RAW/433_CAGATC_L001_R1_001.fastq.gz","RAW...
------^^
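
A small check like the following could catch this kind of mismatch before any rule runs. It is only a sketch: `find_undefined_units` is a hypothetical helper, not part of Snakemake, and the dictionary simply mirrors the config.yaml from the question.

```python
# Hypothetical helper (not a Snakemake function): report unit names that are
# listed under "samples" but have no matching key under "units".
def find_undefined_units(config):
    defined = set(config["units"])
    missing = {}
    for sample, units in config["samples"].items():
        unknown = [u for u in units if u not in defined]
        if unknown:
            missing[sample] = unknown
    return missing


# The config from the question, written out as a plain Python dict.
config = {
    "samples": {
        "432": ["432_L001", "432_L002"],
        "433": ["433_LOO1", "433_L002"],  # letter O instead of digit 0
    },
    "units": {
        "432_L001": ["RAW/432_CGATGT_L001_R1_001.fastq.gz",
                     "RAW/432_CGATGT_L001_R2_001.fastq.gz"],
        "432_L002": ["RAW/432_CGATGT_L002_R1_001.fastq.gz",
                     "RAW/432_CGATGT_L002_R2_001.fastq.gz"],
        "433_L001": ["RAW/433_CAGATC_L001_R1_001.fastq.gz",
                     "RAW/433_CAGATC_L001_R2_001.fastq.gz"],
        "433_L002": ["RAW/433_CAGATC_L002_R1_001.fastq.gz",
                     "RAW/433_CAGATC_L002_R2_001.fastq.gz"],
    },
}

print(find_undefined_units(config))  # {'433': ['433_LOO1']}
```

In a Snakefile, the same function could presumably be called on the loaded `config` before the rules are defined, raising an error when the result is not empty. Once "433_LOO1" is corrected to "433_L001" in the samples section, the check returns an empty dict.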