Snakemake: understanding how the YAML config is interpreted in the alignment command

Date: 2017-08-09 10:37:58

Tags: snakemake

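I have this rule in my Snakemake file. When I launch it, the input files are filled with all the entries from my YAML file, but I want each bwa job to get only a single unit key. Here are the rule, the YAML file (incomplete) and the dry-run result.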

rule bwa_mem:
    input:
        dt=expand("trim/{sample}/",sample=config['units']),
        forward_paired=expand("trim/{sample}/{sample}_forward_paired.fq.gz",sample=config['units']),
        reverse_paired=expand("trim/{sample}/{sample}_reverse_paired.fq.gz",sample=config['units']),
        forward_unpaired=expand("trim/{sample}/{sample}_forward_unpaired.fq.gz",sample=config['units']),
        reverse_unpaired=expand("trim/{sample}/{sample}_reverse_unpaired.fq.gz",sample=config['units']),

    output:
        temp("mapped_reads/sam/{unit}.sam")
    params:
        genome= config["reference"]['genome_fasta']
    log:
        "mapped_reads/log/{unit}_bwa_mem.log"
    benchmark:
        "benchmarks/bwa/mem/{unit}.txt"
    threads: 8
    shell:
        '/illumina/software/PROG2/bwa-0.7.15/bwa mem '\
                '-t {threads} {params.genome}  {input.forward_paired} {input.reverse_paired} {input.forward_unpaired} {input.reverse_unpaired} 2> {log} > {output}'
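To see why every unit ends up in the input of a single job, the effect of `expand` in the input section can be mimicked in plain Python. `expand_like` below is a hypothetical, simplified stand-in, not Snakemake's implementation: it formats the pattern once per value. Passing a dict such as `config['units']` iterates over its keys, i.e. the unit names.

```python
from itertools import product


def expand_like(pattern, **wildcards):
    """Simplified stand-in for Snakemake's expand(): format `pattern` once
    for every combination of the given wildcard values. A single string
    counts as one value; any other iterable (e.g. a dict) is iterated."""
    names = list(wildcards)
    value_lists = [
        [v] if isinstance(v, str) else list(v) for v in wildcards.values()
    ]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]


# Unit names standing in for config['units'] (FASTQ paths omitted).
units = dict.fromkeys(["432_L001", "432_L002", "433_L001"])

forward_paired = expand_like(
    "trim/{sample}/{sample}_forward_paired.fq.gz", sample=units
)
print(forward_paired)
```

The resulting list (and the analogous ones for the other inputs) does not depend on `{unit}` at all, so every `bwa_mem` job, whatever unit its output names, receives the same full list.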

This is the YAML config file:

'samples':
  '432':
  - '432_L001'
  - '432_L002'
  '433':
  - '433_L002'
  - '433_L001'
  '434':
  - '434_L001'
  - '434_L002'
  '435':
  - '435_L002'
  - '435_L001'
....
'units':
  '432_L001':
  - '/illumina/runs/FASTQ/RAW/432_CGATGT_L001_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/432_CGATGT_L001_R2_001.fastq.gz'
  '432_L002':
  - '/illumina/runs/FASTQ/RAW/432_CGATGT_L002_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/432_CGATGT_L002_R2_001.fastq.gz'
  '433_L001':
  - '/illumina/runs/FASTQ/RAW/433_CAGATC_L001_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/433_CAGATC_L001_R2_001.fastq.gz'
  '433_L002':
  - '/illumina/runs/FASTQ/RAW/433_CAGATC_L002_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/433_CAGATC_L002_R2_001.fastq.gz'
  '434_L001':
  - '/illumina/runs/FASTQ/RAW/434_GTGAAA_L001_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/434_GTGAAA_L001_R2_001.fastq.gz'
  '434_L002':
  - '/illumina/runs/FASTQ/RAW/434_GTGAAA_L002_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/434_GTGAAA_L002_R2_001.fastq.gz'
  '435_L001':
  - '/illumina/runs/FASTQ/RAW/435_ACAGTG_L001_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/435_ACAGTG_L001_R2_001.fastq.gz'
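Once Snakemake loads this file, `config` is an ordinary nested Python dict. The sketch below reproduces a shortened copy of it (samples 432 and 433 only) to show what `sample=config['units']` actually hands to `expand`: the unit names (the keys), not the FASTQ paths.

```python
# Shortened copy of the YAML above, as the nested dict it loads into.
config = {
    "samples": {
        "432": ["432_L001", "432_L002"],
        "433": ["433_L002", "433_L001"],
    },
    "units": {
        "432_L001": [
            "/illumina/runs/FASTQ/RAW/432_CGATGT_L001_R1_001.fastq.gz",
            "/illumina/runs/FASTQ/RAW/432_CGATGT_L001_R2_001.fastq.gz",
        ],
        "432_L002": [
            "/illumina/runs/FASTQ/RAW/432_CGATGT_L002_R1_001.fastq.gz",
            "/illumina/runs/FASTQ/RAW/432_CGATGT_L002_R2_001.fastq.gz",
        ],
        "433_L001": [
            "/illumina/runs/FASTQ/RAW/433_CAGATC_L001_R1_001.fastq.gz",
            "/illumina/runs/FASTQ/RAW/433_CAGATC_L001_R2_001.fastq.gz",
        ],
        "433_L002": [
            "/illumina/runs/FASTQ/RAW/433_CAGATC_L002_R1_001.fastq.gz",
            "/illumina/runs/FASTQ/RAW/433_CAGATC_L002_R2_001.fastq.gz",
        ],
    },
}

# Iterating over a mapping yields its keys: these are the unit names.
unit_names = list(config["units"])

# Each unit maps to its [R1, R2] FASTQ pair.
r1, r2 = config["units"]["433_L001"]

# The 'samples' section groups unit names by sample.
units_of_432 = config["samples"]["432"]
print(unit_names, units_of_432)
```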

When I try a dry run, the bwa rule gives this result:

rule bwa_mem:
    input: trim/432_L001/432_L001_reverse_unpaired.fq.gz, trim/432_L002/432_L002_reverse_unpaired.fq.gz, trim/433_L001/433_L001_reverse_unpaired.fq.gz, trim/433_L002/433_L002_reverse_unpaired.fq.gz, trim/434_L001/434_L001_reverse_unpaired.fq.gz, trim/434_L002/434_L002_reverse_unpaired.fq.gz, trim/435_L001/435_L001_reverse_unpaired.fq.gz, trim/435_L002/435_L002_reverse_unpaired.fq.gz, trim/436_L001/436_L001_reverse_unpaired.fq.gz, trim/436_L002/436_L002_reverse_unpaired.fq.gz, trim/437_L001/437_L001_reverse_unpaired.fq.gz, trim/437_L002/437_L002_reverse_unpaired.fq.gz, trim/438_L003/438_L003_reverse_unpaired.fq.gz, trim/438_L004/438_L004_reverse_unpaired.fq.gz, trim/lane1_L001/lane1_L001_reverse_paired.fq.gz, trim/lane2_L002/lane2_L002_reverse_paired.fq.gz, trim/lane8_L008/
    output: mapped_reads/sam/441_L004.sam
    log: mapped_reads/log/441_L004_bwa_mem.log
    jobid: 208
    benchmark: benchmarks/bwa/mem/441_L004.txt
    wildcards: unit=441_L004

For every element of units, all the input files are reported... What mistake am I making?

1 answer:

Answer 0 (score: 2)

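What you are doing here is defining all of these files as input files of the rule via the expand function. In other words, you are performing an aggregation. What you actually want is only the set of input files for one specific unit. You can achieve this simply by not using expand for the input files; there is no reason to use it here. Write the input patterns with the same {unit} wildcard that appears in the output (for example "trim/{unit}/{unit}_forward_paired.fq.gz"), so that Snakemake fills them in from the output requested for each job.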
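The fix can be modelled in plain Python. This is a simplified sketch, not Snakemake's actual code: the requested output path is matched against the output pattern to recover `unit`, and each input pattern, written with `{unit}` and without `expand`, is then formatted with that single value. The names `wildcards_from_output`, `resolve_inputs`, `OUTPUT_PATTERN` and `INPUT_PATTERNS` are hypothetical, and matching each wildcard with the greedy regex `.+` is an assumption about Snakemake's default behaviour.

```python
import re

# Hypothetical corrected patterns: same {unit} wildcard as the output,
# and no expand() around them.
OUTPUT_PATTERN = "mapped_reads/sam/{unit}.sam"
INPUT_PATTERNS = {
    "forward_paired": "trim/{unit}/{unit}_forward_paired.fq.gz",
    "reverse_paired": "trim/{unit}/{unit}_reverse_paired.fq.gz",
    "forward_unpaired": "trim/{unit}/{unit}_forward_unpaired.fq.gz",
    "reverse_unpaired": "trim/{unit}/{unit}_reverse_unpaired.fq.gz",
}


def wildcards_from_output(pattern, path):
    """Match a requested output path against an output pattern and return
    the wildcard values (each wildcard is assumed to appear once)."""
    # re.split with a capturing group alternates literal text and names:
    # "mapped_reads/sam/{unit}.sam" -> ["mapped_reads/sam/", "unit", ".sam"]
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
        for i, part in enumerate(parts)
    )
    match = re.fullmatch(regex, path)
    if match is None:
        raise ValueError(f"{path!r} does not match {pattern!r}")
    return match.groupdict()


def resolve_inputs(patterns, wildcards):
    """Fill each plain input pattern with this one job's wildcard values."""
    return {name: p.format(**wildcards) for name, p in patterns.items()}


wildcards = wildcards_from_output(OUTPUT_PATTERN, "mapped_reads/sam/441_L004.sam")
job_inputs = resolve_inputs(INPUT_PATTERNS, wildcards)
for name, path in job_inputs.items():
    print(name, path)
```

Each input now resolves to exactly one file for the requested unit, so bwa_mem runs once per unit with that unit's reads only.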

I strongly recommend working through the whole official Snakemake tutorial, which also covers this kind of problem: http://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html