InputFunctionException: wildcards cannot be resolved by expand and an input function

Time: 2020-05-26 17:01:00

Tags: snakemake

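I am building a Snakemake pipeline that downloads a set of three paired FASTQ files for each sample, concatenates the matching reads, and then aligns them and calls variants.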

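I look up the run accessions and FTP paths for each sample in a config.yaml file (I also have them in a text file). The part of the code that first caused trouble is below: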

configfile:"config.yaml"
import pandas as pd
sampleseq = pd.read_csv("data/sample_seq_headers.csv")
ox_codes = sampleseq.ox_code

rule all:
    input:
        expand("data/variants/{ox_code}/results/variants/variants.vcf.gz", ox_code=ox_codes)


ruleorder: download_fastqs > cat_fastqs
wildcard_constraints: ERR="ERR\d+"

rule download_fastqs:
    output:
        "data/reads/{ox_code}/{ERR}_{n}.fastq.gz"
    log:
        "logs/download_ENA/{ox_code}_{ERR}_{n}.log"
    params:
        ftp=lambda wildcards:config['eachrun'][wildcards.ox_code][wildcards.ERR]['ftp_path']
    shell:
        """
        curl {params.ftp}{wildcards.n}.fastq.gz -s -S --retry 10 --retry-delay 10 > data/reads/{wildcards.ox_code}/{wildcards.ERR}_{wildcards.n}.fastq.gz.tmp 2> {log} \
        && mv data/reads/{wildcards.ox_code}/{wildcards.ERR}_{wildcards.n}.fastq.gz.tmp {output} 2> {log}
        """

rule cat_fastqs:
    input:
        expand("data/reads/{{ox_code}}/{ERR}_{{n}}.fastq.gz", ERR=lambda wildcards: config['allruns'][wildcards.ox_code]['ERR'])
    output:
        "data/reads/{ox_code}/merged_{ox_code}_{n}.fastq.gz"
    log:
        "logs/cat_fastqs/{ox_code}_{n}.log"
    shell:
        """
        zcat {input} > {output}
        """

The config file is split into two sections, 'allruns' and 'eachrun', nested as follows (3 entries each, one per ERR run):

allruns:
  WA-0001:
    country:
    - Ghana
    - Ghana
    - Ghana
    location:
    - XXXX
    - XXXX
    - XXXX
    ERS:
    - ERSXXXXXXX
    - ERSXXXXXXX
    - ERSXXXXXXX
    ERR:
    - ERR1234567
    - ERRXXXXXXX
    - ERRXXXXXXX
    ftp_path:
    - ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR333/002/ERR1234567/ERR1234567_
    - ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERXXXX...........
    - ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERXXXX...........
 WA-XXXX
eachrun:
  WA-0001:
    ERR1234567:
      country: Ghana
      location: XXXXX
      ERS: ERSXXXXXX
      ftp_path: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR333/002/ERR1234567/ERR1234567_
    ERR2XXXXX:
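
Since both sections are meant to describe the same runs, one cheap sanity check is to confirm that every accession listed under 'allruns' also has an entry under 'eachrun'. The following is only a sketch: the dictionary is a hypothetical stand-in for the parsed config.yaml, the accessions ERR0000002 and ERR0000003 and the example.invalid FTP prefixes are made up, and `missing_runs` is an illustrative helper, not part of Snakemake.

```python
# Hypothetical stand-in for the parsed config.yaml; only WA-0001 and
# ERR1234567 come from the post, the other values are invented.
config = {
    "allruns": {
        "WA-0001": {"ERR": ["ERR1234567", "ERR0000002", "ERR0000003"]},
    },
    "eachrun": {
        "WA-0001": {
            "ERR1234567": {"ftp_path": "ftp://example.invalid/ERR1234567_"},
            "ERR0000002": {"ftp_path": "ftp://example.invalid/ERR0000002_"},
        },
    },
}


def missing_runs(config):
    """Return (ox_code, ERR) pairs listed in 'allruns' but absent from 'eachrun'."""
    missing = []
    for ox_code, fields in config["allruns"].items():
        # An empty YAML mapping parses as None, so fall back to an empty dict.
        per_run = config["eachrun"].get(ox_code) or {}
        for err in fields["ERR"]:
            if err not in per_run:
                missing.append((ox_code, err))
    return missing


print(missing_runs(config))
```

An empty list would mean the two sections agree. Any pair that is reported would raise a KeyError in the `params` lookup of download_fastqs.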

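However, I get an InputFunctionException, apparently because the lambda function in the cat_fastqs rule fails. Either that or, for some reason, Snakemake cannot resolve the ERR wildcard properly. I've modified the input and output filenames, since that can sometimes help, but it does not seem to help in this case.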
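
One possible explanation, offered as a guess rather than a confirmed diagnosis, is that `expand()` does not call a function passed as a wildcard value; it formats it into the path like any other object. An input function that receives the wildcards and returns the finished list of paths avoids this. The sketch below runs outside Snakemake: `fastqs_for_sample` is a hypothetical name, `SimpleNamespace` stands in for the wildcards object, and the accessions ERR0000002 and ERR0000003 are made up.

```python
from types import SimpleNamespace

# Hypothetical excerpt of the 'allruns' section; only ERR1234567 is from the post.
config = {"allruns": {"WA-0001": {"ERR": ["ERR1234567", "ERR0000002", "ERR0000003"]}}}


def fastqs_for_sample(wildcards):
    """Build the per-run FASTQ paths for one sample and one read direction."""
    runs = config["allruns"][wildcards.ox_code]["ERR"]
    return [
        f"data/reads/{wildcards.ox_code}/{err}_{wildcards.n}.fastq.gz"
        for err in runs
    ]


# Formatting a function into a string embeds its repr instead of calling it,
# which resembles the path reported in the MissingInputException further down.
broken = "data/reads/WA-0001/{ERR}_1.fastq.gz".format(ERR=lambda wildcards: None)
print(broken)

# SimpleNamespace stands in for the wildcards object Snakemake would pass.
paths = fastqs_for_sample(SimpleNamespace(ox_code="WA-0001", n="1"))
print(paths)
```

In the Snakefile, the rule would then name the function directly, as in `input: fastqs_for_sample`, instead of wrapping a lambda in `expand()`.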

I either get this error...

InputFunctionException in line 43 of /home/sanj/projects/XXXX/Snakefile:
KeyError: 'WA-0075'
Wildcards:
ox_code=WA-0075
ERR=WA-0075
n=1

or, when using wildcard constraints (which I am not 100% sure are correct):

MissingInputException in line 56 of /home/sanj/projects/XXXX/Snakefile:
Missing input files for rule cat_fastqs:
data/reads/WA-0073/ function lambda at 0x7f4b8f0f7170_1 .fastq.gz
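
To see what the constraint does, the output pattern of download_fastqs can be approximated with plain regular expressions. This is only an approximation of Snakemake's matching: each wildcard becomes a named group, using `.+` (Snakemake's default wildcard regex) or the constrained `ERR\d+`. The variable names are illustrative.

```python
import re

# Output pattern of download_fastqs with each wildcard as a named group.
unconstrained = re.compile(
    r"data/reads/(?P<ox_code>.+)/(?P<ERR>.+)_(?P<n>.+)\.fastq\.gz"
)
constrained = re.compile(
    r"data/reads/(?P<ox_code>.+)/(?P<ERR>ERR\d+)_(?P<n>.+)\.fastq\.gz"
)

merged = "data/reads/WA-0001/merged_WA-0001_1.fastq.gz"
per_run = "data/reads/WA-0001/ERR1234567_1.fastq.gz"

# Without a constraint, the merged file also fits the download pattern.
print(unconstrained.fullmatch(merged).groupdict())

# With the constraint, only real run accessions match.
print(constrained.fullmatch(merged))
print(constrained.fullmatch(per_run).groupdict())
```

If this approximation holds, the constraint keeps download_fastqs from claiming the merged files, which matters because `ruleorder` prefers download_fastqs whenever both rules could produce a file. It would not by itself change how the lambda inside `expand()` is handled.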

If anyone has any ideas, I would be very grateful.

0 answers:

No answers yet