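I am building a Snakemake pipeline that downloads a set of three paired FASTQ files for each sample, concatenates the corresponding reads, and then aligns them and calls variants.
I look up each sample's run accessions and FTP paths in a config.yaml file (I also have them in a text file). The troublesome first part of the code is as follows: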
configfile: "config.yaml"

import pandas as pd

sampleseq = pd.read_csv("data/sample_seq_headers.csv")
ox_codes = sampleseq.ox_code

rule all:
    input:
        expand("data/variants/{ox_code}/results/variants/variants.vcf.gz", ox_code=ox_codes)

ruleorder: download_fastqs > cat_fastqs

wildcard_constraints: ERR="ERR\d+"

rule download_fastqs:
    output:
        "data/reads/{ox_code}/{ERR}_{n}.fastq.gz"
    log:
        "logs/download_ENA/{ox_code}_{ERR}_{n}.log"
    params:
        ftp=lambda wildcards:config['eachrun'][wildcards.ox_code][wildcards.ERR]['ftp_path']
    shell:
        """
        curl {params.ftp}{wildcards.n}.fastq.gz -s -S --retry 10 --retry-delay 10 > data/reads/{wildcards.ox_code}/{wildcards.ERR}_{wildcards.n}.fastq.gz.tmp 2> {log} \
        && mv data/reads/{wildcards.ox_code}/{wildcards.ERR}_{wildcards.n}.fastq.gz.tmp {output} 2> {log}
        """

rule cat_fastqs:
    input:
        expand("data/reads/{{ox_code}}/{ERR}_{{n}}.fastq.gz", ERR=lambda wildcards: config['allruns'][wildcards.ox_code]['ERR'])
    output:
        "data/reads/{ox_code}/merged_{ox_code}_{n}.fastq.gz"
    log:
        "logs/cat_fastqs/{ox_code}_{n}.log"
    shell:
        """
        zcat {input} > {output}
        """
The config file is split into two sections, 'allruns' and 'eachrun', nested as follows (each field has three entries, one per ERR run):
allruns:
  WA-0001:
    country:
      - Ghana
      - Ghana
      - Ghana
    location:
      - XXXX
      - XXXX
      - XXXX
    ERS:
      - ERSXXXXXXX
      - ERSXXXXXXX
      - ERSXXXXXXX
    ERR:
      - ERR1234567
      - ERRXXXXXXX
      - ERRXXXXXXX
    ftp_path:
      - ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR333/002/ERR1234567/ERR1234567_
      - ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERXXXX...........
      - ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERXXXX...........
  WA-XXXX
eachrun:
  WA-0001:
    ERR1234567:
      country: Ghana
      location: XXXXX
      ERS: ERSXXXXXX
      ftp_path: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR333/002/ERR1234567/ERR1234567_
    ERR2XXXXX:
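Because 'eachrun' repeats the information already held in 'allruns', the two sections can drift apart if they are edited by hand. As a rough sketch only, the per-run view could be derived from the per-sample lists in plain Python. The helper name build_eachrun, the second accession ERR0000002 and its example.invalid path are made-up placeholders, not real data.

```python
def build_eachrun(allruns):
    """Turn per-sample parallel lists into a dict keyed by run accession."""
    eachrun = {}
    for ox_code, fields in allruns.items():
        per_run = {}
        for i, err in enumerate(fields["ERR"]):
            # Take the i-th entry of every other field for the i-th run.
            per_run[err] = {
                key: values[i] for key, values in fields.items() if key != "ERR"
            }
        eachrun[ox_code] = per_run
    return eachrun


# Hypothetical two-run sample; ERR0000002 and its path are placeholders.
allruns = {
    "WA-0001": {
        "country": ["Ghana", "Ghana"],
        "ERR": ["ERR1234567", "ERR0000002"],
        "ftp_path": [
            "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR333/002/ERR1234567/ERR1234567_",
            "ftp://example.invalid/ERR0000002_",
        ],
    }
}

eachrun = build_eachrun(allruns)
print(eachrun["WA-0001"]["ERR1234567"]["ftp_path"])
```

In a Snakefile, something like config['eachrun'] = build_eachrun(config['allruns']) after the configfile line would keep only one section in the YAML, assuming the lists are always the same length and in the same order.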
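However, I get an InputFunctionException because the lambda function in the cat_fastqs rule seems to fail. Either that, or for some reason Snakemake cannot resolve the ERR wildcard properly. I have modified the input and output file names, because that can help, but it does not seem to in this case.
I either get this error...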
InputFunctionException in line 43 of /home/sanj/projects/XXXX/Snakefile:
KeyError: 'WA-0075'
Wildcards:
ox_code=WA-0075
ERR=WA-0075
n=1
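The wildcard values in this error show ERR holding something other than a run accession, which suggests the download rule's output pattern is matching a file it was not meant to produce. The sketch below approximates, with plain regular expressions, how an unconstrained wildcard (Snakemake's default is `.+`) and one constrained to `ERR\d+` treat the merged file name. The helper output_regex is a hypothetical stand-in for illustration, not Snakemake's own code.

```python
import re


def output_regex(err_pattern):
    """Approximate the download_fastqs output pattern as a regex."""
    return re.compile(
        r"data/reads/(?P<ox_code>.+)/(?P<ERR>"
        + err_pattern
        + r")_(?P<n>.+)\.fastq\.gz"
    )


merged = "data/reads/WA-0075/merged_WA-0075_1.fastq.gz"
download = "data/reads/WA-0075/ERR1234567_1.fastq.gz"

unconstrained = output_regex(r".+")
constrained = output_regex(r"ERR\d+")

# Without a constraint, the download pattern also matches the merged file.
loose = unconstrained.fullmatch(merged)
print(loose.group("ERR"))

# With the constraint, only genuine run accessions match.
print(constrained.fullmatch(merged))
print(constrained.fullmatch(download).group("ERR"))
```

If this approximation holds, the params lambda in download_fastqs would be called with a non-accession ERR value and raise the KeyError, and the constraint would stop that particular mismatch.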
...or, with the wildcard constraint in place (which I am not 100% sure is correct), this one:
MissingInputException in line 56 of /home/sanj/projects/XXXX/Snakefile:
Missing input files for rule cat_fastqs:
data/reads/WA-0073/<function <lambda> at 0x7f4b8f0f7170>_1.fastq.gz
If anyone has any ideas, I would be very grateful.
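One thing that may be worth checking: the MissingInputException path contains the text of the lambda itself, which suggests that expand() formats the function object into the file name instead of calling it. A common alternative is to make the whole input a function of the wildcards and build the list inside it. The sketch below shows that pattern in plain Python so it runs without Snakemake. The function name cat_fastqs_input, the inline config dict and the accessions are hypothetical placeholders, and SimpleNamespace only stands in for Snakemake's wildcards object.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the parsed config.yaml; accessions are placeholders.
config = {
    "allruns": {
        "WA-0001": {"ERR": ["ERR0000001", "ERR0000002", "ERR0000003"]},
    }
}


def cat_fastqs_input(wildcards):
    """Return the per-run FASTQ paths that feed one merged file."""
    runs = config["allruns"][wildcards.ox_code]["ERR"]
    return [
        f"data/reads/{wildcards.ox_code}/{err}_{wildcards.n}.fastq.gz"
        for err in runs
    ]


# Simulate the wildcards Snakemake would pass for one merged output file.
wildcards = SimpleNamespace(ox_code="WA-0001", n="1")
paths = cat_fastqs_input(wildcards)
print(paths)
```

In the Snakefile, the cat_fastqs rule would then use `input: cat_fastqs_input` in place of the expand(...) call, so that the function is evaluated once the wildcards for a given output file are known.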