I'm new to snakemake and would like to use the following rule:

input_path = config["PATH"]
samples = pd.read_csv(config["METAFILE"], sep = '\t', header = 0)['sample']

rule getPaired:
    output:
        fwd = temp(tmp_path + "/reads/{sample}_fwd.fastq.gz"),
        rev = temp(tmp_path + "/reads/{sample}_rev.fastq.gz")
    params:
        input_path = input_path
    run:
        shell("scp -i {params.input_path}/{wildcards.sample}_*1*.f*q.gz {output.fwd}"),
        shell("scp -i {params.input_path}/{wildcards.sample}_*2*.f*q.gz {output.rev}")

The input files follow different naming patterns:
The getPaired rule works for inputs like {sample}_[1-2].fq.gz, but not for the second pattern.
What am I doing wrong?
Answer 0: (score: 2)
You should use input functions. The example below is not exactly what you need, but I think it shows clearly how to achieve what you want:

paths = {'sample1': '/home/jankees/data',
         'sample2': '/mnt/data',
         'sample3': '/home/christina/fastq'}

extensions = {'sample1': '.fq.gz',
              'sample2': '.fq.gz',
              'sample3': '.fastq.gz'}

def get_input(wildcards):
    input_file = paths[wildcards.sample] + "/read/" + wildcards.sample + extensions[wildcards.sample]
    return input_file

rule all:
    input:
        ["sample1_trimmed.fastq.gz",
         "sample2_trimmed.fastq.gz",
         "sample3_trimmed.fastq.gz"]

rule trim:
    input:
        get_input
    output:
        "{sample}_trimmed.fastq.gz"
    shell:
        "touch {output}"
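
Applied to the question, the same idea could look roughly like the sketch below: a plain Python helper that tries several file name patterns and returns the one file that exists. The helper name `find_read_file` and both entries in `PATTERNS` are assumptions made for illustration (the second pattern in particular is a made-up placeholder, since the question does not show the real one), so adjust them to the actual file names.

```python
import glob
import os

# Hypothetical naming schemes; replace these with the patterns
# that actually occur in your input directory.
PATTERNS = [
    "{sample}_{read}.fq.gz",
    "{sample}_R{read}_001.fastq.gz",
]

def find_read_file(input_path, sample, read):
    """Return the one file for `sample` and `read` that matches a known pattern."""
    matches = set()
    for pattern in PATTERNS:
        name = pattern.format(sample=sample, read=read)
        # glob returns only paths that exist on disk
        matches.update(glob.glob(os.path.join(input_path, name)))
    if len(matches) != 1:
        # Fail loudly on missing or ambiguous inputs instead of guessing
        raise ValueError(
            "expected exactly one file for sample %r read %r, found %d"
            % (sample, read, len(matches))
        )
    return matches.pop()
```

In the Snakefile, such a helper could then be wired in as an input function, for example `fwd = lambda wildcards: find_read_file(input_path, wildcards.sample, 1)` under `input:`, so the shell command can refer to `{input.fwd}` instead of relying on shell wildcards.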