snakemake: rule input with different patterns

Time: 2020-02-21 08:37:04

Tags: python bioinformatics snakemake

I am new to snakemake and would like to use the following rule:

import pandas as pd

input_path = config["PATH"]
samples = pd.read_csv(config["METAFILE"], sep='\t', header=0)['sample']

# tmp_path is not defined in the snippet shown here.
rule getPaired:
    output:
        fwd = temp(tmp_path + "/reads/{sample}_fwd.fastq.gz"),
        rev = temp(tmp_path + "/reads/{sample}_rev.fastq.gz")
    params:
        input_path = input_path
    run:
        shell("scp -i {params.input_path}/{wildcards.sample}_*1*.f*q.gz {output.fwd}")
        shell("scp -i {params.input_path}/{wildcards.sample}_*2*.f*q.gz {output.rev}")

The input files follow different naming patterns:

  1. {sampleID}_R[1-2]_001.fq.gz (e.g. 2160_J15_S480_R1_001.fastq.gz)
  2. {sampleID}_[1-2].fq.gz (e.g. SRX000001_1.fq.gz)

The getPaired rule works for inputs like {sample}_[1-2].fq.gz, but not for the other pattern ({sampleID}_R[1-2]_001.fq.gz).

What am I doing wrong?

1 answer:

Answer 0 (score: 2)
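
A hedged aside on the likely cause: the glob `{sample}_*1*.f*q.gz` from the question also matches names such as `2160_J15_S480_R2_001.fastq.gz`, because the `001` suffix contains a `1`, so the pattern is no longer unique to the forward read. The sketch below checks this with Python's `fnmatch` and shows one possible stricter lookup. `MATE_PATTERNS` and `find_read` are hypothetical names introduced here for illustration, and the two regexes are assumptions based only on the example file names in the question.

```python
import re
import tempfile
from fnmatch import fnmatchcase
from pathlib import Path

# The forward-read glob from the question also matches the reverse read,
# because "001" contains a "1".
glob_hits_reverse = fnmatchcase("2160_J15_S480_R2_001.fastq.gz",
                                "2160_J15_S480_*1*.f*q.gz")

# Hypothetical regex templates for the two naming schemes (assumption).
# {mate} is 1 (forward) or 2 (reverse); .fq.gz and .fastq.gz are both accepted.
MATE_PATTERNS = [
    r"{sample}_R{mate}_001\.f(ast)?q\.gz",  # e.g. 2160_J15_S480_R1_001.fastq.gz
    r"{sample}_{mate}\.f(ast)?q\.gz",       # e.g. SRX000001_1.fq.gz
]


def find_read(input_path, sample, mate):
    """Return the single file in input_path that holds read `mate` of `sample`."""
    regexes = [
        re.compile(p.format(sample=re.escape(sample), mate=mate))
        for p in MATE_PATTERNS
    ]
    matches = sorted(
        str(f) for f in Path(input_path).iterdir()
        if any(rx.fullmatch(f.name) for rx in regexes)
    )
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one file for {sample} read {mate}, found {matches}"
        )
    return matches[0]


# Small self-check using empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["2160_J15_S480_R1_001.fastq.gz",
                 "2160_J15_S480_R2_001.fastq.gz",
                 "SRX000001_1.fq.gz",
                 "SRX000001_2.fq.gz"]:
        (Path(tmp) / name).touch()
    fwd_illumina = Path(find_read(tmp, "2160_J15_S480", 1)).name
    rev_illumina = Path(find_read(tmp, "2160_J15_S480", 2)).name
    fwd_sra = Path(find_read(tmp, "SRX000001", 1)).name
```

In a Snakefile, a helper like this could be called from an input function, for example `fwd = lambda wildcards: find_read(input_path, wildcards.sample, 1)`, so that the rule receives one concrete file per read instead of a shell glob.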

You should make use of input functions. The example below is not exactly what you need, but I think it shows clearly how to achieve what you want:

paths = {'sample1': '/home/jankees/data',
         'sample2': '/mnt/data',
         'sample3': '/home/christina/fastq'}

extensions = {'sample1': '.fq.gz',
              'sample2': '.fq.gz',
              'sample3': '.fastq.gz'}

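def get_input(wildcards):
    # Build the per-sample input path from the lookup tables above,
    # e.g. /mnt/data/read/sample2.fq.gz for sample2.
    input_file = paths[wildcards.sample] + "/read/" + wildcards.sample + extensions[wildcards.sample]
    return input_file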

rule all:
    input:
        ["sample1_trimmed.fastq.gz", 
         "sample2_trimmed.fastq.gz", 
         "sample3_trimmed.fastq.gz"]

rule trim:
    input:
        get_input
    output:
        "{sample}_trimmed.fastq.gz"
    shell:
        "touch {output}"