Question

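I'm new to snakemake and would like to be able to take either a pair of .fq files or a pair of .fq.gz files and run them through trim_galore to get a pair of trimmed .fq.gz output files. Without posting my whole Snakefile, I have the ugly solution below, where I just copied the rule and changed the inputs. What would be a better solution?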

#Trim galore paired end trimming rule for unzipped fastqs:
rule trim_galore_unzipped_PE:
    input:
        r1=join(config['fq_in_path'], '{sample}1.fq'),
        r2=join(config['fq_in_path'], '{sample}2.fq'),
    output:
        r1=join(config['trim_out_path'], '{sample}1_val_1.fq.gz'),
        r2=join(config['trim_out_path'], '{sample}2_val_2.fq.gz'),
    params:
        out_path=config['trim_out_path'],
    conda:
        'envs/biotools.yaml',
    shell:
        'trim_galore --gzip -o {params.out_path} --paired {input.r1} {input.r2}'

#Trim galore paired end trimming rule for gzipped fastqs:
rule trim_galore_zipped_PE:
    input:
        r1=join(config['fq_in_path'], '{sample}1.fq.gz'),
        r2=join(config['fq_in_path'], '{sample}2.fq.gz'),
    output:
        r1=join(config['trim_out_path'], '{sample}1_val_1.fq.gz'),
        r2=join(config['trim_out_path'], '{sample}2_val_2.fq.gz'),
    params:
        out_path=config['trim_out_path'],
    conda:
        'envs/biotools.yaml',
    shell: 
        'trim_galore --gzip -o {params.out_path} --paired {input.r1} {input.r2}'

Answer 1

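Using an input function is probably the best solution, as follows:

1. Pass the wildcards to the input function.
2. Using the known YAML values, build the theoretical file names for that sample name.
3. Use a Python function to check which file (technically, which file suffix) is valid.
4. Build a list of the valid files.
5. Return and unpack the list of valid files.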
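Steps 2 to 4 can be sketched in plain Python, outside Snakemake. The helper names `candidate_paths` and `existing_inputs` are illustrative only, and the `<sample>.<read><suffix>` naming is an assumption taken from the example directory layout in this answer; treat it as a minimal sketch rather than the exact input function.

```python
import os
import tempfile


def candidate_paths(fq_dir, sample):
    """Step 2: build every theoretical file name for this sample."""
    return [
        os.path.join(fq_dir, f"{sample}.{read}{suffix}")
        for suffix in (".fq", ".fq.gz")  # both suffix possibilities
        for read in ("1", "2")           # both read directions
    ]


def existing_inputs(fq_dir, sample):
    """Steps 3-4: keep only the candidates that are actually on disk."""
    return [p for p in candidate_paths(fq_dir, sample) if os.path.exists(p)]


if __name__ == "__main__":
    # Create a throwaway directory holding one gzipped pair and look it up.
    with tempfile.TemporaryDirectory() as fq_dir:
        for name in ("mySample1.1.fq.gz", "mySample1.2.fq.gz"):
            open(os.path.join(fq_dir, name), "w").close()
        found = existing_inputs(fq_dir, "mySample1")
        print([os.path.basename(p) for p in found])
```

Inside a Snakefile, the same logic would take `wildcards` as its argument and read the directory from `config`, as in the rule shown in this answer.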

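Notes:

The input and output should have the same wildcards; if they don't, it can cause problems.

In the input function, make sure it cannot return an empty string, because Snakemake interprets that as "no input required", which is not what you want.

If you adopt these suggestions, please update the rule name; I forgot to.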
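The note about never returning an empty value can be isolated into a small guard. This is a sketch under assumptions: `guard_inputs` is a hypothetical helper name, and the placeholder string simply mirrors the one used in this answer's Snakefile.

```python
def guard_inputs(paths, sample):
    """Return the found paths, or a placeholder path that cannot exist."""
    if paths:
        return list(paths)
    # A deliberately missing file name: the workflow then reports a missing
    # input instead of treating the rule as needing no input at all.
    return ["trim_galore_input_determination_FAILURE" + sample]


if __name__ == "__main__":
    print(guard_inputs([], "mySample1"))
    print(guard_inputs(["input/fq/mySample1.1.fq", "input/fq/mySample1.2.fq"], "mySample1"))
```

The placeholder makes the failure visible in the error message together with the sample name, which is easier to debug than a rule that silently runs with nothing to trim.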

Snakefile:

configfile: "config.yaml"

from os.path import join
from os.path import exists

rule all:
    input:
        expand("{trim_out_path}/{sample}.{readDirection}.fq.gz",
            trim_out_path=config["trim_out_path"],
            sample=config["sampleList"],
            readDirection=['1','2'])

def trim_galore_input_determination(wildcards):
    potential_file_path_list = []
    # Cycle through both suffix possibilities:
    for fastqSuffix in [".fq", ".fq.gz"]:

        # Cycle through both read directions
        for readDirection in ['.1','.2']:

            # Build the path for each suffix
            potential_file_path = config["fq_in_path"] + "/" + wildcards.sample + readDirection + fastqSuffix

            # Check if this file actually exists
            if exists(potential_file_path):

                # If file is legit, add to list of acceptable files
                potential_file_path_list.append(potential_file_path)

    # Checking for an empty list
    if len(potential_file_path_list):
        return potential_file_path_list
    else:
        return ["trim_galore_input_determination_FAILURE" + wildcards.sample]

rule trim_galore_unzipped_PE:
    input:
        unpack(trim_galore_input_determination)
    output:
        expand("{trim_out_path}/{{sample}}.{readDirection}.fq.gz",
            trim_out_path=config["trim_out_path"],
            readDirection=['1','2'])
    params:
        out_path=config['trim_out_path'],
    conda:
        'envs/biotools.yaml',
    shell:
        'trim_galore --gzip -o {params.out_path} --paired {input}'

config.yaml:

fq_in_path: input/fq
trim_out_path: output
sampleList: ["mySample1", "mySample2"]

$ tree

|-- [tboyarsk 1540 Sep 6 15:17] Snakefile
|-- [tboyarsk 82 Sep 6 15:17] config.yaml
|-- [tboyarsk 512 Sep 6 8:55] input
|   |-- [tboyarsk 512 Sep 6 8:33] fq
|   |   |-- [tboyarsk 0 Sep 6 7:50] mySample1.1.fq
|   |   |-- [tboyarsk 0 Sep 6 8:24] mySample1.2.fq
|   |   |-- [tboyarsk 0 Sep 6 7:50] mySample2.1.fq
|   |   `-- [tboyarsk 0 Sep 6 8:24] mySample2.2.fq
|   `-- [tboyarsk 512 Sep 6 8:55] fqgz
|       |-- [tboyarsk 0 Sep 6 7:50] mySample1.1.fq.gz
|       |-- [tboyarsk 0 Sep 6 8:32] mySample1.2.fq.gz
|       |-- [tboyarsk 0 Sep 6 8:33] mySample2.1.fq.gz
|       `-- [tboyarsk 0 Sep 6 8:32] mySample2.2.fq.gz
`-- [tboyarsk 512 Sep 6 7:55] output

$ snakemake --dryrun (input: fq)

rule trim_galore_unzipped_PE:
    input: input/fq/mySample1.1.fq, input/fq/mySample1.2.fq
    output: output/mySample1.1.fq.gz, output/mySample1.2.fq.gz
    jobid: 1
    wildcards: sample=mySample1

rule trim_galore_unzipped_PE:
    input: input/fq/mySample2.1.fq, input/fq/mySample2.2.fq
    output: output/mySample2.1.fq.gz, output/mySample2.2.fq.gz
    jobid: 2
    wildcards: sample=mySample2

localrule all:
    input: output/mySample1.1.fq.gz, output/mySample2.1.fq.gz, output/mySample1.2.fq.gz, output/mySample2.2.fq.gz
    jobid: 0

Job counts:
    count   jobs
    1       all
    2       trim_galore_unzipped_PE
    3

$ snakemake --dryrun (input: fqgz)

rule trim_galore_unzipped_PE:
    input: input/fqgz/mySample1.1.fq.gz, input/fqgz/mySample1.2.fq.gz
    output: output/mySample1.1.fq.gz, output/mySample1.2.fq.gz
    jobid: 1
    wildcards: sample=mySample1

rule trim_galore_unzipped_PE:
    input: input/fqgz/mySample2.1.fq.gz, input/fqgz/mySample2.2.fq.gz
    output: output/mySample2.1.fq.gz, output/mySample2.2.fq.gz
    jobid: 2
    wildcards: sample=mySample2

localrule all:
    input: output/mySample1.1.fq.gz, output/mySample1.2.fq.gz, output/mySample2.1.fq.gz, output/mySample2.2.fq.gz
    jobid: 0

Job counts:
    count   jobs
    1       all
    2       trim_galore_unzipped_PE
    3

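There are ways to make this more generic, but since you declare and use the YAML config to build most of the file name, I will avoid discussing them in this answer. I'll just say that it is possible and somewhat encouraged.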

"--paired {input}" will expand to supply both files. Because of the for loops, read 1 will always come before read 2.
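To make the expansion concrete, the sketch below emulates it with ordinary string formatting. It is not a Snakemake call: `render_command` is a hypothetical helper, and the space-joined rendering of a multi-file `{input}` is reproduced here with `str.join`.

```python
def render_command(out_path, inputs):
    """Emulate how a multi-file {input} ends up space-separated in the shell command."""
    return "trim_galore --gzip -o {} --paired {}".format(out_path, " ".join(inputs))


if __name__ == "__main__":
    # Order matches the inner loop over read directions: read 1, then read 2.
    pair = ["input/fq/mySample1.1.fq", "input/fq/mySample1.2.fq"]
    print(render_command("output", pair))
```

One consequence worth keeping in mind: if both the .fq and the .fq.gz versions of a sample sat in the same directory, the list would hold four paths and all four would be passed after `--paired`.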

Accepting slightly different inputs to a snakemake rule (.fq vs .fq.gz)
