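I'm new to Snakemake. I'd like to take either a pair of `.fq` files or a pair of `.fq.gz` files, run them through `trim_galore`, and get a pair of trimmed `.fq.gz` output files. Here is the relevant part of my Snakefile. My current solution is ugly: I copied the rule and changed only the inputs. What is a better way to do this?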
```python
# Trim Galore paired-end trimming rule for unzipped FASTQs:
rule trim_galore_unzipped_PE:
    input:
        r1=join(config['fq_in_path'], '{sample}1.fq'),
        r2=join(config['fq_in_path'], '{sample}2.fq'),
    output:
        r1=join(config['trim_out_path'], '{sample}1_val_1.fq.gz'),
        r2=join(config['trim_out_path'], '{sample}2_val_2.fq.gz'),
    params:
        out_path=config['trim_out_path'],
    conda:
        'envs/biotools.yaml',
    shell:
        'trim_galore --gzip -o {params.out_path} --paired {input.r1} {input.r2}'

# Trim Galore paired-end trimming rule for gzipped FASTQs:
rule trim_galore_zipped_PE:
    input:
        r1=join(config['fq_in_path'], '{sample}1.fq.gz'),
        r2=join(config['fq_in_path'], '{sample}2.fq.gz'),
    output:
        r1=join(config['trim_out_path'], '{sample}1_val_1.fq.gz'),
        r2=join(config['trim_out_path'], '{sample}2_val_2.fq.gz'),
    params:
        out_path=config['trim_out_path'],
    conda:
        'envs/biotools.yaml',
    shell:
        'trim_galore --gzip -o {params.out_path} --paired {input.r1} {input.r2}'
```
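Both rules declare exactly the same output names, because the trimmed file name does not depend on whether the input was gzipped. The sketch below reproduces the naming pattern the two rules rely on (`{sample}1.fq` or `{sample}1.fq.gz` becomes `{sample}1_val_1.fq.gz`). `trim_galore_val_name` and the `sampleA` file names are hypothetical. The rule of stripping `.gz` and then `.fq` is an assumption inferred from the patterns in the rules, not taken from trim_galore itself.

```python
import os
import re


def trim_galore_val_name(input_path: str, read: int) -> str:
    """Hypothetical helper: the paired-end output name assumed for
    `trim_galore --gzip --paired`, following the patterns in the rules above."""
    # trim_galore writes into its -o directory, so only the file name matters.
    name = os.path.basename(input_path)
    # Assumed rule: drop an optional .gz, then the .fq extension.
    name = re.sub(r"\.gz$", "", name)
    name = re.sub(r"\.fq$", "", name)
    return f"{name}_val_{read}.fq.gz"


# The plain and the gzipped input map to the same style of trimmed name.
print(trim_galore_val_name("input/sampleA1.fq", 1))     # sampleA1_val_1.fq.gz
print(trim_galore_val_name("input/sampleA2.fq.gz", 2))  # sampleA2_val_2.fq.gz
```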
Answer (score: 2):
Using an input function is probably the best solution. For example:

Snakefile:
```python
configfile: "config.yaml"

from os.path import join, exists


rule all:
    input:
        expand("{trim_out_path}/{sample}.{readDirection}.fq.gz",
               trim_out_path=config["trim_out_path"],
               sample=config["sampleList"],
               readDirection=['1', '2'])


def trim_galore_input_determination(wildcards):
    # Try the plain suffix first, then the gzipped one.
    for fastqSuffix in [".fq", ".fq.gz"]:
        # Build the paths for both read directions; .1 always comes before .2.
        potential_file_path_list = [
            join(config["fq_in_path"], wildcards.sample + readDirection + fastqSuffix)
            for readDirection in ['.1', '.2']
        ]
        # Accept this suffix only if both files of the pair exist.
        if all(exists(path) for path in potential_file_path_list):
            return potential_file_path_list
    # No complete pair found: return a placeholder path so that the
    # MissingInputException names the sample that failed.
    return ["trim_galore_input_determination_FAILURE_" + wildcards.sample]


rule trim_galore_unzipped_PE:
    # The input function returns a plain list, so unpack() is not needed.
    input:
        trim_galore_input_determination
    output:
        expand("{trim_out_path}/{{sample}}.{readDirection}.fq.gz",
               trim_out_path=config["trim_out_path"],
               readDirection=['1', '2'])
    params:
        out_path=config['trim_out_path'],
    conda:
        'envs/biotools.yaml'
    # trim_galore names its paired outputs <input>_val_1/_val_2 (as in the
    # question's rules), so rename them to the outputs declared above.
    shell:
        'trim_galore --gzip -o {params.out_path} --paired {input} && '
        'mv {params.out_path}/{wildcards.sample}.1_val_1.fq.gz {output[0]} && '
        'mv {params.out_path}/{wildcards.sample}.2_val_2.fq.gz {output[1]}'
```
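You can test the input-function logic outside Snakemake. This sketch copies the function from the Snakefile but makes two changes for testing only. First, `config` is passed in as an argument instead of being Snakemake's global. Second, a `types.SimpleNamespace` stands in for the `wildcards` object Snakemake passes in; the function only reads `wildcards.sample`.

```python
import os
import tempfile
from os.path import exists, join
from types import SimpleNamespace


def trim_galore_input_determination(wildcards, config):
    # Same logic as in the Snakefile, but `config` is an explicit argument here.
    for fastqSuffix in [".fq", ".fq.gz"]:
        pair = [
            join(config["fq_in_path"], wildcards.sample + readDirection + fastqSuffix)
            for readDirection in [".1", ".2"]
        ]
        # Return the first suffix for which both files exist.
        if all(exists(path) for path in pair):
            return pair
    return ["trim_galore_input_determination_FAILURE_" + wildcards.sample]


# Temporary input directory containing only a gzipped pair for mySample1.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["mySample1.1.fq.gz", "mySample1.2.fq.gz"]:
        open(join(tmp, name), "w").close()
    config = {"fq_in_path": tmp}
    found = trim_galore_input_determination(SimpleNamespace(sample="mySample1"), config)
    missing = trim_galore_input_determination(SimpleNamespace(sample="mySample2"), config)
    print([os.path.basename(p) for p in found])  # ['mySample1.1.fq.gz', 'mySample1.2.fq.gz']
    print(missing)  # ['trim_galore_input_determination_FAILURE_mySample2']
```

Because the pair is built in `.1`, `.2` order, `{input}` in the shell command always lists read 1 before read 2.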
config.yaml:

```yaml
fq_in_path: input/fq
trim_out_path: output
sampleList: ["mySample1", "mySample2"]
```
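Snakemake loads `config.yaml` into the global `config` dictionary. The plain-Python approximation below shows which targets `rule all` requests. It mirrors `expand()`, whose default combinator is `itertools.product`. The values are copied from the config above. The order may differ from what Snakemake prints, but the set of paths is the same.

```python
from itertools import product

# Values copied from config.yaml above.
config = {
    "trim_out_path": "output",
    "sampleList": ["mySample1", "mySample2"],
}

# Roughly what expand() in `rule all` does: one path per combination of values.
targets = [
    f"{out}/{sample}.{read}.fq.gz"
    for out, sample, read in product(
        [config["trim_out_path"]], config["sampleList"], ["1", "2"]
    )
]
for t in targets:
    print(t)
```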
```
$ tree
|-- [tboyarsk 1540 Sep 6 15:17] Snakefile
|-- [tboyarsk 82 Sep 6 15:17] config.yaml
|-- [tboyarsk 512 Sep 6 8:55] input
|   |-- [tboyarsk 512 Sep 6 8:33] fq
|   |   |-- [tboyarsk 0 Sep 6 7:50] mySample1.1.fq
|   |   |-- [tboyarsk 0 Sep 6 8:24] mySample1.2.fq
|   |   |-- [tboyarsk 0 Sep 6 7:50] mySample2.1.fq
|   |   `-- [tboyarsk 0 Sep 6 8:24] mySample2.2.fq
|   `-- [tboyarsk 512 Sep 6 8:55] fqgz
|       |-- [tboyarsk 0 Sep 6 7:50] mySample1.1.fq.gz
|       |-- [tboyarsk 0 Sep 6 8:32] mySample1.2.fq.gz
|       |-- [tboyarsk 0 Sep 6 8:33] mySample2.1.fq.gz
|       `-- [tboyarsk 0 Sep 6 8:32] mySample2.2.fq.gz
`-- [tboyarsk 512 Sep 6 7:55] output
```
Dry run with `fq_in_path: input/fq`:

```
$ snakemake -n

rule trim_galore_unzipped_PE:
    input: input/fq/mySample1.1.fq, input/fq/mySample1.2.fq
    output: output/mySample1.1.fq.gz, output/mySample1.2.fq.gz
    jobid: 1
    wildcards: sample=mySample1

rule trim_galore_unzipped_PE:
    input: input/fq/mySample2.1.fq, input/fq/mySample2.2.fq
    output: output/mySample2.1.fq.gz, output/mySample2.2.fq.gz
    jobid: 2
    wildcards: sample=mySample2

localrule all:
    input: output/mySample1.1.fq.gz, output/mySample2.1.fq.gz, output/mySample1.2.fq.gz, output/mySample2.2.fq.gz
    jobid: 0

Job counts:
	count	jobs
	1	all
	2	trim_galore_unzipped_PE
	3
```
Dry run with `fq_in_path: input/fqgz`:

```
$ snakemake -n

rule trim_galore_unzipped_PE:
    input: input/fqgz/mySample1.1.fq.gz, input/fqgz/mySample1.2.fq.gz
    output: output/mySample1.1.fq.gz, output/mySample1.2.fq.gz
    jobid: 1
    wildcards: sample=mySample1

rule trim_galore_unzipped_PE:
    input: input/fqgz/mySample2.1.fq.gz, input/fqgz/mySample2.2.fq.gz
    output: output/mySample2.1.fq.gz, output/mySample2.2.fq.gz
    jobid: 2
    wildcards: sample=mySample2

localrule all:
    input: output/mySample1.1.fq.gz, output/mySample1.2.fq.gz, output/mySample2.1.fq.gz, output/mySample2.2.fq.gz
    jobid: 0

Job counts:
	count	jobs
	1	all
	2	trim_galore_unzipped_PE
	3
```
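Notes:

There are ways to make this more generic. You already declare and use a YAML config to build most of the file names, so I won't go into those methods in this answer. They are possible, though, and somewhat encouraged.

`"--paired {input}"` expands to both input files. Because of the for loop, the `.1` file always comes before the `.2` file.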
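One way to make this more generic, not part of the original answer, is to discover the sample list from the input directory instead of listing it in `config.yaml`. Snakemake's built-in `glob_wildcards()` is designed for this. The plain-Python sketch below illustrates the idea with a hypothetical `discover_samples` helper. It keeps only samples that have both a `.1` and a `.2` file, in either `.fq` or `.fq.gz` form.

```python
import re
import tempfile
from pathlib import Path


def discover_samples(fq_in_path):
    """Hypothetical helper: sample names with a complete .1/.2 pair
    in fq_in_path, as either .fq or .fq.gz."""
    pattern = re.compile(r"^(?P<sample>.+)\.(?P<read>[12])\.fq(\.gz)?$")
    reads = {}
    for path in Path(fq_in_path).iterdir():
        match = pattern.match(path.name)
        if match:
            reads.setdefault(match["sample"], set()).add(match["read"])
    # Keep only samples for which both read directions are present.
    return sorted(s for s, r in reads.items() if r == {"1", "2"})


# Temporary directory: two complete pairs plus an orphaned read 1.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["mySample1.1.fq", "mySample1.2.fq",
                 "mySample2.1.fq.gz", "mySample2.2.fq.gz", "orphan.1.fq"]:
        (Path(tmp) / name).touch()
    samples = discover_samples(tmp)
    print(samples)  # ['mySample1', 'mySample2']
```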