I am currently working on this project and am struggling with the following problem.
My current directory structure is:
/shared/dir1/file1.bam
/shared/dir2/file2.bam
/shared/dir3/file3.bam
I want to convert the various .bam files to fastq in the results directory:
results/file1_1.fastq.gz
results/file1_2.fastq.gz
results/file2_1.fastq.gz
results/file2_2.fastq.gz
results/file3_1.fastq.gz
results/file3_2.fastq.gz
I have the following code:
    END=["1","2"]
    (dirs, files) = glob_wildcards("/shared/{dir}/{file}.bam")

    rule all:
        input: expand( "/results/{sample}_{end}.fastq.gz",sample=files, end=END)

    rule bam_to_fq:
        input: "{dir}/{sample}.bam"
        output: left="/results/{sample}_1.fastq", right="/results/{sample}_2.fastq"
        shell: "/shared/packages/bam2fastq/bam2fastq --force -o /results/{sample}.fastq {input}"
This produces the following error:
Wildcards in input files cannot be determined from output files:
'dir'
Any help would be greatly appreciated.
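Answer 0 (score: 0)
The problem is the "{dir}" wildcard in the input directive of rule bam_to_fq. In your code you are asking Snakemake to determine "{dir}" from the output of the same rule, because you set it up as a wildcard. Since it does not appear anywhere in the output directive, Snakemake has nothing to derive it from, and you get the error.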
    input:
        "{dir}/{sample}.bam"
    output:
        left="/results/{sample}_1.fastq",
        right="/results/{sample}_2.fastq",
Rule of thumb: every wildcard used in the input must also appear in the output.
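To see why "dir" cannot be filled in, it can help to mimic how a rule is matched: the requested target is compared against the output pattern, and only the wildcards found there are available to the input. The following is a minimal plain-Python sketch of that idea, not Snakemake's actual implementation; `pattern_to_regex`, `resolve_wildcards` and `missing_input_wildcards` are hypothetical helpers, and each wildcard is assumed to appear only once per pattern.

```python
import re


def pattern_to_regex(pattern):
    """Compile a pattern such as '/results/{sample}_1.fastq' into a regex.

    Hypothetical helper: every {name} becomes a named group that matches
    one or more characters; everything else is matched literally.
    """
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        # re.split with one capture group alternates: literal, name, literal, ...
        if i % 2 == 0:
            regex += re.escape(part)
        else:
            regex += f"(?P<{part}>.+)"
    return re.compile(regex + "$")


def resolve_wildcards(output_pattern, target):
    """Return the wildcard values implied by a target file, or None if no match."""
    match = pattern_to_regex(output_pattern).match(target)
    return match.groupdict() if match else None


def missing_input_wildcards(input_pattern, known):
    """List the wildcards the input needs that the output did not provide."""
    needed = set(re.findall(r"\{(\w+)\}", input_pattern))
    return sorted(needed - set(known))


known = resolve_wildcards("/results/{sample}_1.fastq", "/results/file1_1.fastq")
print(known)                                                  # {'sample': 'file1'}
print(missing_input_wildcards("{dir}/{sample}.bam", known))   # ['dir']
```

The second call reports exactly the wildcard named in the error message: the output pattern yields a value for "sample" but none for "dir".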
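For example, expand() can fill in {dir} so that it is no longer a wildcard of the rule:

    rule all:
        input:
            expand("/results/{sample}_{end}.fastq.gz", sample=files, end=END)

    rule bam_to_fq:
        input:
            # Caution: expand() pairs every dir with every sample, so this also
            # requests paths such as dir2/file1.bam that do not exist.
            expand("{dir}/{{sample}}.bam", dir=dirs)
        output:
            left="/results/{sample}_1.fastq",
            right="/results/{sample}_2.fastq"
        shell:
            # Inside shell, wildcards are referenced as {wildcards.<name>}.
            "/shared/packages/bam2fastq/bam2fastq --force -o /results/{wildcards.sample}.fastq {input}"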
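Notes
Edit 1: added to this answer because I originally overlooked the requirement that each dir be matched with its own sample (key-value pairing).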
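One way to honor that pairing without putting {dir} in the output is to remember which directory each sample came from and look it up in an input function. Below is a minimal sketch in plain Python, assuming sample names are unique across directories. The two lists stand in for what `glob_wildcards` would return for the layout in the question, and `dir_of` and `bam_path` are hypothetical names.

```python
from itertools import product
from types import SimpleNamespace

# Stand-ins for: dirs, files = glob_wildcards("/shared/{dir}/{file}.bam")
# (the two lists are aligned: dirs[i] is the directory of files[i]).
dirs = ["dir1", "dir2", "dir3"]
files = ["file1", "file2", "file3"]

# Cartesian product, which is how expand() combines its arguments by default:
# 9 paths, 6 of which do not exist on disk.
all_combinations = [f"/shared/{d}/{f}.bam" for d, f in product(dirs, files)]

# Element-wise pairing: exactly one directory per sample.
dir_of = dict(zip(files, dirs))


def bam_path(wildcards):
    """Return the BAM path for the sample named in wildcards.sample."""
    return f"/shared/{dir_of[wildcards.sample]}/{wildcards.sample}.bam"


print(len(all_combinations))                       # 9
print(bam_path(SimpleNamespace(sample="file2")))   # /shared/dir2/file2.bam
```

In a Snakefile, a function like this could be given as `input: bam_path` in rule bam_to_fq, since Snakemake calls input functions with the wildcards object; `SimpleNamespace` is only used here to imitate that object.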
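I would suggest keeping the path and the sample name in separate variables. Two ways I can think of:
Generalize bam_to_fq slightly, using an external configuration, for example: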
    from pandas import read_table

    rule all:
        input:
            # Single braces: {sample[1][dir]} must be filled in by expand().
            expand("/results/{sample[1][dir]}/{sample[1][file]}_{end}.fastq.gz",
                   sample=read_table(config["sampleFILE"], sep=" ").iterrows(),
                   end=["1", "2"])

    rule bam_to_fq:
        input:
            "{dir}/{sample}.bam"
        output:
            left="/results/{dir}/{sample}_1.fastq",
            right="/results/{dir}/{sample}_2.fastq"
        shell:
            # Assumes bam2fastq derives the _1/_2 file names from the -o value,
            # as the original command implies.
            "/shared/packages/bam2fastq/bam2fastq --force -o /results/{wildcards.dir}/{wildcards.sample}.fastq {input}"
sampleFILE:
dir file
dir1 file1
dir2 file2
dir3 file3
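The pattern passed to expand() relies on Python format-string indexing: each "sample" is an (index, row) pair, so {sample[1][dir]} reads the dir column of the row. Here is a small sketch of that mechanism, using plain dictionaries as a stand-in for the pandas rows so that it runs without dependencies; `SAMPLE_TABLE` and `load_rows` are illustrative names, not part of the answer's code.

```python
from itertools import product

# Same contents as the sample file above.
SAMPLE_TABLE = """dir file
dir1 file1
dir2 file2
dir3 file3
"""


def load_rows(text):
    """Parse a whitespace-separated table into (index, row_dict) pairs,
    mimicking the shape of what DataFrame.iterrows() yields."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [(i, dict(zip(header, line.split()))) for i, line in enumerate(lines[1:])]


rows = load_rows(SAMPLE_TABLE)

# {sample[1][dir]} means: take element 1 of the pair, then its "dir" entry.
pattern = "/results/{sample[1][dir]}/{sample[1][file]}_{end}.fastq.gz"
targets = [pattern.format(sample=s, end=e) for s, e in product(rows, ["1", "2"])]
print(targets[0])   # /results/dir1/file1_1.fastq.gz

# Doubled braces are an escape: the field is emitted literally, not filled in.
escaped = "/results/{{sample[1][dir]}}/{sample[1][file]}_{end}.fastq.gz"
print(escaped.format(sample=rows[0], end="1"))   # /results/{sample[1][dir]}/file1_1.fastq.gz
```

Because each row carries its own dir and file, the directory and sample name stay paired, and only the read end ("1" or "2") is combined freely.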