Snakemake rule that writes a new text file based on input variables (Snakemake syntax)

Asked: 2019-03-06 17:55:05

Tags: shell snakemake

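I have a fully functional Snakemake workflow, but I want to add a rule that writes the input variables as new lines to a newly generated output text file. To summarize briefly, I've included the relevant code below: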

OUTPUTDIR = config["outputDIR"] 
SAMPLEID = list(SAMPLE_TABLE.Sample_Name)
# Above 2 lines are functional in other parts of script.

rule all:
  input:
    manifest = OUTPUTDIR + "/manifest.txt"

rule write_manifest:
  input:
    sampleid = SAMPLEID,
    loc_r1 = expand("{base}/trimmed/{sample}_1.trimmed.fastq.gz", base = OUTPUTDIR, sample = SAMPLELIST),
    loc_r2 = expand("{base}/trimmed/{sample}_2.trimmed.fastq.gz", base = OUTPUTDIR, sample = SAMPLELIST)
  output:
    OUTPUTDIR + "/manifest.txt"
  shell:
    """
    echo "{input.sampleid},{input.loc_r1},forward" >> {output}
    echo "{input.sampleid},{input.loc_r2},reverse" >> {output}
    """

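My problem is that Snakemake treats these values as files to read, whereas I need it to print the file paths or sample IDs it detects. Can anyone help with the syntax?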

The desired output file should look like this:

depth1,$PWD/raw_seqs_dir/Test01_full_L001_R1_001.fastq.gz,forward
depth1,$PWD/raw_seqs_dir/Test01_full_L001_R2_001.fastq.gz,reverse
depth2,$PWD/raw_seqs_dir/Test02_full_L001_R1_001.fastq.gz,forward
depth2,$PWD/raw_seqs_dir/Test02_full_L001_R2_001.fastq.gz,reverse

I tried writing it with echo.

Error message:

Building DAG of jobs...
MissingInputException in [write_manifest]:
Missing input files for rule write_manifest:
sample1
sample2
sample3

Update: after moving sampleid into params:

rule write_manifest:
  input:
    loc_r1 = expand("{base}/trimmed/{sample}_{suf}_1.trimmed.fastq.gz", base = SCRATCHDIR, sample = SAMPLE$
    loc_r2 = expand("{base}/trimmed/{sample}_{suf}_2.trimmed.fastq.gz", base = SCRATCHDIR, sample = SAMPLE$
  output:
    OUTPUTDIR + "/manifest.txt"
  params:
    sampleid = SAMPLEID
  shell:
    """
    echo "{params.sampleid},{input.loc_r1},forward" >> {output}
    echo "{params.sampleid},{input.loc_r2},reverse" >> {output}
    """

My output now looks like this (incorrect):

sample1 sample2 sample3,$PWD/tmp/dir/sample1.fastq $PWD/tmp/dir/sample2.fastq $PWD/tmp/dir/sample3.fastq,forward
sample1 sample2 sample3,$PWD/tmp/dir/sample1.fastq $PWD/tmp/dir/sample2.fastq $PWD/tmp/dir/sample3.fastq,reverse

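This is still not what I want; I need it to look like the desired output below. Can I write the rule so that Snakemake loops over each sample/input/param? The desired output file should look like this: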

depth1,$PWD/raw_seqs_dir/Test01_full_L001_R1_001.fastq.gz,forward
depth1,$PWD/raw_seqs_dir/Test01_full_L001_R2_001.fastq.gz,reverse
depth2,$PWD/raw_seqs_dir/Test02_full_L001_R1_001.fastq.gz,forward
depth2,$PWD/raw_seqs_dir/Test02_full_L001_R2_001.fastq.gz,reverse
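
The single-line output in the update is consistent with how list values are rendered in a shell command: every element is placed into the string, separated by spaces. The plain-Python sketch below only mimics that behaviour to show where the joined line comes from. It is not Snakemake's internal code, and the names `sample_ids`, `loc_r1` and `as_shell_text` are hypothetical stand-ins.

```python
# Hypothetical stand-ins for the values held in params.sampleid and input.loc_r1.
sample_ids = ["sample1", "sample2", "sample3"]
loc_r1 = [f"/tmp/dir/{s}.fastq" for s in sample_ids]


def as_shell_text(values):
    """Mimic a list placed into a shell command: elements joined by spaces."""
    return " ".join(values)


# One echo call receives the whole list at once, so only one line is produced.
line = f"{as_shell_text(sample_ids)},{as_shell_text(loc_r1)},forward"
print(line)
```

Because each `echo` receives the complete lists, getting one line per sample requires iterating over the samples explicitly rather than relying on the placeholders.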

1 answer:

Answer 0 (score: 1)

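You need to use the wildcard sample in params rather than the variable SAMPLEID. At execution time, this will use the correct sample ID for that specific rule.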

params:
    sample = '{sample}'
shell:
    """
    echo "{params.sample},{input.loc_r1},forward" >> {output}
    echo "{params.sample},{input.loc_r2},reverse" >> {output}
    """
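
As far as I can tell, the `'{sample}'` placeholder can only be resolved when the rule's output path itself contains a `{sample}` wildcard, which a single `manifest.txt` does not. A possible alternative, sketched here under assumptions, is to build the lines in Python by pairing each sample ID with its two files. The function names `manifest_lines` and `write_manifest` are hypothetical, and the sketch assumes the three lists are in the same sample order (which should hold if the file lists come from `expand` over the same sample list).

```python
def manifest_lines(sample_ids, r1_paths, r2_paths):
    """Return two manifest lines (forward, reverse) per sample, in order."""
    lines = []
    # zip pairs the i-th sample ID with the i-th forward and reverse file.
    for sample_id, r1, r2 in zip(sample_ids, r1_paths, r2_paths):
        lines.append(f"{sample_id},{r1},forward")
        lines.append(f"{sample_id},{r2},reverse")
    return lines


def write_manifest(path, sample_ids, r1_paths, r2_paths):
    """Write the manifest, overwriting any existing file at `path`."""
    with open(path, "w") as handle:
        for line in manifest_lines(sample_ids, r1_paths, r2_paths):
            handle.write(line + "\n")
```

Inside the rule, the `shell:` section could be replaced by a `run:` section that calls something like `write_manifest(output[0], params.sampleid, input.loc_r1, input.loc_r2)`. Opening the file in write mode, instead of appending with `>>`, also avoids duplicated lines if the rule is rerun.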