foreach loop over a list of files to build commands

Time: 2015-04-06 13:09:19

Tags: perl loops

I have been racking my brain over something that is probably simple for someone with more Perl experience than me.

I have the following code.

use strict;
use warnings;

my @lines = do {
  open my $in_fh, '<', 'input.txt' or die qq{Unable to open "input.txt" for input: $!};
  <$in_fh>;
};
chomp @lines;
my $re = join '|', @lines;

my @files = grep /^(?:$re)/, glob '*.bam';
$_ = "INPUT=$_" for @files;


foreach my $file (@files) {
    foreach my $line (@lines) {
        if ($file =~ m/$line/) {
            my $command = "picard MergeSamFiles $file OUTPUT=$line" . "-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE";  
            system($command);
            my $command2 = "picard MarkDuplicates $line OUTPUT=$line-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE";
            system($command2);
            unlink "$line-tmp-herc2.bam";
            unlink "$line-tmp-herc2.bai";
            unlink "tmp";
        }
    }
}

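input.txt holds the sample names, which are used to check whether files for each sample are present in the directory. In this example I am using only two samples.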

HG00096
HG00117

With the code above, I get something like this.

picard MergeSamFiles INPUT=HG00096.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam_herc2_data.bam OUTPUT=HG00096-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE
picard MarkDuplicates HG00096 OUTPUT=HG00096-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE
picard MergeSamFiles INPUT=HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam_herc2_phase1.bam OUTPUT=HG00096-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE
picard MarkDuplicates HG00096 OUTPUT=HG00096-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE
picard MergeSamFiles INPUT=HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam OUTPUT=HG00096-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE
picard MarkDuplicates HG00096 OUTPUT=HG00096-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE
picard MergeSamFiles INPUT=HG00096.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam OUTPUT=HG00096-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE
picard MarkDuplicates HG00096 OUTPUT=HG00096-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE
picard MergeSamFiles INPUT=HG00117.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam_herc2_data.bam OUTPUT=HG00117-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE
picard MarkDuplicates HG00117 OUTPUT=HG00117-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE
picard MergeSamFiles INPUT=HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam_herc2_phase1.bam OUTPUT=HG00117-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE
picard MarkDuplicates HG00117 OUTPUT=HG00117-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE
picard MergeSamFiles INPUT=HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam OUTPUT=HG00117-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE
picard MarkDuplicates HG00117 OUTPUT=HG00117-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE
picard MergeSamFiles INPUT=HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam OUTPUT=HG00117-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE
picard MarkDuplicates HG00117 OUTPUT=HG00117-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE

What I actually want is something like this.

picard MergeSamFiles INPUT=HG00096.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam_herc2_data.bam INPUT=HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam_herc2_phase1.bam INPUT=HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam INPUT=HG00096.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam   OUTPUT=HG00096-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE
picard MarkDuplicates HG00096-tmp-herc2.bam OUTPUT=HG00096-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE

picard MergeSamFiles INPUT=HG00117.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam_herc2_data.bam INPUT=HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam_herc2_phase1.bam INPUT=HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam INPUT=HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam OUTPUT=HG00117-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE
picard MarkDuplicates HG00117-tmp-herc2.bam OUTPUT=HG00117-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE

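So all the INPUT entries for a sample should be grouped into a single call, so that the system call for $command merges the files and produces the OUTPUT that $command2 then uses as its source.

I know I am getting the foreach loops wrong, but I cannot figure out how to iterate over this correctly, and I am stuck.

I hope you can help me with this.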

1 answer:

Answer 0 (score: 0)

In the first command, you add a suffix to the OUTPUT file name:

my $command = "picard MergeSamFiles $file OUTPUT=$line" . "-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE";  
#                                               here ___^_______________^

Do the same for the input file of the second command:

my $command2 = "picard MarkDuplicates ${line}-tmp-herc2.bam OUTPUT=$line-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE";
#                                    here ___^____________^
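
Note that this only fixes the file name passed to the second command; the nested loops still run both commands once per matching file. A minimal sketch of one way to get a single pair of commands per sample is to loop over the samples and collect all matching files before building the command. `build_commands` is a hypothetical helper, and the sample and file lists are hardcoded here for illustration (in the real script they would come from input.txt and `glob '*.bam'`). The MarkDuplicates input is written the way the question shows it; Picard options are usually KEY=VALUE pairs, so it may need an `INPUT=` prefix (an assumption to check against your Picard version).

```perl
use strict;
use warnings;

# Hypothetical helper: build both command lines for one sample
# from the list of BAM files that belong to it.
sub build_commands {
    my ($sample, @bams) = @_;

    # One INPUT=... entry per file, all on the same command line.
    my $inputs = join ' ', map { "INPUT=$_" } @bams;

    my $merge = "picard MergeSamFiles $inputs OUTPUT=${sample}-tmp-herc2.bam "
              . "MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE";

    # Input written as in the question; it may need an INPUT= prefix.
    my $dedup = "picard MarkDuplicates ${sample}-tmp-herc2.bam "
              . "OUTPUT=${sample}-herc2.bam METRICS_FILE=tmp "
              . "REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE";

    return ($merge, $dedup);
}

# Illustrative data only; replace with the lines of input.txt and glob '*.bam'.
my @samples = ('HG00096', 'HG00117');
my @files   = (
    'HG00096.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam_herc2_data.bam',
    'HG00096.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam',
    'HG00117.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam_herc2_data.bam',
);

for my $sample (@samples) {
    # \Q...\E keeps regex metacharacters in the sample name literal.
    my @bams = grep { /^\Q$sample\E/ } @files;
    next unless @bams;    # no files for this sample

    my ($merge, $dedup) = build_commands($sample, @bams);

    # Printed instead of executed so the sketch is safe to run as is.
    print "$_\n" for $merge, $dedup;
}
```

To actually run the commands, replace the `print` line with `system($merge) == 0 or die "merge failed for $sample";`, the same for `$dedup`, and then the three `unlink` calls from the question.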