Perl - Iterate over a list of potential regex matches until one is found, then exit the loop

Time: 2014-09-05 11:53:10

Tags: regex perl grep

In my Perl script, I want to look for several potential regex matches in certain files spread across many directories.

I have a hash

my %qc = ("QCNM Daily QC"      => "GUN", 
          "Intrinsic Flood QA" => "PUN");

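This hash will grow substantially. In the directory $STUDY_DIR, I want to look at all the image header files (image1.hd, image2.hd, etc.) and check whether any of the hash keys appear in their text. The image header files are just plain text files. For example, I want to search image1.hd to see whether the text "QCNM Daily QC" or "Intrinsic Flood QA" is present. If "QCNM Daily QC" is present, I want to set $study_type = "GUN"; similarly, if "Intrinsic Flood QA" matches, I want to set $study_type = "PUN". If no match is found, I want to move on to the next image file.

Here is my code so far: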

#Loop through all images
for ( my $i = 1; $i <= $num_images; $i++ ) {
    # Check image is of type described in %qc
    # We are only interested in manipulating these files
    my $match = 0;     #matched qc key to image header
    my $study_type;    #key value for when hash key is found in image header (eg PUN)

    #reset the internal iterator so a prior each() doesn't affect the loop
    keys %qc;
    while ( my ( $k, $v ) = each %qc ) {
        my @match = grep {/$k/} glob("$STUDY_DIR/image${i}.hd");
        # Pseudocode: set $match = 1 if a match is found, then break out of the loop
    }

    next if $match == 0;    #Not a QC image we are interested in skip to next image
}

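What I am struggling with is looping over each hash key and checking whether that text exists in the image .hd file. If it does, I want to set $match = 1 and $study_type = $qc{$key} and exit the loop. If it does not, I want to move on to the next potential match. The hash keys are mutually exclusive: a file may contain none of the keys, but at most one key can be present in any given file.

The solution must be in Perl, because I have some additional Perl commands to run on the matching files.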

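3 Answers:

Answer 0: (score: 2)

You need to actually load the file contents to test whether they contain certain strings.

I suggest building a single regex from the hash keys to compare against.

The following prints the first matching value in each file and then moves on to the next file. Note that I use natsort from Sort::Key::Natural to process the files in natural order, but that is just a style preference.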

use strict;
use warnings;
use autodie;

use Sort::Key::Natural qw(natsort);

my $STUDY_DIR = '...';

my %qc = (
    "QCNM Daily QC"      => "GUN",
    "Intrinsic Flood QA" => "PUN"
);
my $qc_re = '(?:' . join('|', map quotemeta, sort {length $b <=> length $a} keys %qc) . ')';

FILE:
for my $file ( natsort glob("$STUDY_DIR/image*.hd") ) {
    open my $fh, '<', $file;
    while (<$fh>) {
        if (/($qc_re)/) {
            print "$qc{$1} - $file\n";
            next FILE;
        }
    }
}
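
The sort by descending length in $qc_re matters once one key is a prefix of another, because a regex alternation takes the first alternative that matches at a given position. Below is a minimal sketch of that effect. The helper name build_qc_regex and the overlapping keys "Flood" and "Flood QA" are made up for illustration and are not part of the question's data.

```perl
use strict;
use warnings;

# Hypothetical helper: join literal keys into one capturing alternation,
# longest key first, escaping each key with quotemeta.
sub build_qc_regex {
    my @keys = @_;
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @keys;
    return qr/($alternation)/;
}

# Made-up keys where one key is a prefix of the other.
my %overlap = ( "Flood" => "SHORT", "Flood QA" => "LONG" );
my $re = build_qc_regex( keys %overlap );

if ( "Intrinsic Flood QA" =~ $re ) {
    # The longer key is tried first, so $1 is "Flood QA"
    print "matched '$1' => $overlap{$1}\n";
}
```

Without the length sort, the order of the alternatives would follow the hash's key order, and the shorter key could win.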

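Answer 1: (score: 1)

The following solution, although somewhat clunky, seems to work. I am sure a reasonable Perl programmer could halve the number of lines of code.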

#Loop through all images
for ( my $i = 1; $i <= $num_images; $i++ ) {
    # Check image is of type described in %qc
    # We are only interested in moving these files to QC filestore
    my $match      = 0;     #matched qc key to image header
    my $study_type = "";    #key value for when hash key is found in image header (eg PUN)

    my $image_header = "$STUDY_DIR/image${i}.hd";

    #reset the internal iterator so a prior each() doesn't affect the loop
    keys %qc;
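    while ( my ( $k, $v ) = each %qc ) {
        # three-argument open keeps the mode separate from the file name
        open my $FH, '<', $image_header or die "Could not open $image_header: $!";
        my (@lines) = grep /$k/, <$FH>;
        close $FH;
        #If we get a match update required fields
        # (@lines is true when at least one line matched)
        if (@lines) {
            $match      = 1;
            $study_type = $v;
        }
        last if $match == 1;    # stop at the first key that matches
    }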

    print "$match, $study_type\n";
    next if $match == 0;    #Not a QC image we are interested in skip to next image
}
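
One possible way to shorten this is to move the search into a helper that reads each file once and returns as soon as any key is found. This is only a sketch: the name study_type_for is made up, the demo file contents are invented, and it uses index() for a literal substring test instead of a regex.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the value for the first key found in the file
# at $path, or undef when none of the keys appears in it.
sub study_type_for {
    my ( $path, $qc ) = @_;
    open my $fh, '<', $path or die "Could not open $path: $!";
    while ( my $line = <$fh> ) {
        for my $key ( keys %$qc ) {
            # index() is a literal substring test, so the key needs no escaping
            return $qc->{$key} if index( $line, $key ) >= 0;
        }
    }
    return;    # no key found
}

# Demo on throwaway files; the header text below is invented.
my %qc = ( "QCNM Daily QC" => "GUN", "Intrinsic Flood QA" => "PUN" );
my $dir  = tempdir( CLEANUP => 1 );
my %demo = (
    'image1.hd' => "study name := QCNM Daily QC\n",
    'image2.hd' => "study name := something else\n",
);
for my $name ( sort keys %demo ) {
    open my $out, '>', "$dir/$name" or die "Could not write $dir/$name: $!";
    print {$out} $demo{$name};
    close $out;

    my $study_type = study_type_for( "$dir/$name", \%qc );
    print "$name: ", ( defined $study_type ? $study_type : 'no match' ), "\n";
}
```

In the original loop, the call would replace the inner while block, with `next unless defined $study_type;` skipping images that are not QC images.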

Answer 2: (score: 0)

You can use any from List::MoreUtils. It stops at the first successful match.

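use List::MoreUtils qw(any);

# any() tests the elements of @images, which are file names here;
# to test file contents, fill the list with the lines of the file instead
my @images = glob("$STUDY_DIR/image${i}.hd");

    if (any { $_ =~ /$k/ } @images){
       $match      = 1;
       $study_type = $qc{$k};
       last;
    }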
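
A related sketch, swapping in first from the core List::Util module instead of any: first also stops at the first success, but it returns the matching element, so the key is available for the %qc lookup. The helper name first_matching_key and the header text are assumptions for illustration, and the header is passed in as a string that has already been read from the file.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical helper: return the first key (in sorted order) that occurs
# literally in $text, or undef when no key occurs.
sub first_matching_key {
    my ( $text, $qc ) = @_;
    return first { index( $text, $_ ) >= 0 } sort keys %$qc;
}

my %qc = ( "QCNM Daily QC" => "GUN", "Intrinsic Flood QA" => "PUN" );

# Invented header text standing in for the contents of an image .hd file.
my $header = "study name := Intrinsic Flood QA\n";

my $key        = first_matching_key( $header, \%qc );
my $study_type = defined($key) ? $qc{$key} : undef;
print defined($study_type) ? "$study_type\n" : "no match\n";
```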