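Search for a pattern and store the captured placeholders in variables

Time: 2013-11-28 10:06:21

Tags: regex perl

I want to assign people to groups based on a file. The file looks like this: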

group1 = john dave jim collin; 
group2 = abc def ghi jkl mno
      pqr stu vxz; 
group3 = marc;

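So I have to match the people listed between the equals sign and the semicolon (possibly with a newline in between, see group2) and assign them to the group.

I tried the following, without success: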

my $person2ascr = "sarah";

open (grp_file, "<$group_file");
   # the line below will only match if the group list is on a single line
   while(<grp_file>) {my $grp = $1 if (/(.*)\s*=\s*.*\n*.*$person2ascr.*\n*.*;/i)};

   # the following line won't match anything. Of course I close and reopen the file in between
   while(<grp_file>) {my $grp = $1 if /(\w+)\s*=\s*(\w+)*\s*$person2ascr(\s+\w+)*\s*;/i};

But from reading the manual, I concluded that I was doing it right :-/ Any help?

3 answers:

Answer 0: (score: 0)

How about:

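use feature 'say';   # enables say (Perl 5.10 or later)

$/ = ";";            # read one record per semicolon instead of per line
my @grps = <DATA>;
s/\n+//g for @grps;  # join the lines of a multi-line record
my $person2ascr = "ghi";
for (@grps) {
    say "group: $1" if /^([^=]+)=.*\b$person2ascr\b/;
}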

__DATA__
group1 = john dave jim collin; 
group2 = abc def ghi jkl mno
      pqr stu vxz; 
group3 = marc;

Output:

group:  group2 
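
The extra spaces in that output come from the capture `([^=]+)`, which keeps whatever whitespace surrounds the group name. A minimal sketch of a variant that trims the name and matches the person literally is shown here; `find_group` is a hypothetical helper name, not part of the answer, and the data is passed in as a string purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): return the name of the first
# group in $text that lists $person, or undef if there is none.
sub find_group {
    my ($text, $person) = @_;
    # Split the text into records at each semicolon.
    for my $record (split /;/, $text) {
        # Collapse every run of whitespace, including newlines, to one space.
        $record =~ s/\s+/ /g;
        # Capture the group name without surrounding spaces. \Q...\E makes the
        # name match literally, and \b keeps "mar" from matching "marc".
        return $1 if $record =~ /^\s*(\S+)\s*=.*\b\Q$person\E\b/;
    }
    return;
}

my $text = <<'END';
group1 = john dave jim collin;
group2 = abc def ghi jkl mno
      pqr stu vxz;
group3 = marc;
END

my $group = find_group($text, 'ghi');
print "group: ", (defined $group ? $group : '(none)'), "\n";
```

With the sample data this should print `group: group2`, and a name on the continuation line such as `pqr` resolves to the same group.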

Answer 1: (score: 0)

When a file has a well-defined end-of-record marker, there is a very simple way to read it one record at a time.

#Enclosing braces to ensure local $/ stays very local
{
    #Use 3-arg open (safer)
    open my $fh, '<', $group_file or die "Can't open $group_file: $!";
    #Set "newline" separator to the end-of-record token
    local $/ = ";\n";
    while(my $record = <$fh>) {
        #$record will contain "groupN = some name or other;\n"
        chomp $record;
        #$record now contains "groupN = some name or other" without the trailing ";\n"
        my ($group, $data) = split / = /, $record, 2;
        #$group contains "groupN"; $data contains "some name or other"
        $grp = $group if $data =~ /\b\Q$person2ascr\E\b/; #\b matches whole names only; add the i modifier for case-insensitive matching
    }
    #It's paranoid, but close _can_ fail
    close $fh or warn "Closing $group_file failed: $!";
}
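
One caveat with `";\n"` as the separator: it only splits where the newline follows the semicolon immediately, so a file with a space after the semicolon (as the sample in the question appears to have) would not be split at those points. A sketch of a more tolerant variant, assuming `;` alone is enough to end a record, is shown here. `read_groups` is a hypothetical helper, and the in-memory filehandle stands in for the real group file.

```perl
use strict;
use warnings;

# Hypothetical helper: read records ending in ';' from an open filehandle and
# return a hash reference mapping each group name to an array of its members.
sub read_groups {
    my ($fh) = @_;
    my %members_of;
    # Use ';' alone as the record separator, only inside this sub.
    local $/ = ';';
    while (my $record = <$fh>) {
        chomp $record;               # removes the trailing ';' if present
        # Split "name = members" once, tolerating any spacing around '='.
        my ($group, $data) = split /\s*=\s*/, $record, 2;
        next unless defined $data;   # skip the whitespace after the last ';'
        $group =~ s/^\s+//;          # drop whitespace left from the previous record
        # split ' ' splits on any whitespace, newlines included.
        $members_of{$group} = [ split ' ', $data ];
    }
    return \%members_of;
}

# Sample data with a space after some semicolons, read via an in-memory filehandle.
my $sample = "group1 = john dave jim collin; \n"
           . "group2 = abc def ghi jkl mno\n"
           . "      pqr stu vxz; \n"
           . "group3 = marc;\n";
open my $fh, '<', \$sample or die "Can't open in-memory file: $!";
my $groups = read_groups($fh);
close $fh;

for my $name (sort keys %$groups) {
    print "$name: @{ $groups->{$name} }\n";
}
```

Building the whole map costs a little more than stopping at the first match, but it makes later lookups for several names cheap.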

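Answer 2: (score: 0)

This solution may be overkill. It parses the group file and builds a complete data structure, which may be appropriate if you query the group information repeatedly. If you only need to grep the group file for a few names, you probably don't need it.

I wrote a general parser for the groups file that returns two maps: one from names to groups, and one from groups to names.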

sub parse_name_groups
{
    my $file  = shift;          # file name of group file
    my %group_to_names;         # Hash mapping groups to lists of names
    my %name_to_groups;         # Hash mapping names to a list of groups
    my $group = "<UNKNOWN>";    # If we see a name outside of a group, assign it to <UNKNOWN>
    my $last_line_in_group = 0; # Flag: If we see a semicolon, this is the last line in a group.

    open my $fh, "<", $file
        or die "Cannot open group file '$file': $!\n";

    while (my $line = <$fh>)
    {
        chomp $line;

        # Trim white space from front and back
        $line =~ s/^\s*//g;
        $line =~ s/\s*$//g;

        # Does line begin with a group specifier (ie. "group = ")?
        # If so, grab it and make it our current group.
        if ($line =~ s/^\s*(\S+)\s*=\s*//)
        {
            $group = $1;
        }

        # Does line have a semicolon?  Ignore it and everything
        # after.  Also, reset $group to <UNKNOWN> after this line.
        if ($line =~ s/;.*$//)
        {
            $last_line_in_group = 1;
        }

        # Split the rest of the line into a list of names
        # and make the name-to-group and group-to-name 
        # association.
        foreach my $name (split /\s+/, $line)
        {
            push @{ $group_to_names{ $group } }, $name;
            push @{ $name_to_groups{ $name  } }, $group;
        }

        if ($last_line_in_group)
        {
            $group = "<UNKNOWN>";
        }
        $last_line_in_group = 0;
    }

    close $fh;

    return ( \%group_to_names, \%name_to_groups );
}

Here is a sample program that looks up a name in the group file and tells you which group(s) the name belongs to, if any:

# Example program that looks up the group(s) associated with a name.  
# Usage:
# 
#   ./lookup_name group_file name

if ($#ARGV != 1)
{
    die "Usage: lookup_name group_file name\n";
}

my ( $file, $name ) = @ARGV;

my ($group_to_names, $name_to_groups) = parse_name_groups( $file );

my $groups = $name_to_groups->{ $name };

if (!defined $groups)
{
    print "$name does not belong to any groups\n";
} else
{
    print join("\n", @$groups), "\n";
}

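Since the group file format isn't fully specified, I made a few judgment calls in the parser. Specifically, if it sees something that looks like a name before it has seen a "group =" designator, it assigns that name to the group <UNKNOWN>. Likewise, once it sees a semicolon, any names it sees afterwards (starting on the following line) but before the next "group =" are assigned to the group <UNKNOWN>.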
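
To make the <UNKNOWN> rule concrete, here is a condensed sketch of the same logic. `name_to_groups_from_text` is a hypothetical stand-in for the parser above: it is an assumption for illustration, takes the file contents as a string, and returns only the name-to-groups map.

```perl
use strict;
use warnings;

# Condensed stand-in for the parser above (an assumption for illustration):
# it takes the file contents as a string and returns only the
# name-to-groups map.
sub name_to_groups_from_text {
    my ($text) = @_;
    my %name_to_groups;
    my $group = "<UNKNOWN>";
    for my $line (split /\n/, $text) {
        # A leading "name =" starts a new group.
        $group = $1 if $line =~ s/^\s*(\S+)\s*=\s*//;
        # A semicolon ends the group; the rest of the line is ignored.
        my $ends_group = ($line =~ s/;.*$//);
        # Record every remaining word on the line under the current group.
        push @{ $name_to_groups{$_} }, $group for split ' ', $line;
        $group = "<UNKNOWN>" if $ends_group;
    }
    return \%name_to_groups;
}

# "alice" appears before any group, "bob" appears after a closing semicolon.
my $map = name_to_groups_from_text(<<'END');
alice
group1 = john dave;
bob
group2 = marc;
END

for my $name (sort keys %$map) {
    print "$name => @{ $map->{$name} }\n";
}
```

In this input both `alice` and `bob` should end up under <UNKNOWN>, while `john`, `dave`, and `marc` keep their declared groups.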

The code also treats a semicolon as an "end of line" marker: anything after the semicolon on the same line is ignored.

There should be enough comments in the code above for you to change these behaviors as your application requires.