I want to assign people to groups based on a file. The file looks like this:
group1 = john dave jim collin;
group2 = abc def ghi jkl mno
pqr stu vxz;
group3 = marc;
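So I need to match the people listed between the equals sign and the semicolon (the list may continue across a line break, see group2) and assign them to the group.
I tried the following, without success: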
my $person2ascr = "sarah";
open (grp_file, "<$group_file");
# the line below only matches if the group list is on a single line
while(<grp_file>) {my $grp = $1 if (/(.*)\s*=\s*.*\n*.*$person2ascr.*\n*.*;/i)};
# the following line won't match anything. Of course I close and reopen the file first
while(<grp_file>) {my $grp = $1 if /(\w+)\s*=\s*(\w+)*\s*$person2ascr(\s+\w+)*\s*;/i};
But after reading the manual I concluded I was doing it right :-/ Any help?
Answer 0 (score: 0)
How about:
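use strict;
use warnings;
use feature 'say';

# Read one record per ";" instead of one per line
$/ = ";";
my @grps = <DATA>;
# Join continuation lines with a space so names on different lines stay separate
s/\n+/ /g for @grps;
my $person2ascr = "ghi";
for (@grps) {
    say "group: $1" if /^\s*(\w+)\s*=.*\b\Q$person2ascr\E\b/;
}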
__DATA__
group1 = john dave jim collin;
group2 = abc def ghi jkl mno
pqr stu vxz;
group3 = marc;
Output:
group: group2
Answer 1 (score: 0)
When a file has a well-defined end-of-record marker, there is a very simple way to read it one record at a time.
#Enclosing braces to ensure local $/ stays very local
{
#Use 3-arg open (safer)
open my $fh, '<', $group_file or die "Can't open $group_file: $!";
#Set "newline" separator to the end-of-record token
local $/ = ";\n";
while(my $record = <$fh>) {
#$record will contain "groupN = some name or other;\n"
chomp $record;
#$record now contains "groupN = some name or other" without the trailing ";\n"
my ($group, $data) = split / = /, $record, 2;
#$group contains "groupN"; $data contains "some name or other"
$grp = $group if $data =~ /\b\Q$person2ascr\E\b/; #$grp must be declared outside this block; add the i modifier for case-insensitive matching
}
#It's paranoid, but close _can_ fail
close $fh or warn "Closing $group_file failed: $!";
}
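The same idea can be tried without a file on disk. The following is only a minimal sketch under a few assumptions: the helper name find_group is hypothetical, the sample data is fed through an in-memory filehandle, and names are matched as whole words, which is a choice made here rather than something the question specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the name of the first group whose member
# list contains $person as a whole word, or undef if there is none.
sub find_group {
    my ($text, $person) = @_;

    # An in-memory filehandle stands in for the real group file.
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";

    # Each read now returns one record terminated by ";\n".
    local $/ = ";\n";

    my $found;
    while (my $record = <$fh>) {
        chomp $record;                  # strips the trailing ";\n"
        my ($group, $members) = split /\s*=\s*/, $record, 2;
        next unless defined $members;   # skip records without "="
        if ($members =~ /\b\Q$person\E\b/) {
            $found = $group;
            last;
        }
    }
    close $fh;
    return $found;
}

# Sample data copied from the question.
my $sample = "group1 = john dave jim collin;\n"
           . "group2 = abc def ghi jkl mno\n"
           . "pqr stu vxz;\n"
           . "group3 = marc;\n";

for my $person (qw(dave pqr sarah)) {
    my $group = find_group($sample, $person);
    print "$person: ", (defined $group ? $group : "(no group)"), "\n";
}
```

With the sample data this reports group1 for dave, group2 for pqr (which sits on the continuation line), and no group for sarah.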
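Answer 2 (score: 0)
This solution may be overkill. It parses the group file and builds a complete data structure, which may be appropriate if you query the group information repeatedly. If you only need to grep the group file for a few names, you probably don't need it.
I wrote a general parser for the groups file that returns two maps: one from groups to names, and one from names to groups.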
sub parse_name_groups
{
my $file = shift; # file name of group file
my %group_to_names; # Hash mapping groups to lists of names
my %name_to_groups; # Hash mapping names to a list of groups
my $group = "<UNKNOWN>"; # If we see a name outside of a group, assign it to <UNKNOWN>
my $last_line_in_group = 0; # Flag: If we see a semicolon, this is the last line in a group.
open my $fh, "<", $file
or die "Cannot open group file '$file': $!\n";
while (my $line = <$fh>)
{
chomp $line;
# Trim white space from front and back
$line =~ s/^\s+//;
$line =~ s/\s+$//;
# Does line begin with a group specifier (ie. "group = ")?
# If so, grab it and make it our current group.
if ($line =~ s/^\s*(\S+)\s*=\s*//)
{
$group = $1;
}
# Does line have a semicolon? Ignore it and everything
# after. Also, reset $group to <UNKNOWN> after this line.
if ($line =~ s/;.*$//)
{
$last_line_in_group = 1;
}
# Split the rest of the line into a list of names
# and make the name-to-group and group-to-name
# association.
foreach my $name (split /\s+/, $line)
{
push @{ $group_to_names{ $group } }, $name;
push @{ $name_to_groups{ $name } }, $group;
}
if ($last_line_in_group)
{
$group = "<UNKNOWN>";
}
$last_line_in_group = 0;
}
close $fh;
return ( \%group_to_names, \%name_to_groups );
}
Here is a sample program that looks up a name in the group file and tells you which group or groups it belongs to, if any:
# Example program that looks up the group(s) associated with a name.
# Usage:
#
# ./lookup_name group_file name
if ($#ARGV != 1)
{
die "Usage: lookup_name group_file name\n";
}
my ( $file, $name ) = @ARGV;
my ($group_to_names, $name_to_groups) = parse_name_groups( $file );
my $groups = $name_to_groups->{ $name };
if (!defined $groups)
{
print "$name does not belong to any groups\n";
} else
{
print join("\n", @$groups), "\n";
}
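Because the group file format is not fully specified, I made a few judgment calls in the parser. Specifically, if it sees something that looks like a name before it has seen a "group =" specifier, it assigns that name to the group <UNKNOWN>. Likewise, once it has seen a semicolon, any names that appear afterwards (starting on the following line) but before the next "group =" are assigned to the group <UNKNOWN>.
The code also treats a semicolon as an end-of-line indicator. Anything after the semicolon on the same line is ignored.
There should be enough comments in the code above for you to change these behaviors to suit your application.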
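To see those judgment calls in action, here is a small sketch. It assumes a hypothetical helper, name_to_groups_from_string, which is a condensed variant of the parser above that reads from a string and returns only the name-to-groups map; the input is made up to exercise each rule.

```perl
use strict;
use warnings;

# Hypothetical condensed variant of parse_name_groups that reads from a
# string instead of a file and returns only the name-to-groups map.
sub name_to_groups_from_string {
    my ($text) = @_;
    my %name_to_groups;
    my $group = "<UNKNOWN>";

    for my $line (split /\n/, $text) {
        # A leading "name =" starts a new group.
        $group = $1 if $line =~ s/^\s*(\S+)\s*=\s*//;

        # A semicolon ends the group; the rest of the line is dropped.
        my $ends_group = ($line =~ s/;.*$//) ? 1 : 0;

        for my $name (split ' ', $line) {
            push @{ $name_to_groups{$name} }, $group;
        }
        $group = "<UNKNOWN>" if $ends_group;
    }
    return \%name_to_groups;
}

# Made-up input: a name before any group, text after a semicolon,
# a name between two groups, and a group that spans two lines.
my $map = name_to_groups_from_string(
    "stray\n"
  . "group1 = john dave; ignored\n"
  . "orphan\n"
  . "group2 = abc\n"
  . "john;\n"
);

for my $name (qw(stray john orphan ignored)) {
    my $groups = $map->{$name};
    print "$name: ", ($groups ? join(", ", @$groups) : "(none)"), "\n";
}
```

Here stray and orphan end up in <UNKNOWN>, john is listed in both group1 and group2, and ignored never appears because it follows a semicolon on the same line.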