This is my file:
heaven
heavenly
heavenns
abc
heavenns
heavennly
With my code, only heavenns and heavennly should be pushed to @myarr, and each of them should appear in the array only once. How can I do that?
my $regx = "heavenn\+";
my $tmp=$regx;
$tmp=~ s/[\\]//g;
$regx=$tmp;
print("\nNow regex:", $regx);
my $file = "myfilename.txt";
my @myarr;
open my $fh, "<", $file;
while ( my $line = <$fh> ) {
if ($line =~ /$regx/){
print $line;
push (@myarr,$line);
}
}
print ("\nMylist:", @myarr); #printing 2 times heavenns and heavennly
Answer 0 (score: 1)
For a given value of $_, the expression !$seen{$_}++ is true only the first time it is evaluated.
my $regx = qr/heavenn/;
my @matches;
my %seen;
while (<>) {
chomp;
push(@matches, $_) if /$regx/ && !$seen{$_}++;
}
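To see the idiom in isolation, here is a minimal sketch that applies the same test to an in-memory list instead of a file. The sub name unique_matches is made up for illustration and is not part of the answer above; the sample words are copied from the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the items matching $rx, each only once,
# in the order they were first seen.
sub unique_matches {
    my ($rx, @items) = @_;
    my %seen;
    # !$seen{$_}++ is true only the first time a given value is tested.
    return grep { /$rx/ && !$seen{$_}++ } @items;
}

# Sample data copied from the question.
my @words  = qw(heaven heavenly heavenns abc heavenns heavennly);
my @unique = unique_matches(qr/heavenn/, @words);
print "@unique\n";    # heavenns heavennly
```

Because grep keeps the input order, the result lists the words in the order they first appear, with no sort needed.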
Answer 1 (score: 1)
This is Perl, so there is more than one way to do it (TMTOWTDI). Here is one of them:
#!/usr/bin/env perl
use strict;
use warnings;
my $regex = "heavenn+";
my $rx = qr/$regex/;
print "Regex: $regex\n";
my $file = "myfilename.txt";
my %list;
my @myarr;
open my $fh, "<", $file or die "Failed to open $file: $!";
while ( my $line = <$fh> )
{
if ($line =~ $rx)
{
print $line;
$list{$line}++;
}
}
push @myarr, sort keys %list;
print "Mylist: @myarr\n";
Sample output:
Regex: heavenn+
heavenns
heavenns
heavennly
Mylist: heavennly
heavenns
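The sort is not necessary (but it presents the data in a sensible order). You could instead add items to the array when the count in $list{$line} is 0. You could chomp the input lines to remove the newline. And so on.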
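A minimal sketch of those two suggestions, assuming an in-memory filehandle stands in for the data file so that it runs without myfilename.txt: each line is chomped, and it is pushed only while its count is still 0, so the array keeps first-seen order and no sort is involved.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: sample data held in a string so the sketch needs no file.
my $data = join "\n", qw(heaven heavenly heavenns abc heavenns heavennly), "";
open my $fh, "<", \$data or die "Failed to open in-memory file: $!";

my $rx = qr/heavenn+/;
my %list;
my @myarr;
while ( my $line = <$fh> )
{
    chomp $line;                      # drop the trailing newline
    next unless $line =~ $rx;
    # Post-increment returns the old count, which is 0 on first sight.
    push @myarr, $line if $list{$line}++ == 0;
}
close $fh;
print "Mylist: @myarr\n";             # Mylist: heavenns heavennly
```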
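What if I only want to push particular words? For example, if my file is: 1. "heavenns hello" 2. "heavenns hi", "3.heavennly good". What should I do to print only 'heavenns' and 'heavennly'?
Then you have to arrange to capture only the word, which means refining the regex. Assuming you want heavenn at the start of the word and don't mind which alphabetic characters follow it, then: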
#!/usr/bin/env perl
use strict;
use warnings;
my $regex = '\b(heavenn[A-Za-z]*)\b'; # Single quotes necessary!
my $rx = qr/$regex/;
print "Regex: $regex\n";
my $file = "myfilename.txt";
my %list;
my @myarr;
open my $fh, "<", $file or die "Failed to open $file: $!";
while ( my $line = <$fh> )
{
if ($line =~ $rx)
{
print $line;
$list{$1}++;
}
}
push @myarr, sort keys %list;
print "Mylist: @myarr\n";
Data file:
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
heaven
heavenly
heavenns
abc
heavenns
heavennly
Output:
Regex: \b(heavenn[A-Za-z]*)\b
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
heavenns
heavenns
heavennly
Mylist: heavennly heavenns
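Note that the names in the list no longer include the newline.
This version takes the regex from the command line. The script invocation is: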
perl script.pl -p 'regex' [file ...]
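If no files are specified on the command line, it reads from standard input (which is much better than having a fixed input file name). It looks for multiple occurrences of the specified regex on each line, where the regex may be preceded or followed (or both) by "word characters" as specified by \w.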
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;
my %opts;
getopts('p:', \%opts) or die "Usage: $0 [-p 'regex'] [file ...]\n";
my $regex_base = 'heavenn';
#$regex_base = $ARGV[0] if defined $ARGV[0];
$regex_base = $opts{p} if defined $opts{p};
my $regex = '\b(\w*' . ${regex_base} . '\w*)\b';
my $rx = qr/$regex/;
print "Regex: $regex (compiled form: $rx)\n";
my %list;
my @myarr;
while (my $line = <>)
{
while ($line =~ m/$rx/g)
{
print $line;
$list{$1}++;
#$line =~ s///;
}
}
push @myarr, sort keys %list;
print "Matched words: @myarr\n";
Given the input file:
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
An unheavenly host. Good heavens! It heaves to like a yacht!
heaven
Is it heavens
heavenly
heavenns
abc
heavenns
heavennly
You can get the following output:
$ perl script.pl -p 'e\w*?ly' myfilename.txt
Regex: \b(\w*e\w*?ly\w*)\b (compiled form: (?^:\b(\w*e\w*?ly\w*)\b))
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
An unheavenly host. Good heavens! It heaves to like a yacht!
heavenly
heavennly
Matched words: equally heavenly heavennly heavennnly heavennnnly unheavenly
$ perl script.pl myfilename.txt
Regex: \b(\w*heavenn\w*)\b (compiled form: (?^:\b(\w*heavenn\w*)\b))
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
heavenns
heavenns
heavennly
Matched words: heavennly heavennnly heavennnnly heavenns heavennsy
$
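The inner while loop is one way to walk through repeated matches on a line. As an alternative sketch (not the answer's code), a global match in list context returns every captured word on the line at once. The sub name words_matching is hypothetical, and the sample lines are taken from the input file above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the sorted, de-duplicated words that
# contain $base, using the same \b(\w*...\w*)\b wrapper as the script.
sub words_matching {
    my ($base, @lines) = @_;
    my $rx = qr/\b(\w*$base\w*)\b/;
    my %list;
    for my $line (@lines) {
        # In list context, m//g returns all captures found on the line.
        $list{$_}++ for $line =~ /$rx/g;
    }
    return sort keys %list;
}

my @lines = (
    'Good heavennsy! What a heavennnly output from an equally heavennnnly input!',
    'An unheavenly host. Good heavens! It heaves to like a yacht!',
    'heavenns',
);
my @words = words_matching('heavenn', @lines);
print "Matched words: @words\n";
# Matched words: heavennnly heavennnnly heavenns heavennsy
```

This form does not print the matching line once per match, so it suits the case where only the final word list matters.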
Answer 2 (score: 0)
If you only want to push the first occurrence of a word, you can add the following inside the loop, right after the regex test:
# Assumes "my %seen;" is declared outside the loop.
next if $seen{$line}++;