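I'm trying to write a regular expression in Perl, but I need some help. Here is what I want to do. Suppose I have this text as an example:
1- [NP some/NN text/NNP here/NNP]
I'm interested in the words tagged /NNP, so I want my regex to scan each line until it finds "[NP", then a space, then a word tagged /NN (which may or may not be present), then one or more words tagged /NNP (which may contain special characters).
I want to extract the /NNP-tagged words from each line, so the result would be:
1- text here
What I have so far extracts the /NNP-tagged words from all of the examples:
while ($line =~ m/\s(\S*?)\/NNP/gs)
{
    my $word = $1;
    print $word." ";
}
print "\n";
Any ideas, please?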
Answer 0 (score: 2)
First, the golfed version:
my @list = map { [ /(\S+)\/NNP/g ] } map { ( /\[NP ([^\]]+)]/g ) } <DATA>;
This finds every '[NP...]' group and then, within each group, every '*/NNP' word. Spelled out a little more, it looks like this:
my @list;
while ( my $line = <DATA> ) {
    foreach my $g ( $line =~ /\[NP ([^\]]+)]/g ) {
        push @list, [ $g =~ /(\S+)\/NNP/g ];
    }
}
The dump looks like this:
@list: [
[
'Ebd',
'AlmEz',
'AbrAhym'
],
[
'hAnY',
'HjAb'
],
[
'xAld',
'ftH',
'Allh'
],
[
'ESAm',
'$rf'
],
[
'AlqAhrp'
]
]
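A self-contained sketch of the same two-step extraction, for anyone who wants to run it without a __DATA__ section. The helper name nnp_groups and the inline sample lines are assumptions for illustration (the sample text is the one used elsewhere in this thread), not part of the answer's code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): returns one array ref of
# /NNP-tagged words per [NP ...] group found in the given line.
sub nnp_groups {
    my ($line) = @_;
    my @groups;
    for my $group ( $line =~ /\[NP ([^\]]+)\]/g ) {
        # In list context, //g returns every captured word at once.
        push @groups, [ $group =~ /(\S+)\/NNP/g ];
    }
    return @groups;
}

# Single-quoted so that '$Ar' and '$rf' are not interpolated.
my @lines = (
    '1- [NP Almst$Ar/NN Ebd/NNP AlmEz/NNP AbrAhym/NNP] [NP Almhnds/NN hAnY/NNP HjAb/NNP]',
    '2- [NP xAld/NNP ftH/NNP Allh/NNP] [NP ESAm/NNP $rf/NNP] [NP AlqAhrp/NNP]',
);

my @list = map { nnp_groups($_) } @lines;

# Print one group per output line.
print "@$_\n" for @list;
```

Using \S+ rather than \w+ for the word is what lets a token such as '$rf' come through intact.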
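(In response to a comment) There are two ways to print the structure as shown above. The more standard way is:
use feature 'say';    # say() needs Perl 5.10 or later
use Data::Dumper ();
say Data::Dumper->Dump( [ \@list ], [ '*list' ] );
The second is the one I used:
use Smart::Comments;
### @list
See Smart::Comments. (It does pretty much the same thing behind the scenes.)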
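As a rough, runnable illustration of the Data::Dumper route: the two-group @list below is made-up sample data standing in for the real structure, and print is used so that no say feature is required.

```perl
use strict;
use warnings;
use Data::Dumper ();

# Sample data standing in for the @list built by the extraction code
# (an assumption for illustration; only two groups are shown).
my @list = ( [ 'Ebd', 'AlmEz', 'AbrAhym' ], [ 'ESAm', '$rf' ] );

# Naming the value '*list' makes Dump print it as "@list = (...)"
# instead of the default "$VAR1 = [...]".
my $dump = Data::Dumper->Dump( [ \@list ], ['*list'] );
print $dump;
```

The exact indentation of the dump depends on Data::Dumper's settings, so it will not look identical to the Smart::Comments output.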
Answer 1 (score: 1)
Perhaps:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
    while ( m{(\[NP.+?\])}g ) {
        my $piece = $1;
        # \S+ rather than \w+, so that words with special characters such as '$rf' are captured whole
        1 while $piece =~ m{(\S+)/NNP}g and printf "%s ",$1;
        print "\n";
    }
}
__DATA__
1- [NP Almst$Ar/NN Ebd/NNP AlmEz/NNP AbrAhym/NNP] [NP Almhnds/NN hAnY/NNP HjAb/NNP]
2- [NP xAld/NNP ftH/NNP Allh/NNP] [NP ESAm/NNP $rf/NNP] [NP AlqAhrp/NNP]
You then asked to be able to skip lines with only one tagged word. For that, I would probably do this:
#!/usr/bin/env perl
use strict;
use warnings;
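my @line = ();
while (<DATA>) {
    while ( m{(\[NP.+?\])}g ) {
        my $piece = $1;
        # \S+ rather than \w+, so that '$rf' is captured whole
        while ( $piece =~ m{(\S+)/NNP}g ) {
            push @line, $1;
        }
        print "@line\n" if @line > 1;    # skip blocks with a single tagged word
        @line = ();                      # always reset, so a skipped word cannot leak into the next block
    }
}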
__DATA__
1- [NP Almst$Ar/NN Ebd/NNP AlmEz/NNP AbrAhym/NNP] [NP Almhnds/NN hAnY/NNP HjAb/NNP]
2- [NP xAld/NNP ftH/NNP Allh/NNP] [NP ESAm/NNP $rf/NNP] [NP AlqAhrp/NNP]
3- [Nothing of interest here]
Answer 2 (score: 1)
OK, there are already plenty of good answers. Here is a solution based on split.
use strict;
use warnings;
use v5.10; # for say(), not required
while (<DATA>) {
    for (grep /^\[NP /,                        # ..and keep only the NP-blocks
         split(/(\[NP [^]]*\])/, $_)) {        # Split on NP-blocks
        my @a = map { (split m(/), $_)[0] }    # ...keep first part
                grep m{/NNP\]?$},              # ...and keep only /NNP
                split;                         # Split the NP-block on whitespace
        say "@a";
    }
}
__DATA__
[NP Almst$Ar/NN Ebd/NNP AlmEz/NNP AbrAhym/NNP] [NP Almhnds/NN hAnY/NNP HjAb/NNP]
[NP xAld/NNP ftH/NNP Allh/NNP] [NP ESAm/NNP $rf/NNP] [NP AlqAhrp/NNP]
Answer 3 (score: 0)
Maybe something like this?
#!/usr/bin/perl -w
use strict;
my $text = <<'DAISY';
[NP Almst$Ar/NN Ebd/NNP AlmEz/NNP AbrAhym/NNP] [NP Almhnds/NN hAnY/NNP HjAb/NNP]
[NP xAld/NNP ftH/NNP Allh/NNP] [NP ESAm/NNP $rf/NNP] [NP AlqAhrp/NNP]
DAISY
for my $tag ($text =~ /(\[NP.+?\/NNP\])/gm) {
    # \S+ rather than \w+: ' (\w+)/NNP' would skip '$rf' entirely
    my @words = $tag =~ / (\S+)\/NNP/g;
    print "@words\n";
}
Answer 4 (score: 0)
Assuming you know a little Perl, this should point you in the right direction:
$str = '
1- [NP Almst$Ar/NN Ebd/NNP AlmEz/NNP AbrAhym/NNP] [NP Almhnds/NN hAnY/NNP HjAb/NNP]
2- [NP xAld/NNP ftH/NNP Allh/NNP] [NP ESAm/NNP $rf/NNP] [NP AlqAhrp/NNP]
';
while ($str =~ /\[NP([^\]]+)\]/g)
{
    for ( $1 =~ /\s(\S*?)\/NNP/g) {
        print "$_ ";
    }
    print "\n";
}
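The question's expected output keeps the leading line number ("1- text here"), which the loop above drops. Here is a minimal sketch of one way to keep it, assuming every line starts with a prefix of the form "digits-"; the helper name nnp_line is hypothetical, and lines without any /NNP word inside an [NP ...] group are skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "<prefix> <NNP words...>" for one line,
# or undef when no [NP ...] group on the line holds an /NNP-tagged word.
sub nnp_line {
    my ($line) = @_;
    my ($prefix) = $line =~ /^\s*(\d+-)/;    # e.g. "1-" (assumed format)
    my @words;
    for my $group ( $line =~ /\[NP ([^\]]+)\]/g ) {
        push @words, $group =~ /(\S+)\/NNP/g;
    }
    return unless @words;
    unshift @words, $prefix if defined $prefix;
    return "@words";
}

# Single-quoted so that '$rf' is not interpolated.
for my $line (
    '1- [NP some/NN text/NNP here/NNP]',
    '2- [NP xAld/NNP ftH/NNP Allh/NNP] [NP ESAm/NNP $rf/NNP]',
    '3- [Nothing of interest here]',
) {
    my $out = nnp_line($line);
    print "$out\n" if defined $out;
}
```

Words from all [NP ...] groups on a line are joined into one output line here; if one output line per group is preferred, the print can be moved inside the inner loop instead.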