At the moment, all the nouns are printed directly underneath.
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
my $search_key = "expend"; ## CHANGE "..." to <>
open(my $tag_corpus, '<', "ch13tagged.txt") or die $!;
my @sentences = <$tag_corpus>; # This breaks up each line into list
my @words;
my %seens = ();
my %seenw = ();
for (my $i = 0; $i <= @sentences; $i++) {
    if (defined($sentences[$i]) and $sentences[$i] =~ /($search_key)_VB.*/i) {
        @words = split /\s/, $sentences[$i]; ## \s is a whitespace
        for (my $j = 0; $j <= @words; $j++) {
            #FILTER if word is noun, and therefore will end with _NN:
            if (defined($words[$j]) and $words[$j] =~ /_NN/) {
                #PRINT word (without _NN) and sentence (without any _ENDING):
                next if $seenw{$words[$j]}++; ## How to include plural etc
                push @words, $words[$j];
                print "**", split(/_\S+/, $words[$j]), "**", "\n";
                ## next if $seens{ $sentences[$i] }++;
                ## push @sentences, $sentences[$i];
                print split(/_\S+/, $sentences[$i]), "\n"
                ## HOW PRINT bold or specifically word bold?
                #FILTER if word has been output, add sentence under that heading
            }
        } ## put print sentences here to print each sentence after all the nouns inside
    }
}
close $tag_corpus || die "Can't close $tag_corpus: $!";
Answer 0 (score: 1)
Your original:
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
That's a good start...
my $search_key = "expend"; ## CHANGE "..." to <>
Since you use it in a regex inside a loop, it is better to compile the regex once: my $verb_regex = qr/\bexpend_VB\b/i;
I put the word boundaries in because it looks like you need them.
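To see what the compiled pattern accepts, here is a minimal sketch. The token list is made up for illustration; only $verb_regex comes from the text above. Note that the trailing \b rejects longer tags such as _VBD, which the original /expend_VB.*/ would have let through, so drop it if those forms should match too.

```perl
use strict;
use warnings;

# Hypothetical tagged tokens, for illustration only.
my @tokens = qw( expend_VB expend_VBD Expend_VB unexpend_VB expend_NN );

# Compile once, outside any loop; /i makes the match case-insensitive.
my $verb_regex = qr/\bexpend_VB\b/i;

# Keep only the tokens the compiled pattern accepts.
my @hits = grep { $_ =~ $verb_regex } @tokens;

print "$_\n" for @hits;    # prints expend_VB and Expend_VB
```

The leading \b keeps unexpend_VB out, and the trailing \b keeps expend_VBD out, because D is a word character.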
open(my $tag_corpus, '<', "ch13tagged.txt") or die $!;
my @sentences = <$tag_corpus>; # This breaks up each line into list
my @words;
my %seens = ();
my %seenw = ();
for (my $i = 0; $i <= @sentences; $i++) {
The following does roughly the same thing with less overhead:
while ( <$tag_corpus> ) {
...
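A runnable sketch of that line-at-a-time loop, assuming a tiny made-up corpus held in a string so that no file is needed. The sample text and the names $text, $fh and $count are illustrative, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical corpus held in a string so the sketch runs without a file.
my $text = "I_PRP expend_VB energy_NN\nShe_PRP sleeps_VBZ\n";
open( my $fh, '<', \$text ) or die "Can't open in-memory file: $!";

my $count = 0;
while ( my $line = <$fh> ) {    # only one line is held in memory at a time
    chomp $line;
    next unless $line =~ /\bexpend_VB\b/i;
    $count++;
    print "$line\n";
}
close $fh or die "Can't close in-memory file: $!";
```

Perl wraps the assignment in the while condition in an implicit defined test, so the loop ends cleanly at end of file.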
Back to yours:
if (defined($sentences[$i]) and $sentences[$i] =~ /($search_key)_VB.*/i) {
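If the line contains the record separator (and unless you chomp it, it always will), you will get defined lines right up to the end of the file, so there is no need to test for defined. In your for loop the test only matters because $i <= @sentences runs one index past the last element; with $i < @sentences, or with the while loop, it can go.
Also, you don't need the .* after the search term, and capturing $search_key has no effect.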
@words = split /\s/, $sentences[$i]; ## \s is a whitespace
You don't want to split on a single whitespace character. You should use /\s+/, but better still is: @words = split ' ', $sentences[$i];
But you don't even need that.
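The difference between the three forms shows up as soon as a line has leading or doubled whitespace. A small sketch on a made-up line:

```perl
use strict;
use warnings;

# Hypothetical line with leading whitespace and a double space.
my $line = "  the_DT  dog_NN barks_VBZ\n";

my @single = split /\s/,  $line;    # every whitespace character is a separator
my @plus   = split /\s+/, $line;    # runs collapse, but a leading empty field remains
my @awk    = split ' ',   $line;    # leading whitespace is skipped as well

printf "%d %d %d\n", scalar @single, scalar @plus, scalar @awk;    # prints "6 4 3"
```

Only the split ' ' form returns exactly the three tokens; the other two return empty strings that later code would have to skip.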
for (my $j = 0; $j <= @words; $j++) {
#FILTER if word is noun, and therefore will end with _NN:
if (defined($words[$j]) and $words[$j] =~ /_NN/) {
#PRINT word (without _NN) and sentence (without any _ENDING):
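All you do with the list, though, is look for words that end in _NN. Also, every element of the list returned by split is defined, so there is no need to test for that; the only undefined element your loop can reach is the one past the end that $j <= @words visits.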
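A short sketch of why the defined test was ever reached in the original loop, using a made-up two-word list. The names @undef_idx and @nouns are illustrative only.

```perl
use strict;
use warnings;

my @words = qw( the_DT dog_NN );    # hypothetical tokens

# "<=" compares against the element count, so the loop also visits index 2,
# which is one past the last element and therefore undefined.
my @undef_idx;
for ( my $j = 0; $j <= @words; $j++ ) {
    push @undef_idx, $j unless defined $words[$j];
}

# Filtering the list directly never leaves the array, so no test is needed.
my @nouns = grep { /_NN$/ } @words;

print "undefined at index: @undef_idx\n";    # prints "undefined at index: 2"
print "nouns: @nouns\n";                     # prints "nouns: dog_NN"
```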
next if $seenw{$words[$j]}++; ## How to include plural etc
Unless you want to reset %seenw after every sentence, you will process each _NN word only once per file.
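The two scopes behave differently as soon as a noun recurs in a later sentence. A minimal sketch with made-up sentences; %seen_file plays the role of %seenw, while %seen_sentence is a hypothetical per-sentence alternative.

```perl
use strict;
use warnings;

# Hypothetical tagged sentences.
my @lines = ( "dogs_NNS expend_VB energy_NN", "cats_NNS expend_VB energy_NN" );

my %seen_file;    # never reset: each noun is reported once per file
my ( @per_file, @per_sentence );
for my $line (@lines) {
    my %seen_sentence;    # fresh hash each time: once per sentence
    for my $word ( grep { /_NN$/ } split ' ', $line ) {
        push @per_file,     $word unless $seen_file{$word}++;
        push @per_sentence, $word unless $seen_sentence{$word}++;
    }
}

print scalar(@per_file), " vs ", scalar(@per_sentence), "\n";    # prints "1 vs 2"
```

Which one is right depends on whether each sentence should be listed under every noun it contains or only under nouns not yet reported.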
push @words, $words[$j];
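I can't see what purpose this push could serve by appending the noun back onto the word list. Granted, you make the uniqueness check before saving it, otherwise any _NN word would send you into an infinite loop. But that only means you end up with all the words of the sentence followed by all the "nouns". Not only that, when the loop reaches those appended words it simply tests that each one is a noun and then does nothing with it. Not to mention that you clobber the list for the next sentence.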
print "**", split(/_\S+/, $words[$j]), "**", "\n";
## next if $seens{ $sentences[$i] }++;
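You don't want to do this inside the word loop.
## push @sentences, $sentences[$i];
Again, if this were uncommented, I don't think you would want it anywhere but outside the word loop. Everything from two lines back onward seems to belong after that loop.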
print split(/_\S+/, $sentences[$i]), "\n"
## HOW PRINT bold or specifically word bold?
#FILTER if word has been output, add sentence under that heading
}
} ## put print sentences here to print each sentence after all the nouns inside
}
}
close $tag_corpus || die "Can't close $tag_corpus: $!";
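No. As written, that is not a reliable way to handle a bad return from close. || "binds" too tightly: after a list operator it attaches to the last argument, so you would be closing the result of $tag_corpus || die ... . Fortunately (or perhaps unfortunately), die would never be called in that case, because if we have got this far $tag_corpus should be a true value. close takes a single argument, so Perl may parse it as a named unary operator and group the || the way you intended; using the low-precedence or avoids having to know. Note too that interpolating $tag_corpus into the message prints something like GLOB(0x...) rather than the file name.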
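The precedence difference is easiest to see with open, a list operator where the grouping is not in doubt. This is a sketch only: the file name is hypothetical and assumed not to exist, and the eval blocks are there just to capture whether die ran.

```perl
use strict;
use warnings;

my $missing = "no-such-file-for-this-sketch.txt";    # hypothetical, assumed absent

# Parsed as open( my $fh1, '<', ( $missing || die ... ) ): the name is a true
# value, so die never runs and the failed open goes unnoticed.
my $ok_tight = eval { open my $fh1, '<', $missing || die "tight: $!\n"; 1 };

# "or" binds more loosely than the whole open call, so die sees the failure.
my $ok_loose = eval { open my $fh2, '<', $missing or die "loose: $!\n"; 1 };
my $error    = $@;

print "with ||: ", ( $ok_tight ? "no error raised" : "died" ), "\n";
print "with or: ", ( $ok_loose ? "no error raised" : "died: $error" );
```

Writing the call with parentheses, as in open(...) || die, also works; or simply makes the intent independent of how a given builtin is parsed.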
Here is a sort of cleaned-up version of what you are trying to do, using the parts that I could understand.
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";

# Compile the search pattern once, outside the loop.
my $verb_regex = qr/\bexpend_VB\b/i;
my ( %seens, %seenw );
open( my $tag_corpus, '<', "ch13tagged.txt" ) or die $!;

my @sentences;
# We're processing a single line at a time.
while ( <$tag_corpus> ) {
    # Test if we want to work with the line
    next unless m/$verb_regex/;
    # If we do, then test that we haven't dealt with it before
    # Although I suspect that this may not be needed as much if we're not
    # pushing to a queue that we're reading from.
    next if $seens{ $_ }++;
    # split -> split ' ', $_
    # pass through only those words that match _NN at the end and
    # are unique so far. We test on a substitution, because the result
    # still uniquely identifies a noun
    foreach my $noun ( grep { s/_NN$// && !$seenw{ $_ }++ } split ) {
        print "**$noun**\n";
    }
    # This will omit any adjacent punctuation you have after the word--if
    # that's a problem.
    print split( /_\S+/ ), "\n";
    # Here we save the sentence.
    push @sentences, $_;
}
close $tag_corpus or die "Can't close ch13tagged.txt: $!";