Suppose file.txt contains one sentence per line, as follows:
John Depp is a great guy.
He is very intelligent.
He can do anything.
Come and meet John Depp.
The Perl code is as follows:
open ( FILE, "file.txt" ) || die "can't open file!";
@lines = <FILE>;
close (FILE);
$string = "John Depp";
foreach $line (@lines) {
if ($line =~ $string) { print "$line"; }
}
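The output is the first and fourth lines.
I want it to work on files with arbitrary line breaks, not just files with one English sentence per line. That is, it should also work on input like this: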
John Depp is a great guy. He is very intelligent. He can do anything. Come and meet John Depp.
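The output should be the first and fourth sentences.
Any ideas?
Answer 0 (score: 2)
First, note that the famous actor's name is Johnny Depp.
Second, figuring out what is and what isn't a sentence is tricky. I am going to cheat and use Lingua::Sentence: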
#!/usr/bin/perl
use strict; use warnings;
use Lingua::Sentence;
my $splitter = Lingua::Sentence->new('en');
while ( my $text = <DATA> ) {
for my $sentence ( split /\n/, $splitter->split($text) ) {
print $sentence, "\n" if $sentence =~ /John Depp/;
}
}
__DATA__
John Depp is a great guy.
He is very intelligent.
He can do anything.
Come and meet John Depp.
John Depp is a great guy. He is very intelligent. He can do anything. Come and meet John Depp.
Output:
John Depp is a great guy.
Come and meet John Depp.
John Depp is a great guy.
Come and meet John Depp.
Answer 1 (score: 2)
Simpler: if you assume that "sentences" are delimited by dots, you can use the dot as the input record separator:
$/ = '.';
while(<>) {
print if (/John Depp/i);
}
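One caveat with this approach: each record keeps whatever whitespace or newline followed the previous dot, so the printed sentences can start with a stray space or blank line. Here is a minimal sketch that trims each record, still assuming every sentence ends with a dot. The helper name matching_sentences is made up for illustration, and it reads from an in-memory string only to keep the example self-contained:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the sentences in $text that contain $target.
# Assumes every sentence ends with a '.', as in the answer above.
sub matching_sentences {
    my ($text, $target) = @_;
    open my $fh, '<', \$text or die "can't open in-memory handle: $!";
    local $/ = '.';                  # read one dot-terminated record at a time
    my @found;
    while (my $record = <$fh>) {
        $record =~ s/^\s+|\s+$//g;   # drop whitespace left over from line breaks
        push @found, $record if $record =~ /\Q$target\E/i;
    }
    close $fh;
    return @found;
}

my $text = "John Depp is a great guy. He is very intelligent.\n"
         . "He can do anything. Come and meet John Depp.\n";
print "$_\n" for matching_sentences($text, 'John Depp');
```

Using local $/ inside the sub keeps the changed separator from leaking into the rest of the program.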
Answer 2 (score: 1)
Assuming your content is held in a string:
my $content = "John Depp is a great guy.
He is very intelligent.
He can do anything.
Come and meet John Depp.";
my @arr = $content =~ /.*John Depp.*/mg;
foreach my $a (@arr) {
print "$a\n";
}
Result:
John Depp is a great guy.
Come and meet John Depp.
If you only want to extract the interesting part, you can modify the regex, for example:
my @arr = $content =~ /is (\w+? ?\w+ \w+)./mg;
Result:
a great guy
very intelligent
Answer 3 (score: 0)
One way:
while (<>) {
    next unless /John Depp/i;    # skip lines that cannot contain a match
    # split the line into sentences on dots, dropping surrounding whitespace
    foreach my $found (split /\s*\.\s*/) {
        print "$found\n" if $found =~ /John Depp/i;
    }
}
Output:
$ cat file
John Depp is a great guy.
He is very intelligent.
He can do anything.
Come and meet John Depp.
John Depp is a great guy. He is very intelligent. He can do anything. Come and meet John Depp.
$ perl perl.pl file
John Depp is a great guy
Come and meet John Depp
John Depp is a great guy
Come and meet John Depp
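Answer 4 (score: 0)
The default variable can get clobbered if you are not careful, so it is a good idea to name everything.
This should get you started: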
#!/usr/bin/perl -w
use strict;
my $targetString = "John Depp";
while (my $line = <STDIN>) {
chomp($line);
my @elements = split("\\.", $line);
foreach my $element (@elements) {
if ($element =~ m/$targetString/is) {
print trim($element).".\n";
}
}
}
sub trim {
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
Usage:
$ depp.pl < file
John Depp is a great guy.
Come and meet John Depp.
John Depp is a great guy.
Come and meet John Depp.
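Note that the loop above splits each line on its own, so a sentence that is broken across two lines would be missed. A rough sketch of one way around that, assuming the input is small enough to hold in memory and that '.', '!' or '?' followed by whitespace ends a sentence (the helper name sentences_with is made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split $text into sentences and keep those containing $target.
sub sentences_with {
    my ($text, $target) = @_;
    $text =~ s/\s+/ /g;                          # line breaks become plain spaces
    $text =~ s/^ | $//g;                         # trim both ends
    my @sentences = split /(?<=[.!?]) /, $text;  # break after ., ! or ?
    return grep { /\Q$target\E/ } @sentences;
}

# Sample text with line breaks in the middle of sentences.
my $text = "John Depp is a\ngreat guy. He is very intelligent. He can\n"
         . "do anything. Come and meet\nJohn Depp.\n";
print "$_\n" for sentences_with($text, 'John Depp');
```

This still treats abbreviations such as "Dr." as sentence ends, so it is only a starting point for clean input.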
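Answer 5 (score: 0)
This looks at your original code rather than answering your question specifically. Unless you have to, reading an entire file into memory is usually a bad idea. You can process the file line by line: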
open ( FILE, "file.txt" ) || die "can't open file!";
$string = "John Depp";
while (<FILE>) {
if (/$string/) { print }
}
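For what it's worth, here is a sketch of the same loop written with a lexical filehandle and three-argument open, with $! included so the error message says why the open failed. The search string is wrapped in \Q...\E so it is matched literally. The sub name grep_file and its optional output-handle argument are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: print each line of $path that contains $string
# to $out (STDOUT by default) and return the number of matching lines.
sub grep_file {
    my ($path, $string, $out) = @_;
    $out ||= \*STDOUT;
    open my $fh, '<', $path or die "can't open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {       # only one line is held in memory at a time
        next unless $line =~ /\Q$string\E/;
        print {$out} $line;
        $count++;
    }
    close $fh;
    return $count;
}

# Run against the file named on the command line, if one was given.
grep_file($ARGV[0], 'John Depp') if @ARGV;
```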