I am looking for a regular expression that matches a repeated pattern.
For example:
The great eagle flied high flied high.
Repeated: flied high
The call was done at night was done at night.
Repeated: was done at night
Is there a way to achieve this? I only want the regular expression, so that I can use grep -P to filter some files.
Found 5 files under folders: home folder, home folder, home folder, home folder, home folder
Repeated: home folder
The query returned this preferences for this user: color black, fried chicken, color black, fried chicken, white shirt, brown color
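Repeated: color black,
Essentially, what I want is to match lines that contain a "repeated sentence".
Answer 0: (score: 1)
You haven't defined your problem very well. As it stands, you can write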
my $s = 'The great eagle flied high flied high.';
print qq{"$1"\n} if $s =~ /(.+)\1/;
Output:
" flied high"
But if you apply the same pattern to the second string
my $s = 'The call was done at night was done at night.';
print qq{"$1"\n} if $s =~ /(.+)\1/;
Output:
"l"
So the solution depends on the data set you have. If you can define the problem more rigorously, we can help you better.
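The "l" result comes from the regex engine returning the leftmost match: the doubled "l" in "call" is a repeat that starts earlier than the repeated phrase. Here is a minimal sketch of that effect using grep -oP, assuming a GNU grep built with PCRE support. The helper name first_repeat is hypothetical, and the second pattern is only one possible tightening, under the assumption that a repeat must consist of whole words.

```shell
# first_repeat PATTERN: print the first match of PATTERN found on stdin.
# Note that grep -o prints the whole match (phrase plus its repeat),
# not just capture group 1 as the Perl snippets do.
first_repeat() {
    grep -oP "$1" | head -n 1
}

s='The call was done at night was done at night.'

# Unanchored: the leftmost repeat is the "ll" in "call".
printf '%s\n' "$s" | first_repeat '(.+)\1'

# Assumption: repeats are made of whole words, so anchor with \b and \w+.
printf '%s\n' "$s" | first_repeat '\b(\w+(?:\s+\w+)*)\s+\1\b'
```

The first call prints "ll" and the second prints "was done at night was done at night". Whether word anchoring is the right restriction still depends on the data.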
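Answer 1: (score: 0)
Yes, just use \1 in the regex to refer back to the text matched by the first capture group. I intentionally restricted this regex to phrases of 2-4 words to limit how much work it has to do: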
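#!/usr/bin/perl
use strict;
use warnings;
# Read each line after __DATA__ and print every 2-4 word phrase
# that is immediately followed by a copy of itself.
while (<DATA>) {
    if (my @phrases = /\b(\S+(?:\s+\S+){1,3})\s+\1/g) {
        print "$_\n" for @phrases;
    }
}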
__DATA__
The great eagle flied high flied high.
The call was done at night was done at night.
Output:
flied high
was done at night
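Since the question asks for something usable with grep -P, the same pattern can be passed to grep directly. This is a sketch that assumes a GNU grep built with PCRE support. The helper name find_repeats is hypothetical; it forwards its arguments (file names, or none to read standard input) to grep.

```shell
# Regex from the answer above: a 2-4 word phrase immediately followed by itself.
pattern='\b(\S+(?:\s+\S+){1,3})\s+\1'

# find_repeats [FILE...]: print only the lines that contain such a repeat.
find_repeats() {
    grep -P "$pattern" "$@"
}

# Demo on standard input; only the first and third lines are printed.
printf '%s\n' \
    'The great eagle flied high flied high.' \
    'Nothing repeats on this line.' \
    'The call was done at night was done at night.' |
    find_repeats
```

To filter real files, pass their names as arguments, for example find_repeats *.log (the file names here are placeholders). Add -o to the grep call to print only the matched text instead of the whole line.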