Question

I am looking for a regular expression that matches a repeated pattern.

For example:

The great eagle flied high flied high.

Repeated: flied high

The call was done at night was done at night.

Repeated: was done at night

Is there a way to achieve this? I only want the regular expression, so that I can use grep -P to filter some files.

Found 5 files under folders: home folder, home folder, home folder, home folder, home folder

Repeated: home folder

The query returned this preferences for this user: color black, fried chicken, color black, fried chicken, white shirt, brown color

Repeated: color black,

Essentially, what I want to do is find and match "repeated sentences".

Answer 1

You have not defined your problem very well. As it stands, you could write

my $s = 'The great eagle flied high flied high.';
print qq{"$1"\n} if $s =~ /(.+)\1/;

Output:

" flied high"

However, if you apply the same pattern to the second string

my $s = 'The call was done at night was done at night.';
print qq{"$1"\n} if $s =~ /(.+)\1/;

Output:

"l"

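Here the first repetition the regex finds is the doubled "l" in "call", so the right solution depends on the data set you have. If you can define the problem more strictly, we can help you better.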

Answer 2

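Yes, just use \1 in the regular expression to refer back to the text the first group matched. I have deliberately restricted this regex to phrases of 2-4 words, to limit the amount of work it has to do: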

#!/usr/bin/perl

use strict;
use warnings;

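while (<DATA>) {
    # Capture a phrase of 2-4 whitespace-separated words, then require
    # the same text again (\1) after some intervening whitespace.
    if (my @phrases = /\b(\S+(?:\s+\S+){1,3})\s+\1/g) {
        print "$_\n" for @phrases;
    }
}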

__DATA__
The great eagle flied high flied high.
The call was done at night was done at night.

Output:

flied high
was done at night
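
Since the question asks for grep -P, the same pattern can probably be passed to grep unchanged. The following is only a minimal sketch, assuming GNU grep built with PCRE support; find_repeats is a hypothetical helper name introduced for illustration, and the third input line is made up as a non-matching control.

```shell
# find_repeats (hypothetical name): print the lines that contain a
# 2-4 word phrase immediately followed by the same phrase.
# Reads the files given as arguments, or standard input if none are given.
find_repeats() {
    grep -P '\b(\S+(?:\s+\S+){1,3})\s+\1' "$@"
}

# Two sample lines from the question plus one control line.
printf '%s\n' \
    'The great eagle flied high flied high.' \
    'The call was done at night was done at night.' \
    'Nothing is repeated on this line.' |
    find_repeats
# Prints the first two lines; the control line is filtered out.
```

To filter real files, pass their names as arguments, for example find_repeats *.txt. Adding -o would make grep print the whole doubled span rather than the full line. grep has no option to print only the captured group.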

Perl regular expression for repeated sentences
