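How to detect and remove repeated sentences in a string?

Asked: 2015-01-24 11:34:50

Tags: php string substring repeat

I got help from all of you on another question, and I'm wondering whether my next problem can be solved just as easily.

Basically, because I converted a PDF to an Excel file, I have a lot of repeated sentences in each cell.

For example: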

$bad_string = "B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";

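$good_string = goodFunction($bad_string);
// expected result: 'B7R, B9R, B12R, B12M 430mm Disc 2005 >'

How can this be done? The condition is that the bad string is one sentence repeated X times. The sentence never changes between copies; it is as if it had been copied and pasted in place many times (a side effect of the poor PDF conversion).

Is there a solution?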

1 answer:

Answer 0 (score: 2)

I would use preg_replace. I'm assuming the repeated strings appear consecutively.

$bad_string = "B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
echo preg_replace('~^(.*?)\1+$~', '\1', $bad_string);

Output:

B7R, B9R, B12R, B12M 430mm Disc 2005 >
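
To tie this back to the question, the one-liner above could be wrapped in the goodFunction the asker mentions. This is only a minimal sketch: the function name comes from the question, not from any library, and the s modifier and the null fallback are additions that assume cells may contain line breaks and that a PCRE failure should leave the input untouched.

```php
<?php
// Hypothetical helper; the name goodFunction is taken from the question.
// Collapses a string that consists solely of one unit repeated back to back.
function goodFunction(string $text): string
{
    // ^(.*?)\1+$ : the lazy group grows until the rest of the string is
    // made up only of copies of it. The s modifier (an assumption, not in
    // the original answer) lets . match newlines as well.
    $result = preg_replace('~^(.*?)\1+$~s', '$1', $text);

    // preg_replace returns null on a PCRE error (for example when the
    // backtrack limit is hit); fall back to the unchanged input then.
    return $result ?? $text;
}

echo goodFunction("abc >abc >abc >"), "\n";    // abc >
echo goodFunction("no repetition here"), "\n"; // no repetition here
```

Because the group is lazy, the shortest repeating unit wins, so an input such as "aaaa" collapses to "a". A string that is not a pure repetition does not match and is returned unchanged.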

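If each sentence must end with a > character, you can use this regex instead. It removes every sentence that occurs again later in the string, so only the last copy survives.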

(.*?>)(?=(?:.*?\1)+$)

$bad_string = "foo B7R, B9R, B12R, B12M 430mm Disc 2005 > bar B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
echo preg_replace('~(.*?>)(?=(?:.*?\1)+$)~', '', $bad_string);

Output:

foo  bar B7R, B9R, B12R, B12M 430mm Disc 2005 >
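
If a regex with a lookahead feels too heavy for long cells, a non-regex alternative is to split on the terminator and keep each distinct sentence once. This is a different technique from the answer's pattern, sketched under the assumption that every sentence ends with > and that surrounding whitespace is not significant; uniqueSentences is a hypothetical name.

```php
<?php
// Hypothetical helper: split on the terminator and keep the first
// occurrence of each distinct sentence. Assumes every sentence ends
// with the terminator; a trailing fragment without one would gain one.
function uniqueSentences(string $text, string $terminator = '>'): string
{
    $seen = [];
    $out = '';
    // explode() drops the terminator, so it is re-appended below.
    foreach (explode($terminator, $text) as $part) {
        $sentence = trim($part);
        if ($sentence === '' || isset($seen[$sentence])) {
            continue; // skip the empty tail and sentences already emitted
        }
        $seen[$sentence] = true;
        $out .= $sentence . ' ' . $terminator;
    }
    return $out;
}

echo uniqueSentences("a >a >b >a >"), "\n"; // a >b >
```

Unlike the regex, this compares whole sentences, so "foo B7R, ... >" and "B7R, ... >" count as different sentences and both are kept; it also keeps the first copy rather than the last and normalizes the spacing before each >.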