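I got help from all of you on another question, and I'm wondering whether my next issue can be solved just as easily.
Basically, because I converted a PDF to an Excel file, I have a lot of repeated sentences in every cell.
For example: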
$bad_string = "B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
$good_string = goodFunction($bad_string);
// echo $good_string; should print 'B7R, B9R, B12R, B12M 430mm Disc 2005 >'
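Is this possible at all? The condition is that the bad string is repeated X times. It never varies: it is the same text pasted in place that many times (a side effect of the poor PDF-to-Excel conversion).
Is there a solution?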
Answer 0 (score: 2)
I would use preg_replace. I'm assuming the repeated strings appear back to back, with nothing in between.
$bad_string = "B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
echo preg_replace('~^(.*?)\1+$~', '\1', $bad_string);
Output:
B7R, B9R, B12R, B12M 430mm Disc 2005 >
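To plug this into the goodFunction() slot from the question, the same pattern can be wrapped in a small helper. This is only a sketch: the name collapseRepeatedCell is made up, the s modifier is my addition (so that . also matches newlines in multi-line cells), and I use .+? instead of .*? so an empty unit is never captured.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the question):
// collapse a cell made up entirely of one unit repeated back to back.
function collapseRepeatedCell(string $cell): string
{
    // ^(.+?)  : lazily capture the shortest candidate unit
    // \1+$    : the rest of the cell must be one or more copies of that unit
    // s       : let '.' match newlines as well (assumption: cells may span lines)
    $result = preg_replace('~^(.+?)\1+$~s', '$1', $cell);

    // preg_replace() returns null on failure (e.g. backtrack limit reached);
    // in that case, hand back the cell unchanged.
    return $result ?? $cell;
}

$bad_string = str_repeat("B7R, B9R, B12R, B12M 430mm Disc 2005 >", 6);
echo collapseRepeatedCell($bad_string), PHP_EOL; // B7R, B9R, B12R, B12M 430mm Disc 2005 >
echo collapseRepeatedCell("no repetition here"), PHP_EOL; // no repetition here
```

One caveat: the pattern cannot tell an accidental repeat from a legitimate one, so a cell whose real content is something like "haha" would also be collapsed to "ha".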
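If each sentence must end with a > symbol, you can use this regex instead. It removes every copy that occurs again later in the string and keeps only the last one, so it also works when other text sits between the copies.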
(.*?>)(?=(?:.*?\1)+$)
$bad_string = "foo B7R, B9R, B12R, B12M 430mm Disc 2005 > bar B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
echo preg_replace('~(.*?>)(?=(?:.*?\1)+$)~', '', $bad_string);
Output:
foo  bar B7R, B9R, B12R, B12M 430mm Disc 2005 >
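Deleting the earlier copies can leave stray whitespace where they used to be, so a second pass to tidy it up may be worthwhile. A rough sketch, assuming single spaces are what you want in the cleaned cell; dropEarlierDuplicates is a hypothetical name and the s modifier is again my addition:

```php
<?php
// Hypothetical helper (the name is an assumption): remove every ">"-terminated
// sentence that appears again later in the cell, then tidy the whitespace.
function dropEarlierDuplicates(string $cell): string
{
    // (.*?>)            : a candidate sentence ending in '>'
    // (?=(?:.*?\1)+$)   : keep the match only if the same sentence shows up
    //                     again further on, so the last copy is never removed
    $result = preg_replace('~(.*?>)(?=(?:.*?\1)+$)~s', '', $cell);
    if ($result === null) {
        return $cell; // regex failure: leave the cell untouched
    }

    // Collapse whitespace runs left behind by the removed copies.
    $normalized = preg_replace('~\s+~', ' ', $result);
    return trim($normalized ?? $result);
}

$unit = "B7R, B9R, B12R, B12M 430mm Disc 2005 >";
$bad_string = "foo " . $unit . " bar " . str_repeat($unit, 5);
echo dropEarlierDuplicates($bad_string), PHP_EOL;
// foo bar B7R, B9R, B12R, B12M 430mm Disc 2005 >
```

If the spacing inside your cells is meaningful, skip the normalization step and return $result directly.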