Question

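After parsing various websites with the PHP DOM parser, I end up with multi-line strings that contain lots of blank lines, unexpected carriage returns, multiple spaces, tabs and other surprises: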

Input

     Partner Company
 Firstname  Lastname   
                                        Street. 152 
            12345 City

Tel: 01234 567898
Fax: 01234 567899
Mobile: 0123 567899

So far I have been trying to clean up the string with preg_replace:

Code

$lineToOutput = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $lineToOutput);    // remove all blank (empty) lines
$lineToOutput = preg_replace("/[\t]/", " ", $lineToOutput); // convert tabs to spaces
$lineToOutput = preg_replace("/[ ]{2,}/", " ", $lineToOutput);  // convert multiple spaces to single spaces
$lineToOutput = preg_replace("/[\n] /", "\n", $lineToOutput);   // remove spaces at beginning of lines
$lineToOutput = preg_replace("/ [\n]/", "\n", $lineToOutput);   // remove spaces at end of lines

But it fails to remove the spaces at the beginning and end of lines. Any suggestions?

Output

 Partner Company    <-- unwanted space at beginning of line
Firstname Lastname  <-- unwanted space at end of line (not visible)
 Street. 152        <-- unwanted space at beginning of line
12345 City
Tel: 01234 567898
Fax: 01234 567899
Mobile: 0123 567899

Answer 1

Use multiline mode (the m modifier), so that ^ and $ anchor to the beginning and end of each line, respectively:

$lineToOutput = preg_replace("/^[ ]+|[ ]+$/m", "", $lineToOutput);
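To see what the m modifier changes, here is a small PHP check on a shortened, assumed sample (not the full input from the question), running the same pattern with and without /m:

```php
<?php
// Shortened sample (an assumption, not the full input from the question):
// leading spaces on line 1, a leading and trailing spaces on line 2.
$text = "   Partner Company\n Firstname Lastname   \n12345 City";

// Without /m, ^ and $ only match at the start and end of the whole string,
// so in this sample only the spaces before "Partner" are removed.
$withoutM = preg_replace('/^[ ]+|[ ]+$/', '', $text);

// With /m, ^ also matches right after each "\n" and $ right before it,
// so every line gets trimmed.
$withM = preg_replace('/^[ ]+|[ ]+$/m', '', $text);

var_dump($withoutM, $withM);
```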

With multiline mode you can also simplify the first expression:

$lineToOutput = preg_replace("/^[\s\t]*[\r\n]+|[\r\n]+\Z/m", "", $lineToOutput);    // remove all blank (empty) lines

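It isn't shorter, but I think it is conceptually easier to understand. The second alternative catches line breaks at the very end of the string (a trailing empty line).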
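A quick check of the simplified blank-line pattern on an assumed sample string. It uses an empty replacement string: the pattern consumes the line break of each blank line, so replacing the match with "\n" would put that line break straight back and the blank lines would survive.

```php
<?php
// Sample (assumed): an empty line, a whitespace-only line, trailing newlines.
$text = "12345 City\n\nTel: 01234 567898\n  \t \nFax: 01234 567899\n\n";

$pattern = '/^[\s\t]*[\r\n]+|[\r\n]+\Z/m';

// First alternative: at the start of a line, optional whitespace followed by
// a line break, i.e. one or more blank lines. Second alternative: line breaks
// at the very end of the string (\Z).
$clean = preg_replace($pattern, '', $text);

// Replacing with "\n" re-inserts a line break for every match,
// so the blank lines remain.
$kept = preg_replace($pattern, "\n", $text);

var_dump($clean, $kept);
```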

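Also note that you don't need to write [\t]; a plain \t works fine. (Inside [\s\t] the \t is redundant as well, because \s already matches tabs.)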

Answer 2

// Same approach as m.buettner's answer, just a little simpler.
$lineToOutput = preg_replace('/^\s*|\s*\Z/m', '', $lineToOutput);
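For comparison, here is a minimal sketch of a complete cleanup that combines the ideas above; cleanText is a hypothetical helper name, and the step order is one reasonable choice rather than the only one. On its own, the pattern in this answer removes leading whitespace and blank lines, but it leaves spaces that sit just before a line break (for example after "Lastname") and does not collapse interior runs such as "Firstname  Lastname", so the sketch handles those separately:

```php
<?php
/**
 * Hypothetical helper (not from either answer): normalise whitespace in a
 * multi-line string such as the DOM output in the question.
 */
function cleanText(string $text): string
{
    // Normalise Windows ("\r\n") and old Mac ("\r") line endings to "\n".
    $text = preg_replace('/\r\n?/', "\n", $text);
    // Collapse each run of spaces/tabs into one space; line breaks are untouched.
    $text = preg_replace('/[ \t]+/', ' ', $text);
    // With /m, ^ and $ match at every line boundary: strip edge spaces per line.
    $text = preg_replace('/^ +| +$/m', '', $text);
    // Lines are now either empty or non-blank, so merging runs of "\n" drops blank lines.
    $text = preg_replace('/\n{2,}/', "\n", $text);
    // Remove a leading or trailing line break left over at the string edges.
    return trim($text, "\n");
}

$input = "     Partner Company\n Firstname  Lastname   \n"
       . "            12345 City\n\nTel: 01234 567898\n";
echo cleanText($input), "\n";
```

Collapsing spaces before trimming keeps the per-line pattern simple. Note that [ \t] does not match non-breaking spaces (U+00A0), which HTML pages often contain; those are not handled here.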

REGEX to clean up a multi-line string of spaces, tabs and blank lines (PHP preg_replace)
