RegEx to "fix" e-mail-style headers so that each one is on a single line

Time: 2012-10-09 17:49:25

Tags: php regex preg-replace

  

Possible duplicates:
  How to do unfolding RFC 822
  Parsing e-mail-like headers (similar to RFC822)

I have some input data that resembles e-mail data, in that long lines are wrapped onto the next line. For example:

robot-useragent: ABCdatos BotLink/1.0.2 (test links)
robot-language: basic
robot-description: This robot is used to verify availability of the ABCdatos
                   directory entries (http://www.abcdatos.com), checking
                   HTTP HEAD. Robot runs twice a week. Under HTTP 5xx
                   error responses or unable to connect, it repeats
                   verification some hours later, verifiying if that was a
                   temporary situation.

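The robot-description field is "too long" for one line and has been wrapped onto the following lines. To help parse this data, I'd like to come up with a RegEx that I can use with preg_replace() to replace the following:

  • A newline character followed by whitespace
  • A newline character followed by other newline characters

Example output: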

robot-description: This robot is used to verify availability of the ABCdatos directory entries (http://www.abcdatos.com), checking HTTP HEAD. Robot runs twice a week. Under HTTP 5xx error responses or unable to connect, it repeats verification some hours later, verifiying if that was a temporary situation.

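I'm new to RegEx. How can I build such an expression? If you choose to answer, please include a brief explanation of the components in the expression. I'd really like to learn how to do these.

I started with this: \n([^\S])* and it is close. http://codepad.org/iMObpgFX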

2 Answers:

Answer 0: (score: 1)

Maybe you can try:

(\r|\n)\s+

(\r|\n) # matches either a carriage return or a newline
\s+     # one or more whitespace characters (tabs, spaces, newlines)
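
To see what this pattern does in practice, here is a minimal sketch using preg_replace(). The sample input and the choice of a single space as the replacement are assumptions: the answer does not say what to replace the match with, and the space is picked to match the example output in the question.

```php
<?php
// Sample input (an assumption): one plain field and one folded field.
$input = "robot-language: basic\n"
       . "robot-description: This robot is used to verify availability\n"
       . "                   of the directory entries.\n";

// (\r|\n) : a single carriage return or newline
// \s+     : one or more whitespace characters after it (the indentation)
// The match is replaced with one space so the joined words stay separated.
$output = preg_replace('/(\r|\n)\s+/', ' ', $input);

echo $output;
```

Only the line break in front of the indented continuation line is replaced; the break between the two fields is followed by a letter, so it is left alone. Because \s also matches line breaks, the same pattern will swallow blank lines between records, which may or may not be what you want.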


Answer 1: (score: 0)

It turns out this question is a duplicate, but not of the ones Marc mentioned.

The answer:

$output = preg_replace('/\r\n(?:[ \t]+)/', '', $input);

Taken from here: https://stackoverflow.com/a/4227885/362536

I am voting to close this question, since I can't delete it now that it has answers. I will flag it for moderator attention.
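
Note that the pattern as quoted only matches CRLF line endings and replaces the match with an empty string, so the indentation is dropped without leaving a space behind. The following is a sketch of a variant, not the linked answer's code: it assumes the input may use either LF or CRLF, and that a single space between the joined parts is wanted, as in the example output in the question. The function name unfold_headers is hypothetical.

```php
<?php
// Hypothetical helper; the name and the single-space replacement are assumptions.
function unfold_headers(string $input): string
{
    // \r?\n  : a line break, with or without a leading carriage return
    // [ \t]+ : one or more spaces or tabs that start the continuation line
    // Fall back to the untouched input if preg_replace() reports an error (null).
    return preg_replace('/\r?\n[ \t]+/', ' ', $input) ?? $input;
}

$input = "robot-language: basic\r\n"
       . "robot-description: checking\r\n"
       . "                   HTTP HEAD.\r\n"
       . "\r\n"
       . "robot-id: next\n"
       . "robot-name: wrapped\n"
       . "\tvalue\n";

$output = unfold_headers($input);
echo $output;
```

Using [ \t] instead of \s keeps the pattern from matching across blank lines, so the empty line that separates records is preserved.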