Remove \r\n and escape characters from HTML

Date: 2014-04-19 14:18:47

Tags: php html regex preg-replace imap

I have the following HTML, which I extract from an email using imap_fetchbody:
<div dir=\"ltr\"><br><div class=\"gmail_quote\"><div dir=\"ltr\"><br><div class=\"gmail_quote\"><div class=\"\">
---------- Forwarded message ----------<br>
<span style=\"font-family:&quot;Helvetica&quot;,&quot;sans-serif&quot;\"><\/span>
From: <span style=\"font-family:&quot;Helvetica&quot;,&quot;sans-serif&quot;\">&quot;
<span>xyz<\/span>&quot; &lt;<a href=\"mailto:support@xyz.com\" target=\"_blank\">support@<span>xyz<\/span>.com<\/a>&gt;<\/span><br>
\r\n\r\n\r\n\r\nDate: Fri, Apr 18, 2014 at 7:17 PM<br>
Subject: Bla bla xyz<br><\/div><div><div class=\"h5\">To: XYZ &lt;<a href=\"mailto:xyz@gmail.com\" target=\"_blank\">xyz@gmail.com<\/a>&gt;<br><br><br>\r\n\r\n<div dir=\"ltr\">\r\n\r\n\r\n\r\n
<div class=\"gmail_quote\"><div><div><div dir=\"ltr\"><div class=\"gmail_quote\"><div dir=\"ltr\"><div><div class=\"gmail_quote\">
<div dir=\"ltr\"><div><div><div class=\"gmail_quote\"><div style=\"word-wrap:break-word\" lang=\"EN-US\">\r\n\r\n\r\n\r\n
<div>
<div>
<div>
<blockquote style=\"margin-top:5pt;margin-bottom:5pt\">
<div><div>
<table style=\"width:100%;background:none repeat scroll 0% 0% rgb(207,207,207)\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\" width=\"100%\">
<tbody>
<tr>\r\n\r\n\r\n\r\n
<td style=\"width:325pt;padding:0in\" width=\"650\">\r\n\r\n<div align=\"center\"><table style=\"width:325pt;background:none repeat scroll 0% 0% rgb(207,207,207)\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\" width=\"650\">\r\n\r\n\r\n\r\n
<tbody><tr>
<td style=\"padding:0in 0in 5.25pt\"><p style=\"text-align:center\" align=\"center\">
<span style=\"font-size:7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:rgb(64,64,64)\">If you are unable to see this message, 
<a href=\"http:\/\/click.e.xyz.com\/?qs=3771d7c90c958f02a4b2e78494f12a3116ddb15df79b8d04cdf5aeba42012b118\" target=\"_blank\">
<span style=\"color:rgb(64,64,64)\">click here<\/span><\/a> to view.<br>
\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\nTo ensure delivery to your inbox, please add <a href=\"mailto:support@xyz.com\" target=\"_blank\">support@xyz.com<\/a> to your address book. <\/span><\/p>
<\/td>
<\/tr>
<\/tbody>
<\/table>
<\/div><\/div><\/div><\/div>

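I want to get rid of all the \r\n sequences while still keeping the HTML tags (< and >). I have tried stripslashes, stripcslashes, nl2br and htmlspecialchars_decode, but none of them gives me the result I want. Here is what I tried together with the imap_qprint function: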
$text = stripslashes(imap_qprint($text));
$body = preg_replace('/(\v|\s)+/', ' ', $text );

Result: it does not remove all of the whitespace characters.
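A likely reason, assuming the \r\n sequences in the fetched body are literal backslash characters rather than real line breaks (the escaped \" and \/ around them hint at that, but the question does not confirm it): \s only matches real whitespace, and stripslashes() turns each literal \r\n into the plain letters rn. A minimal sketch with a made-up sample string:

```php
<?php
// Hypothetical sample: single quotes keep every backslash literal,
// mimicking the escaped body shown above.
$text = 'Date: Fri<br>\r\n\r\nSubject: <a href=\"x\">y<\/a>';

// stripslashes() drops each backslash and keeps the character after it,
// so \" becomes ", \/ becomes /, and \r\n becomes the letters "rn".
$unescaped = stripslashes($text);

// \s and \v match real whitespace only, so the leftover "rnrn" is untouched.
$body = preg_replace('/(\v|\s)+/', ' ', $unescaped);

echo $body, PHP_EOL; // Date: Fri<br>rnrnSubject: <a href="x">y</a>
```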

5 answers:

Answer 0 (score: 1)

Match the following regular expression with the g modifier:

(\\r|\\n|\\)

and replace it with

'' (an empty string)

Demo: http://regex101.com/r/mS3wM2
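In PHP the same pattern needs every regex backslash doubled again inside a single-quoted string, and preg_replace() already replaces every match, so no g flag is needed (or accepted). A sketch; the function name remove_escapes is hypothetical:

```php
<?php
// Hypothetical helper: removes literal \r, literal \n, and any other backslash.
function remove_escapes(string $html): string
{
    // '\\\\' in a single-quoted PHP string is the regex \\, i.e. one literal backslash.
    return preg_replace('/(\\\\r|\\\\n|\\\\)/', '', $html);
}

echo remove_escapes('<div dir=\"ltr\">\r\nHi<\/div>'), PHP_EOL; // <div dir="ltr">Hi</div>
```

Because the last alternative removes every backslash, a backslash that genuinely belongs in the content is lost as well.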

Answer 1 (score: 1)

$html = preg_replace('/[\\\\\r\n]/', '', $html);

Match a single character present in the list below «[\\\r\n]»
   A \ character «\\»
   A carriage return character «\r»
   A line feed character «\n»
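Note what the character class matches at runtime: the leading backslashes end up as the regex escape \\ (one literal backslash), while \r and \n pass through the single-quoted string unchanged and PCRE reads them as a real carriage return and line feed. The sketch below, using made-up inputs, shows how that behaves on real line breaks versus the literal two-character sequences:

```php
<?php
$pattern = '/[\\\\\r\n]/'; // the regex [\\\r\n]: backslash, CR, or LF

// Double quotes: "\r\n" is a real carriage return + line feed.
$real = preg_replace($pattern, '', "a\r\nb");

// Single quotes: '\r\n' is backslash, r, backslash, n; only the backslashes match.
$literal = preg_replace($pattern, '', 'a\r\nb');

echo $real, ' ', $literal, PHP_EOL; // ab arnb
```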

Update

Based on your comment, I have updated my answer:

$html = preg_replace('%\\\\/%sm', '/', $html);
$html = preg_replace('/\\\\"/sm', '"', $html);
$html = preg_replace('/[\r\n]/sm', '', $html);
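If the text may contain both real line breaks and literal \r\n sequences, a single strtr() call with an array map can unescape \/ and \" and drop both forms in one pass. This is a swapped-in string-function alternative, not part of this answer; strtr() tries longer keys first and never rescans text it has already replaced. The function name and sample are hypothetical:

```php
<?php
// Hypothetical helper: single-quoted keys are literal escape sequences,
// double-quoted keys are real control characters.
function clean_fetched_html(string $html): string
{
    return strtr($html, [
        '\/' => '/',  // escaped slash
        '\"' => '"',  // escaped quote
        '\r' => '',   // literal backslash + r
        '\n' => '',   // literal backslash + n
        "\r" => '',   // real carriage return
        "\n" => '',   // real line feed
    ]);
}

$sample = '<a href=\"x\">y<\/a>\r\n' . "\r\n" . 'z';
echo clean_fetched_html($sample), PHP_EOL; // <a href="x">y</a>z
```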

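Answer 2 (score: 0)

If a string function can solve the problem, always prefer it over a regular expression. Compared with regular expressions, string functions perform better and are easier to read in code:

$message = str_replace("\r\n", '', $message ); // remove every CRLF pair; use double quotes so "\r\n" means CR + LF!
$message = stripslashes( $message );

First you have to remove the newlines. As far as I know, \r\n always appear together, so I replace them as a single unit. After that, stripslashes removes all the escaping backslashes. You must call stripslashes after removing the newlines; otherwise \r\n turns into rn, which is much harder to find.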
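Whether this works depends on what the string actually contains: in PHP, "\r\n" (double quotes) is a real carriage return and line feed, while '\r\n' (single quotes) is four characters. Only the literal form is affected by stripslashes(), which is what produces rn. A sketch that removes both forms before calling stripslashes(); the function name and sample are hypothetical:

```php
<?php
// Hypothetical helper.
// Single quotes: '\r\n' is four characters (backslash, r, backslash, n).
// Double quotes: "\r\n" is two characters (carriage return, line feed).
function clean_message(string $message): string
{
    // Remove both forms first; stripslashes() would turn a literal \r\n into "rn".
    $message = str_replace(['\r\n', "\r\n"], '', $message);
    return stripslashes($message);
}

$literal = 'Date<br>\r\n\r\nSubject: <a href=\"x\">y<\/a>';
echo clean_message($literal), PHP_EOL; // Date<br>Subject: <a href="x">y</a>
```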


This worked well in my test:

echo '<textarea style="width:100%; height: 33%;">'.$message.'</textarea>';
echo '<hr />';

$message = str_replace("\r\n", '', $message); // use double quotes!
echo '<textarea style="width:100%; height: 33%;">'.$message.'</textarea>';
echo '<hr />';

$message = stripslashes($message);
echo '<textarea style="width:100%; height: 33%;">'.$message.'</textarea>';
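Outside a browser, json_encode() is one way to see which form a string contains, because it writes real control characters as the escapes \r and \n and doubles literal backslashes. (It also escapes / and " by default, which is consistent with, though not proof of, the \/ and \" in the question's sample.) A small sketch:

```php
<?php
// A real CR + LF is encoded as the escape \r\n.
echo json_encode("a\r\nb"), PHP_EOL; // "a\r\nb"

// Literal backslashes are doubled: \\r\\n.
echo json_encode('a\r\nb'), PHP_EOL; // "a\\r\\nb"
```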

Answer 3 (score: 0)

You can use something like this to interpret the escape sequences:

function interpret_escapes($str) {
    return preg_replace_callback('/\\\\(.)/u', function($matches) {
        $map = ['n' => "\n", 'r' => "\r", 't' => "\t", 'v' => "\v", 'e' => "\e", 'f' => "\f"];
        return isset($map[$matches[1]]) ? $map[$matches[1]] : $matches[1];
    }, $str);
}
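For this question the goal is to remove \r and \n rather than turn them into real line breaks. A variant of the same callback idea (the name strip_escapes is hypothetical) drops those two escapes and unescapes everything else, such as \" and \/:

```php
<?php
// Hypothetical variant of interpret_escapes(): literal \r and \n are removed,
// any other backslash-escaped character is kept without its backslash.
function strip_escapes(string $str): string
{
    return preg_replace_callback('/\\\\(.)/u', function (array $m): string {
        return ($m[1] === 'r' || $m[1] === 'n') ? '' : $m[1];
    }, $str);
}

echo strip_escapes('<a href=\"mailto:x@y.com\">x<\/a><br>\r\n\r\nDate'), PHP_EOL;
// <a href="mailto:x@y.com">x</a><br>Date
```

If you prefer to keep interpret_escapes() as written, you can instead strip the real line breaks it produces afterwards, for example with str_replace(["\r", "\n"], '', ...).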

Answer 4 (score: 0)

If you can open the file in vi (Vim), it is as simple as typing the following in command mode:

:%s/\\r\|\\n//g