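I would like to clean up some slightly dirty HTML mail bodies (taken from emails sent with Gmail): there are lots of nested <div> elements, unwanted font changes, etc.
I want to clean this up and keep only <a>, <b>, <br>, <i>, <img> (and perhaps <p> or a few <div> elements, if and only if it is really necessary).
Using the regex /<\/?(?!(a|br|b|img)\b)\w+[^>]*>/g works most of the time: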
document.onclick = function() {
  document.body.innerHTML = document.body.innerHTML.replace(/<\/?(?!(a|br|b|img)\b)\w+[^>]*>/g, '');
}
<div dir="ltr"><div class="gmail_quote"><div dir="ltr">Hello,<div><br></div><div><div><div style="font-size:12.8px"><span style="font-size:12.8px">Thank you for your message.</span><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><span style="font-size:12.8px">If the L<span class="m_-527331299899979m_70391001927gmail-il">orem</span>i</span><span class="m_-527331299899979m_703910001927gmail-m_2466414472930393055gmail-il" style="font-size:12.8px">psum</span><span style="font-size:12.8px"> bla bla </span><a href="http://example.com" style="font-size:12.8px" target="_blank">test</a><span style="font-size:12.8px"> window, then it will be like this.</span><br></div><div style="font-size:12.8px">Blah blah.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Lorem ipsum<span style="font-size:12.8px">lorem ipsum </span><span style="font-size:12.8px">blah blah and</span><span style="font-size:12.8px"> you can </span><span style="font-size:12.8px">also <i>blah blah</i> and finally <i>Blah</i>.</span></div><div style="font-size:12.8px"><span style="font-size:12.8px"><br></span></div><div style="font-size:12.8px"><span style="font-size:12.8px">-----------</span></div><div style="font-size:12.8px"><span style="font-size:12.8px"><br></span></div><div style="font-size:12.8px"><span style="font-size:12.8px">Examples:</span></div><div style="font-size:12.8px"><span style="font-size:12.8px"><br></span></div><div><div><span style="font-size:12.8px">example: <a href="http://example.com" target="_blank">test1</a></span></div><div><span style="font-size:12.8px">example: <a href="http://example.com" target="_blank">test2</a></span></div><div><br></div><div><div><span style="font-size:12.8px">example: <a href="http://example.com" target="_blank">test3</a></span></div><div><span style="font-size:12.8px">example: <a href="http://example.com" target="_blank">test4</a></span></div></div><div><br></div><div><span 
style="font-size:12.8px">example: <a href="http://example.com" target="_blank">test4</a></span></div><div><span style="font-size:12.8px">example: <a href="http://example.com" target="_blank">test5</a></span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">example: <a href="http://example.com" target="_blank">example</a></span></div><div><span style="font-size:12.8px">example: <a href="http://example.com" target="_blank">ex<wbr>ample</a></span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">example: <a href="http://example.com" target="_blank">example</a></span></div><div><span style="font-size:12.8px">example: <a href="http://example.com" target="_blank">exam<wbr>ple</a></span></div><div><span style="font-size:12.8px"><br></span></div><div><br></div></div></div><div class="gmail_extra" style="font-size:12.8px"><div class="m_-52733129979m_703911927gmail-m_24664144055gmail_signature"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><span style="font-size:small">Sincerly,</span><br></div></div></div></div></div></div></div></div><div><div><div class="m_-52722719979m_7039100982345401927gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br></div><div>Myself<br></div><div dir="ltr"><br><b>example</b><br>web: <a href="http://www.example.com" target="_blank">www.example.com</a><br></div><div>fb: <a href="http://www.facebook.com/example/" target="_blank">www.facebook.com/LoremIp<wbr>sum/</a><br></div><div>mail: <a href="mailto:contact@example.com" target="_blank">contact@example.com</a><br></div><div dir="ltr"><br><img src="http://example.com/example.png"><br></div></div></div></div></div></div></div></div></div></div></div></div></div></div><br></div>
(After running the snippet, click anywhere in the email to see what happens once the regex has been applied.)
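In fact:
- <span> and </span> are successfully removed
- <div fontstyle="..."> and </div> are removed
but there are still problems when removing <div> like this:
- blank lines are removed (see the blank line between line 1 and line 3 of the mail output, between line 3 and line 5, etc.)
- the line break after each "example: test1" is removed (visible when running the snippet)
I tried replacing <div.*?><br></div> with <br><br>, but it is still not correct.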
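The lost blank lines can be reproduced on a much smaller input. The sketch below uses a hypothetical three-div miniature of the Gmail markup (the variable names are made up for illustration) and the same regex as above. In a browser each <div> starts a new line, so the original renders as three lines with an empty one in the middle; once the <div> tags are stripped, only a single <br> remains.

```javascript
// Same regex as in the question: strips every tag except a, br, b, img.
const stripTags = /<\/?(?!(a|br|b|img)\b)\w+[^>]*>/g;

// Hypothetical miniature of the Gmail markup: "line 1", a blank line, "line 3".
const sample = '<div>line 1</div><div><br></div><div>line 3</div>';

const stripped = sample.replace(stripTags, '');
console.log(stripped);
// → line 1<br>line 3   (one line break left, the blank line is gone)
```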
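Question: how can I clean this HTML code, discarding unwanted font changes etc., while keeping the same blank lines and keeping the <a>, <b>, <br>, <i>, <img> tags?
Note: it must ultimately run in Google Apps Script, so I am not sure whether I can import third-party JS libraries...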
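Answer 0 (score: 1)
The following 5 steps work for the sample you provided:
1. Remove every tag except <a>, <br>, <b>, <i>, <img> and <div>.
2. Replace <div><br></div> with <br><br>.
3. Replace any sequence of 1 or more closing </div> tags, each possibly preceded by a <br>, with a single <br>.
4. Remove all remaining opening <div> tags.
5. Replace any sequence of 2 or more <br> tags with <br><br>.
Code: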
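document.onclick = function() {
  document.body.innerHTML = document.body.innerHTML
    // drop all tags except a, br, b, i, img and div
    .replace(/<\/?(?!(a|br|b|i|img|div)\b)\w+[^>]*>/g, '')
    // a div containing only <br> is a blank line
    .replace(/<div[^>]*><br><\/div>/g, '<br><br>')
    // a run of closing divs (each optionally preceded by <br>) ends a line
    .replace(/((<br>)?<\/div>)+/g, '<br>')
    // remove the remaining opening divs
    .replace(/<div[^>]*>/g, '')
    // collapse 2 or more <br> into exactly two
    .replace(/(<br>){2,}/g, '<br><br>');
}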
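Since the snippet above relies on document, which is a browser object, here is a minimal sketch of the same five replacements wrapped in a plain string function, which is probably closer to what a Google Apps Script project would need. The function name cleanGmailHtml and the shortened sample string are assumptions made for illustration; the regexes are unchanged.

```javascript
// Hypothetical helper (name and signature are assumptions): applies the
// five replacements to a string instead of to document.body.innerHTML.
function cleanGmailHtml(html) {
  return html
    // 1. Drop every tag except a, br, b, i, img and (for now) div.
    .replace(/<\/?(?!(a|br|b|i|img|div)\b)\w+[^>]*>/g, '')
    // 2. A div that holds only a <br> stands for a blank line.
    .replace(/<div[^>]*><br><\/div>/g, '<br><br>')
    // 3. A run of closing divs, each optionally preceded by <br>, ends one line.
    .replace(/((<br>)?<\/div>)+/g, '<br>')
    // 4. Remove the opening divs that are left.
    .replace(/<div[^>]*>/g, '')
    // 5. Collapse 2 or more consecutive breaks into exactly two.
    .replace(/(<br>){2,}/g, '<br><br>');
}

// Shortened, made-up sample in the style of the Gmail markup.
const sample = '<div dir="ltr">Hello,<div><br></div>' +
  '<div><span style="font-size:12.8px">Thanks.</span><br></div>' +
  '<div>Bye</div></div>';

console.log(cleanGmailHtml(sample));
// → Hello,<br><br>Thanks.<br>Bye<br>
```

The order matters: step 3 only sees closing </div> tags because step 2 has already consumed the blank-line divs, and step 5 runs last so that breaks produced by earlier steps are collapsed as well.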
Answer 1 (score: 0)
This is what I ended up using (it works for all emails sent via Gmail); 99.99% of the credit goes to @Michelle's accepted answer:
document.onclick = function() {
  document.body.innerHTML = document.body.innerHTML
    .replace(/<\/?(?!(a|br|b|i|img|div)\b)\w+[^>]*>/g, '')
    .replace(/<div[^>]*><br[^>]*>/g, '<br><br>')
    .replace(/((<br>)?<\/div>)+/g, '<br>')
    .replace(/<div[^>]*>/g, '')
    .replace(/(<br>){2,}/g, '<br><br>')
    .replace(/ style="font-size.*?"/g, '');
}
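Two lines differ from the accepted answer: the second replacement matches any <div> that starts with a <br> (without requiring the closing </div> right after it), and a final replacement strips the font-size style attribute from the tags that are kept. A small sketch of what these two regexes do in isolation, using fragments taken from the sample email (the variable names are made up for illustration):

```javascript
// Last step of the chain: remove style="font-size..." from kept tags.
const link = '<a href="http://example.com" style="font-size:12.8px" target="_blank">test</a>';
const cleanedLink = link.replace(/ style="font-size.*?"/g, '');
console.log(cleanedLink);
// → <a href="http://example.com" target="_blank">test</a>

// Modified second step: a div that opens with <br> becomes two breaks,
// even when other content follows the <br> inside the same div.
const signature = '<div dir="ltr"><br><b>example</b>';
const cleanedSignature = signature.replace(/<div[^>]*><br[^>]*>/g, '<br><br>');
console.log(cleanedSignature);
// → <br><br><b>example</b>
```

Note that the style regex only targets attributes whose value begins with font-size; a style attribute starting with another property would be left untouched.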