Remove the 'whitespace' between HTML tags in text from Excel

Time: 2018-10-13 08:40:13

Tags: python python-3.x

In MS Excel / LibreOffice Calc, I have text like this:

<h3><strong>Ways to stretch your budget</strong>

<p>passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
Why do we use it?
</p>

<p>passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
Why do we use it?</p>

<ul>
    <li><strong>Instrument Rentals</strong> &nbsp;passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
Why do we use it?</li>
    <li><strong>passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
Why do we use it?</li>

</ul>

How can I remove the whitespace between the HTML tags?

Example of the desired output:

<p>content<p><ul><li>content></li></ul>

1 answer:

Answer 0: (score: 1)

Just use a regular expression:

import re

# text holds the cell contents; \s also matches newlines, so no flags are needed
result = re.sub(r'>\s*<', '><', text)
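To make the one-liner above easier to try out, here is a minimal, self-contained sketch. The function name `strip_space_between_tags` and the `sample` string are made up for illustration and are not from the original answer. It assumes the cell text is already available as a Python string. It only removes whitespace that sits directly between a `>` and the next `<`, so spaces inside the text content, and entities such as `&nbsp;`, are left alone.

```python
import re

# A '>' and the next '<' with only whitespace (spaces, tabs, newlines) between them.
_BETWEEN_TAGS = re.compile(r">\s+<")


def strip_space_between_tags(text: str) -> str:
    """Remove whitespace that sits directly between two tags (hypothetical helper)."""
    return _BETWEEN_TAGS.sub("><", text)


# Illustrative input, loosely modelled on the question's markup.
sample = "<p>content</p>\n\n<ul>\n    <li>content</li>\n\n</ul>"
print(strip_space_between_tags(sample))
# → <p>content</p><ul><li>content</li></ul>
```

Because this is a plain regex and not an HTML parser, it would also collapse whitespace between tags inside elements such as `<pre>`, where that whitespace is significant. For the markup shown in the question, that should not matter.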