Question

Possible duplicates:
  php: how can I remove attributes from an html tag?
  How do I iterate over the HTML attributes of a Beautiful Soup element?

I have some HTML that looks like this:

<div class="foo">
  <p id="first">Hello, world!</p>
  <p id="second">Stack Overflow</p>
</div>

It needs to come back as:

<div>
  <p>Hello, world!</p>
  <p>Stack Overflow</p>
</div>

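I'd prefer a Python solution, since I'm already using BeautifulSoup in the program this needs to go into. However, I'm open to PHP if it offers a better solution. I don't think a sed regex would be sufficient, especially given possible future use of the < symbol in the text (I do not control the input).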

Answer 1

This would work with sed too: match <([!a-zA-Z]+)[^>]+> and then just replace it with the first group, like <\1>.
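
Since the question prefers Python, the same substitution can be sketched with the standard `re` module. This is only an illustration, not part of the original answer. Note that the pattern as written also matches attribute-less tags by backtracking (`<div>` becomes `<di>`), so the sketch below requires whitespace after the tag name; that change, and the helper name `strip_attributes`, are assumptions made here. It stays fragile on input where a literal `<` in the text is followed by a letter, which is the concern raised in the question.

```python
import re

# Variant of the answer's pattern: capture the tag name, require whitespace
# after it (an assumption added here), then consume everything up to '>'.
TAG_WITH_ATTRS = re.compile(r'<([!a-zA-Z][a-zA-Z0-9]*)\s[^>]*>')

def strip_attributes(html):
    """Replace each opening tag that carries attributes with the bare tag."""
    return TAG_WITH_ATTRS.sub(r'<\1>', html)

html = '''
<div class="foo">
  <p id="first">Hello, world!</p>
  <p id="second">Stack Overflow</p>
</div>'''

print(strip_attributes(html))
```

Closing tags and tags without attributes are left untouched, because neither has whitespace directly after the tag name.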

Answer 2

Use lxml.

It can be done easily in Python.

First install lxml, then try the following code:

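from lxml.html import fromstring, tostring

html = '''
<div class="foo">
  <p id="first">Hello, world!</p>
  <p id="second">Stack Overflow</p>
</div>'''

# Parse the fragment, then visit every element in the tree.
# cssselect() needs the separate "cssselect" package alongside lxml.
htmlElement = fromstring(html)
for element in htmlElement.cssselect('*'):
    # Drop every attribute on this element.
    element.attrib.clear()

# encoding='unicode' returns a str rather than bytes.
result = tostring(htmlElement, encoding='unicode')

print(result)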
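
Because the question mentions that BeautifulSoup is already in use, the same loop can be sketched with it. This is not part of the original answer; it assumes the `bs4` package (BeautifulSoup 4) and the standard-library-backed `html.parser`.

```python
from bs4 import BeautifulSoup

html = '''
<div class="foo">
  <p id="first">Hello, world!</p>
  <p id="second">Stack Overflow</p>
</div>'''

# Assumption: BeautifulSoup 4 with the built-in "html.parser" backend,
# which parses the fragment without wrapping it in <html>/<body>.
soup = BeautifulSoup(html, "html.parser")

# find_all(True) matches every tag; resetting attrs drops all attributes.
for tag in soup.find_all(True):
    tag.attrs = {}

result = str(soup)
print(result)
```

Text nodes are left alone, so a literal `<` in the content is handled by the parser rather than by pattern matching.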

Remove attributes from HTML tags
