Question

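I need to remove all unnecessary <p> elements. For example, convert <div><p>xxxx</p></div> into <div>xxxx</div>.

How can I do this with the DOM? The rule is: "if a <div> contains only a single <p>, assign the text of the <p> to the <div> and remove the <p>".

I would do it with regular expressions, but some people say that is a bad idea. I can't work out how to do it with the DOM.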

text = "<div><p>xxxx</p></div>"
???

Can this be solved with the DOM? Or is a good old regular expression a better fit for this case?
This is for Python, not JavaScript.

Answer 1

This works for me:

from xml.dom import minidom

text = "<div><p>xxxx</p></div>"
doc = minidom.parseString(text)

# For each top-level node of the document (here, the <div>)
for tag in doc.childNodes:
    children = tag.childNodes
    # If its only child is an element named <p>
    if (len(children) == 1
            and children[0].nodeType == children[0].ELEMENT_NODE
            and children[0].tagName == 'p'):
        # p_node = <p>xxxx</p>
        p_node = children[0]
        # p_text_node = xxxx
        p_text_node = p_node.childNodes[0]
        # Delete the <p>xxxx</p>
        tag.removeChild(p_node)
        # Set the <div></div> -> <div>xxxx</div>
        tag.appendChild(p_text_node)

print(doc.toxml())

which yields:

<?xml version="1.0" ?><div>xxxx</div>

I hope you'll accept the answers I gave to your other questions, since I did all the work for you ;)
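One caveat about the minidom approach in this answer: minidom keeps whitespace between tags as text nodes, so a <div> written across several lines has more than one child even when it holds a single <p>. The following is a minimal sketch of one way to cope with that, not part of the original answer. The helper name significant_children and the multi-line sample input are assumptions made for illustration.

```python
from xml.dom import minidom

def significant_children(node):
    """Return the child nodes, skipping text nodes that hold only whitespace.

    Hypothetical helper, introduced here for illustration.
    """
    return [
        child for child in node.childNodes
        if not (child.nodeType == child.TEXT_NODE and not child.data.strip())
    ]

# Assumed sample input: the same markup as above, but indented
text = "<div>\n  <p>xxxx</p>\n</div>"
doc = minidom.parseString(text)
div = doc.documentElement

kids = significant_children(div)
if (len(kids) == 1
        and kids[0].nodeType == kids[0].ELEMENT_NODE
        and kids[0].tagName == 'p'):
    p_node = kids[0]
    # Empty the <div> (this drops the <p> and the whitespace around it)
    while div.firstChild is not None:
        div.removeChild(div.firstChild)
    # Move everything that was inside the <p> up into the <div>
    while p_node.firstChild is not None:
        div.appendChild(p_node.firstChild)

result = div.toxml()
print(result)
```

Whether whitespace-only text should count as content depends on the document, so treat the filter as a choice rather than a rule.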

Answer 2

Here is a way to do this using BeautifulSoup:

>>> import BeautifulSoup
>>> somehtml = '<html><title>hey</title><body><p>blah</p><div><p>something</p></div></body></html>'
>>> soup = BeautifulSoup.BeautifulSoup(somehtml)
>>> for p in soup.findAll('p'):
...    if p.parent.string is None and len(p.parent.contents) == 1:
...       p.parent.string = p.string
...       p.extract()
>>> soup
<html><title>hey</title><body><p>blah</p><div>something</div></body></html>

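This searches for every <p> element whose parent has no content of its own and has exactly one child (the <p> element), then copies the content of the <p> element to the parent and removes the <p> element.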
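The same rule can also be sketched with only the standard library. This is a hedged alternative, not part of the BeautifulSoup answer: it swaps in xml.etree.ElementTree, assumes the input is well-formed XML, and uses a hypothetical helper name, unwrap_lone_p. Because ElementTree elements have no link to their parent, the loop visits each element as a potential parent instead of starting from the <p>.

```python
import xml.etree.ElementTree as ET

def unwrap_lone_p(root):
    """Unwrap every <p> that is the only content of its parent, in place.

    Hypothetical helper, introduced here for illustration.
    """
    # list() snapshots the traversal because the tree is modified in the loop
    for parent in list(root.iter()):
        children = list(parent)
        if len(children) != 1 or children[0].tag != 'p':
            continue
        p = children[0]
        # Skip parents that carry text of their own before or after the <p>
        if (parent.text or '').strip() or (p.tail or '').strip():
            continue
        parent.remove(p)
        parent.text = p.text
        parent.extend(list(p))  # keep any elements nested inside the <p>
    return root

somehtml = '<html><title>hey</title><body><p>blah</p><div><p>something</p></div></body></html>'
root = unwrap_lone_p(ET.fromstring(somehtml))
result = ET.tostring(root, encoding='unicode')
print(result)
```

Real-world HTML is often not well-formed XML, in which case a tolerant parser such as BeautifulSoup remains the safer choice.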

Answer 3

Building on @jterrace's answer:

(Please edit this to make it complete, or comment.)

I think the way to go is to create a minidom.Document so that you can modify its XML nodes.

#coding: utf-8

from xml.dom import minidom

text = "<div><p>xxxx</p></div>"

dom = minidom.parseString(text)

for p in dom.getElementsByTagName('p'):
    print(p.childNodes)
    # and what now?
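One possible way to continue from the loop above, offered as a sketch rather than the answerer's intended solution: for each <p>, check whether it is the only child of its parent, move its children up, and remove it. The function name unwrap_single_p and the wider sample input are assumptions made for illustration.

```python
from xml.dom import minidom

def unwrap_single_p(text):
    """Parse text and unwrap each <p> that is the only child of its parent.

    Hypothetical helper, introduced here for illustration.
    """
    dom = minidom.parseString(text)
    # minidom builds this list up front, so editing the tree in the loop is safe
    for p in dom.getElementsByTagName('p'):
        parent = p.parentNode
        if len(parent.childNodes) != 1:
            continue
        # Move every child of the <p> in front of it, preserving order
        while p.firstChild is not None:
            parent.insertBefore(p.firstChild, p)
        # The <p> is now empty and can be dropped
        parent.removeChild(p)
    return dom.documentElement.toxml()

# Assumed sample input: one <div> with a lone <p>, one with two <p> elements
text = "<div><div><p>xxxx</p></div><div><p>a</p><p>b</p></div></div>"
result = unwrap_single_p(text)
print(result)
```

Unlike iterating over doc.childNodes, getElementsByTagName finds <p> elements at any depth, so nested <div> elements are handled as well. The second <div> is left untouched because it has two children.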

Answer 4

If you have jQuery, this will work.

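$('div').each(function() {
    // children() returns child elements only; text nodes are not counted
    var children = $(this).children();

    // Only handle a <div> with exactly one child element
    if (children.length !== 1)
        return;

    // ...and only when that element is a <p>
    if (children[0].tagName !== "P")
        return;

    this.innerHTML = children[0].innerHTML;
});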

DOM manipulation in Python (if an element contains only one other element...)
