How to get data from the parent tag using Python

Time: 2019-01-28 11:50:04

Tags: python html beautifulsoup

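I need to extract data from a parent tag with Python, regardless of its child tags. From the code below, I need to get "Hi, this is parent tag" without getting "Hi, this is child tag". How can I do this?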

<html>
    <div>
        "Hi, this is parent tag"
        <span> "Hi, this is child tag" </span>
    </div>
</html>

2 answers:

Answer 0 (score: 0)

from bs4 import BeautifulSoup

txt = """
<html>
    <div>
        "Hi, this is parent tag"
        <span> "Hi, this is child tag" </span>
    </div>
</html>
"""

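# Name the parser explicitly to avoid the "no parser specified" warning.
soup = BeautifulSoup(txt, 'html.parser')

for node in soup.find_all('div'):
    # recursive=False keeps only the div's direct text nodes, skipping the span.
    print(' '.join(node.find_all(string=True, recursive=False)))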

Output:

"Hi, this is parent tag"
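The direct text nodes of the div still carry the newlines and indentation of the markup, so the printed line is padded with whitespace. A minimal sketch of a cleaned-up variant follows; the helper name `direct_text` and the constant `SAMPLE` are illustrative and not part of BeautifulSoup, and the built-in `html.parser` backend is assumed.

```python
from bs4 import BeautifulSoup, NavigableString

SAMPLE = """
<html>
    <div>
        "Hi, this is parent tag"
        <span> "Hi, this is child tag" </span>
    </div>
</html>
"""


def direct_text(tag):
    """Return the stripped, non-empty text nodes that are direct children of tag."""
    pieces = []
    for child in tag.children:
        # NavigableString children are text; Tag children (the span) are skipped.
        if isinstance(child, NavigableString):
            stripped = child.strip()
            if stripped:
                pieces.append(stripped)
    return pieces


soup = BeautifulSoup(SAMPLE, "html.parser")
parent_texts = [direct_text(div) for div in soup.find_all("div")]
print(parent_texts)
```

Returning a list per div, rather than one joined string, keeps text that appears before and after a child tag as separate pieces, which may or may not be what you want.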

Answer 1 (score: 0)

You can use the lxml package with XPath syntax:

txt = """
<html>
    <div>
        "Hi, this is parent tag"
        <span> "Hi, this is child tag" </span>
    </div>
</html>
"""

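# soupparser parses through BeautifulSoup, so bs4 must be installed as well.
from lxml.html.soupparser import fromstring
tree = fromstring(txt)
# text() selects only the text nodes that are direct children of the div.
print(tree.xpath("//div/text()"))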

A good XPath cheat sheet: https://devhints.io/xpath
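In the ElementTree data model that lxml also follows, a parent's own text is split between its `.text` (the text before the first child) and the `.tail` of each child (the text after that child). As a rough sketch of that idea, the example below swaps lxml for the standard library's `xml.etree.ElementTree`, which only works here because the sample happens to be well-formed XML; the helper name `own_text` is made up for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<html>
    <div>
        "Hi, this is parent tag"
        <span> "Hi, this is child tag" </span>
    </div>
</html>"""


def own_text(element):
    """Collect an element's own text: its .text plus the .tail of each child."""
    parts = [element.text or ""]
    for child in element:
        # A child's .tail is text that follows it but still belongs to the parent.
        parts.append(child.tail or "")
    return [p.strip() for p in parts if p.strip()]


root = ET.fromstring(SAMPLE)
for div in root.iter("div"):
    print(own_text(div))
```

For real-world HTML that is not well-formed, a forgiving parser such as the ones used in the answers above is the safer choice.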