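I need to use Python to extract the text of a parent tag while ignoring its child tags. From the markup below, I need to get "Hi, this is parent tag" without getting "Hi, this is child tag". How can I do that?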
<html>
<div>
"Hi, this is parent tag"
<span> "Hi, this is child tag" </span>
</div>
</html>
Answer 0 (score: 0)
from bs4 import BeautifulSoup
txt = """
<html>
<div>
"Hi, this is parent tag"
<span> "Hi, this is child tag" </span>
</div>
</html>
"""
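soup = BeautifulSoup(txt, "html.parser")  # name the parser explicitly to avoid a warning
for node in soup.find_all('div'):
    # recursive=False keeps only the text nodes that are direct children of <div>
    print(' '.join(node.find_all(string=True, recursive=False)))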
Output (surrounding whitespace omitted):
"Hi, this is parent tag"
Answer 1 (score: 0)
You can use the lxml package with XPath syntax:
txt = """
<html>
<div>
"Hi, this is parent tag"
<span> "Hi, this is child tag" </span>
</div>
</html>
"""
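# lxml.html.soupparser needs BeautifulSoup installed alongside lxml
from lxml.html.soupparser import fromstring
tree = fromstring(txt)
# text() selects only the text nodes that are direct children of <div>
print(tree.xpath("//div/text()"))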
A good XPath cheat sheet: https://devhints.io/xpath
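
For comparison, here is a minimal sketch of the same idea using only the standard library's xml.etree.ElementTree instead of BeautifulSoup or lxml. It assumes the markup is well-formed XML, which is true for the snippet in the question but not for HTML in general. The helper name direct_text is made up for this illustration. It shows roughly what "direct text" means: the element's own leading text plus the "tail" text that follows each child.

```python
import xml.etree.ElementTree as ET

txt = """
<html>
<div>
"Hi, this is parent tag"
<span> "Hi, this is child tag" </span>
</div>
</html>
"""

def direct_text(element):
    """Return text that belongs directly to `element`, skipping text inside its children.

    `direct_text` is a hypothetical helper written for this example.
    """
    # element.text is the text before the first child element.
    parts = [element.text or ""]
    # Each child's tail is the text after that child, which still belongs to the parent.
    parts.extend(child.tail or "" for child in element)
    # Drop whitespace-only pieces and trim the rest.
    return " ".join(part.strip() for part in parts if part.strip())

# Assumption: the input parses as XML; real-world HTML often will not.
root = ET.fromstring(txt.strip())
parent_texts = [direct_text(div) for div in root.iter("div")]
print(parent_texts)
```

For messy real-world HTML, the BeautifulSoup or lxml approaches above are the safer choice, since they tolerate unclosed tags.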