I am using lxml with html:
How can I check whether any child of an element has class="nearby"? My code (essentially):
from lxml import html
import requests

url = "https://www.example.com"  # requests needs an explicit scheme
Page = requests.get(url)
Tree = html.fromstring(Page.content)
resultList = Tree.xpath('//p[@class="result-info"]')
i = len(resultList) - 1  # to go through the list backwards
while i >= 0:
    if resultList[i].HasChildWithClass("nearby"):  # placeholder, not a real lxml method
        print('This result has a child with the class "nearby"')
    i -= 1
How do I replace "HasChildWithClass()" so that it actually works?
Here is a sample tree:
Answer 0 (score: 0)
Here is an experiment I ran.
In the Python shell, set r = resultList[0] and then enter:
>>> dir(r)
['__bool__', '__class__', ..., 'find_class', ...
Now, this find_class method looks very promising. If you look at its help text:
>>> help(r.find_class)
you will confirm the guess. Indeed,
>>> r.find_class('nearby')
[<Element span at 0x109788ea8>]
For the other tag in the sample XML you provided, s = resultList[1],
>>> s.find_class('nearby')
[]
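Now it is clear how to tell whether a "nearby" child exists: find_class returns a non-empty list when there is one and an empty list when there is not.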
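As a rough sketch of how that check could stand in for the question's placeholder: the helper name has_child_with_class and the SAMPLE markup below are made up for illustration and are not part of lxml or of the original post. One detail worth knowing is that find_class also matches the element it is called on, so the sketch filters that element out to keep the "child" meaning.

```python
from lxml import html

# Illustrative markup modeled on the question's tree (contents are assumptions).
SAMPLE = """
<div>
  <p class="result-info"><span class="result-meta"><span class="nearby">close</span></span></p>
  <p class="result-info"><span class="result-meta"><span class="FAR-AWAY">distant</span></span></p>
</div>
"""


def has_child_with_class(element, class_name):
    """Hypothetical helper: True if any descendant of element has class_name.

    find_class() matches the element itself as well as its descendants,
    so the element is excluded explicitly.
    """
    return any(match is not element for match in element.find_class(class_name))


tree = html.fromstring(SAMPLE)
result_list = tree.xpath('//p[@class="result-info"]')
flags = [has_child_with_class(result, "nearby") for result in result_list]

# Walk the list backwards, as the question does.
for result in reversed(result_list):
    if has_child_with_class(result, "nearby"):
        print('This result has a child with the class "nearby"')
```

Using reversed() avoids the manual index bookkeeping of the while loop.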
Cheers!
Answer 1 (score: 0)
I tried to understand why you are using lxml to find the elements; BeautifulSoup and re may be better choices.
lxml = """
<p class="result-info">
    <span class="result-meta">
        <span class="nearby">
            ... #this SHOULD print something
        </span>
    </span>
</p>
<p class="result-info">
    <span class="result-meta">
        <span class="FAR-AWAY">
            ... # this should NOT print anything
        </span>
    </span>
</p>
"""
That said, here is what you asked for.
from lxml import html

Tree = html.fromstring(lxml)  # "lxml" here is the sample markup string
resultList = Tree.xpath('//p[@class="result-info"]')
for result in resultList:
    # iter() yields the element itself and then every descendant
    for e in result.iter():
        # exact match: only fires when the class attribute is exactly "nearby"
        if e.attrib.get("class") == "nearby":
            print(e.text)
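The comparison against the class attribute is an exact string match, so it would miss an element such as class="nearby featured". A possible variant, sketched here with made-up SAMPLE markup and an illustrative HAS_NEARBY constant, pushes the test into XPath and treats the class attribute as a space-separated token list:

```python
from lxml import html

# Illustrative markup; the second class on the first span is an assumption
# added to show token matching.
SAMPLE = """
<div>
  <p class="result-info"><span class="nearby featured">close</span></p>
  <p class="result-info"><span class="FAR-AWAY">distant</span></p>
</div>
"""

# Pad @class with spaces so "nearby" only matches as a whole token.
# ".//*" selects descendants only, not the context element itself.
HAS_NEARBY = './/*[contains(concat(" ", normalize-space(@class), " "), " nearby ")]'

tree = html.fromstring(SAMPLE)
matches = []
for result in tree.xpath('//p[@class="result-info"]'):
    hits = result.xpath(HAS_NEARBY)
    matches.append([hit.text for hit in hits])
    for hit in hits:
        print(hit.text)
```

Whether token matching is wanted depends on the real page; if the class attribute is always exactly "nearby", the simpler equality test is enough.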
Alternatively, with bs4:
from bs4 import BeautifulSoup
soup = BeautifulSoup(lxml,"lxml")
result = soup.find_all("span", class_="nearby")
print(result[0].text)
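The snippet above collects every "nearby" span in the whole document. If the goal is the per-result check from the question, one way it might be done with bs4 is to search inside each p separately. This is only a sketch: the SAMPLE markup is invented for illustration, and it uses the built-in "html.parser" backend rather than "lxml" so that it does not depend on a second package.

```python
from bs4 import BeautifulSoup

# Illustrative markup modeled on the question's tree (contents are assumptions).
SAMPLE = """
<p class="result-info"><span class="result-meta"><span class="nearby">close</span></span></p>
<p class="result-info"><span class="result-meta"><span class="FAR-AWAY">distant</span></span></p>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
flags = []
for p in soup.find_all("p", class_="result-info"):
    # find() searches descendants of p only and returns None when nothing matches
    found = p.find(class_="nearby") is not None
    flags.append(found)
    if found:
        print('This result has a child with the class "nearby"')
```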