Python 3: getting child elements (lxml)

Time: 2018-10-12 02:36:23

Tags: python html python-requests

I'm using lxml to work with HTML.


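How can I check whether any child of an element has class="nearby"? My code (essentially):

from lxml import html
import requests

url = "https://www.example.com"  # requests needs the scheme (https://)
Page = requests.get(url)
Tree = html.fromstring(Page.content)
resultList = Tree.xpath('//p[@class="result-info"]')
i = len(resultList) - 1  # to go through the list backwards
while i >= 0:
    # HasChildWithClass() does not exist; it is the placeholder this question is about
    if resultList[i].HasChildWithClass("nearby"):
        print('This result has a child with the class "nearby"')
    i -= 1

How can I replace "HasChildWithClass()" so that it actually works?

Here is an example tree (the same HTML is reproduced as the lxml string in the second answer below):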

2 Answers

Answer 0 (score: 0)

Here is an experiment I did.

In a Python shell, set r = resultList[0] and type:

>>> dir(r)
['__bool__', '__class__', ..., 'find_class', ...

Now, this find_class method looks very promising. If you look at its help:

>>> help(r.find_class)

you will see the guess confirmed. Indeed,

>>> r.find_class('nearby')
[<Element span at 0x109788ea8>]

For the other tag in the example HTML you provided, s = resultList[1]:

>>> s.find_class('nearby')
[]

Now it is clear how to tell whether a "nearby" child exists: find_class returns an empty list when there is none.
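One caveat worth checking: in lxml.html, find_class() searches the element itself as well as its descendants, and it matches the class as one whitespace-separated token of the class attribute. A minimal sketch of the question's loop built on it, assuming the two-paragraph example tree from the question (wrapped in a hypothetical <div> so it parses as a single fragment). has_descendant_with_class is a hypothetical helper name, not part of lxml:

```python
from lxml import html

# Example tree from the question, wrapped in a hypothetical <div>.
SAMPLE = """
<div>
    <p class="result-info">
        <span class="result-meta"><span class="nearby">near</span></span>
    </p>
    <p class="result-info">
        <span class="result-meta"><span class="FAR-AWAY">far</span></span>
    </p>
</div>
"""


def has_descendant_with_class(element, class_name):
    """Hypothetical helper: True if any element *below* `element` has class_name.

    find_class() also tests the element itself, so that match is filtered out.
    """
    return any(el is not element for el in element.find_class(class_name))


tree = html.fromstring(SAMPLE)
result_list = tree.xpath('//p[@class="result-info"]')

# reversed() walks the list from last to first without manual index bookkeeping.
for result in reversed(result_list):
    if has_descendant_with_class(result, "nearby"):
        print('This result has a child with the class "nearby"')
```

Filtering out the element itself matters only when the outer element could carry the same class; if that never happens in your markup, a plain `if r.find_class("nearby"):` is enough, since an empty list is falsy.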

Cheers!

Answer 1 (score: 0)

I'm trying to understand why you use lxml to find elements; BeautifulSoup or re might be a better choice.

lxml = """
    <p class="result-info">
        <span class="result-meta">
            <span class="nearby">
                ... #this SHOULD print something
            </span>
        </span>
    </p>
    <p class="result-info">
        <span class="result-meta">
            <span class="FAR-AWAY">
                ... # this should NOT print anything
            </span>
        </span>
    </p>
    """

But here is what you asked for, done with lxml:

from lxml import html

Tree = html.fromstring(lxml)
resultList = Tree.xpath('//p[@class="result-info"]')
for result in resultList:
    # iter() walks the result element itself and all of its descendants
    for e in result.iter():
        # exact comparison: class="nearby other" would not match
        if e.attrib.get("class") == "nearby":
            print(e.text)
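The exact comparison above only fires when the class attribute is literally "nearby"; an element with class="nearby highlighted" would be skipped. A minimal sketch of a token-aware check with a relative XPath expression, run on hypothetical markup (the extra "highlighted" class is an assumption added to show the difference; HAS_NEARBY is just an illustrative name):

```python
from lxml import html

# Hypothetical markup: the target span has an extra class, so
# attrib.get("class") == "nearby" would miss it.
doc = html.fromstring(
    '<div>'
    '<p class="result-info"><span class="result-meta">'
    '<span class="nearby highlighted">near</span></span></p>'
    '<p class="result-info"><span class="result-meta">'
    '<span class="FAR-AWAY">far</span></span></p>'
    '</div>'
)

# The leading "." makes the search relative: only elements below the current <p>.
# Padding @class with spaces matches "nearby" as a whole token, so
# "nearby highlighted" matches while a class like "nearbyish" would not.
HAS_NEARBY = (".//*[contains(concat(' ', normalize-space(@class), ' '), "
              "' nearby ')]")

for result in doc.xpath('//p[@class="result-info"]'):
    hits = result.xpath(HAS_NEARBY)
    print(bool(hits), [el.text for el in hits])
```

This is the same token-matching idea that find_class() uses internally, but restricted to descendants, so the <p> itself is never counted.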

Try using bs4:

from bs4 import BeautifulSoup


soup = BeautifulSoup(lxml, "lxml")  # the second "lxml" selects the parser backend
result = soup.find_all("span", class_="nearby")
print(result[0].text)
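find_all() above collects every "nearby" span in the whole document, and result[0] raises IndexError when there is none. To answer the original per-result question, here is a minimal sketch that checks each result-info paragraph separately. It swaps in Python's built-in "html.parser" backend instead of "lxml" (so it runs without lxml installed), and the sample HTML is a trimmed version of the question's tree:

```python
from bs4 import BeautifulSoup

# Trimmed version of the example tree from the question.
SAMPLE = """
<p class="result-info">
    <span class="result-meta"><span class="nearby">near</span></span>
</p>
<p class="result-info">
    <span class="result-meta"><span class="FAR-AWAY">far</span></span>
</p>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

for p in soup.find_all("p", class_="result-info"):
    # find() searches the descendants of p and returns None when nothing
    # matches; class_ matches any one of an element's CSS classes.
    if p.find(class_="nearby") is not None:
        print('This result has a child with the class "nearby"')
```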