Question

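I'm trying to use BeautifulSoup's find_all function inside a for loop to return one of two td elements that have different classes. The td elements sit inside html div elements. The for loop iterates over multiple divs, and each div contains exactly one of the two td elements, each with a different class.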

My goal is to get the text from the td element, but I can't find a way to make find_all accept both td classes.

I want to use a single find_all to grab whichever of these td elements is present in the current div element.

The sample html looks like this:

<div> 
<td class='class1'>
text to scrape
</td>
</div>

<div> 
<td class='class2'>
text to scrape
</td>
</div>

My code looks like this:

for propbox in soup.find_all('div'):
    tester = propbox.find_all('td', {"class" : lambda A: A.contains("class1") or A.contains("class2")})

I get the error: AttributeError: 'NoneType' object has no attribute 'contains'

So I assume that when one of the td classes doesn't exist, Python still tries to call .contains() on None, which it doesn't like.

Does anyone know a way I can do this? Any help or examples are much appreciated. Thanks in advance.

Answer 1

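The function is called with each class attribute value (a str), and then with the whole class attribute value (unless one of the previous calls already returned true for the element). But if the element has no class attribute at all, None is passed as the argument.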
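To see this calling convention directly, the sketch below passes a recording function that rejects every value, so Beautiful Soup ends up trying every candidate. The markup and the names record and seen are hypothetical, chosen for illustration. The exact order and number of calls may differ between Beautiful Soup versions, so only which values show up is worth relying on.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <td> with two classes, one <td> with no class at all.
html = """
<div><td class='class1 extra'>a</td></div>
<div><td>b</td></div>
"""
soup = BeautifulSoup(html, "html.parser")

seen = []  # every value Beautiful Soup hands to the filter function


def record(value):
    """Log the argument and reject it, so every candidate value gets tried."""
    seen.append(value)
    return False


soup.find_all("td", {"class": record})
print(seen)  # includes each class, the joined "class1 extra", and None
```

The None entry comes from the class-less td, which is exactly what triggers the AttributeError in the question.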

So you need to check for None.
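A minimal sketch of that None check, keeping the question's substring-style test. The helper name wanted_class and the extra class-less td are assumptions for illustration. Note that a substring test like this would also accept a class such as class10.

```python
from bs4 import BeautifulSoup

# The question's sample markup, plus a hypothetical <td> with no class.
html = """
<div><td class='class1'>text to scrape</td></div>
<div><td class='class2'>text to scrape</td></div>
<div><td>no class here</td></div>
"""
soup = BeautifulSoup(html, "html.parser")


def wanted_class(class_):
    # None means the <td> has no class attribute: reject it before
    # doing any string test on it.
    return class_ is not None and ("class1" in class_ or "class2" in class_)


texts = []
for propbox in soup.find_all("div"):
    for td in propbox.find_all("td", {"class": wanted_class}):
        texts.append(td.get_text(strip=True))
print(texts)  # ['text to scrape', 'text to scrape']
```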

Or, in your case, a simple in is enough:

for propbox in soup.find_all('div'):
    tester = propbox.find_all('td', {
        # Tuple membership is safe even when class_ is None (no class attribute).
        "class": lambda class_: class_ in ("class1", "class2")
    })
    # print(tester)
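As an alternative that is not part of this answer, Beautiful Soup also accepts a list as an attribute filter, which matches a td whose class equals any listed value, and select() accepts a CSS selector list. A sketch assuming the question's sample markup parsed with html.parser (a parser that follows the HTML5 rules, such as html5lib, may discard a td that sits outside a table):

```python
from bs4 import BeautifulSoup

html = """
<div><td class='class1'>text to scrape</td></div>
<div><td class='class2'>text to scrape</td></div>
<div><td>no class here</td></div>
"""
soup = BeautifulSoup(html, "html.parser")

texts = []
for propbox in soup.find_all("div"):
    # A list of strings matches a <td> whose class equals any one of them;
    # a <td> without a class attribute is simply skipped, no callable needed.
    for td in propbox.find_all("td", class_=["class1", "class2"]):
        texts.append(td.get_text(strip=True))

# The same selection expressed as a CSS selector list (select() uses soupsieve).
css_texts = [td.get_text(strip=True) for td in soup.select("div td.class1, div td.class2")]
print(texts, css_texts)
```

The list form avoids the None pitfall entirely, because Beautiful Soup never calls user code for the missing attribute.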

BTW, there's no contains method, but there is a __contains__ method (which the in membership test operator uses):

>>> 'haystack'.contains('needle')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'contains'
>>> 'haystack'.__contains__('needle')
False
>>> 'needle' in 'haystack'
False

Answer 2

I came up with another way to do this. It's probably far less robust, but here's what I came up with:

for propbox in soup.find_all('div'):
    Agenttester = propbox.find_all('td', class_="class2")
    if Agenttester == []:
        Agenttester = 'This is class1'
    else:
        Agenttester = 'this is class2'

This also works fine in my case, because find_all returns [] when there is no class2 td in the div. However, falsetru has the right idea.
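A self-contained version of this check against the question's sample HTML, relying on find_all returning an empty (and therefore falsy) ResultSet when nothing matches. Like the original, it assumes every div holds exactly one of the two classes; a div with neither would be labelled class1.

```python
from bs4 import BeautifulSoup

# The question's sample markup.
html = """
<div><td class='class1'>text to scrape</td></div>
<div><td class='class2'>text to scrape</td></div>
"""
soup = BeautifulSoup(html, "html.parser")

labels = []
for propbox in soup.find_all("div"):
    # An empty ResultSet is falsy, so `not ...` replaces the `== []` comparison.
    if not propbox.find_all("td", class_="class2"):
        labels.append("This is class1")  # assumption: no class2 means class1
    else:
        labels.append("this is class2")
print(labels)  # ['This is class1', 'this is class2']
```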

Beautiful Soup find_all with multiple class names
