Question

I am trying to use Beautiful Soup to parse some HTML, and I ran into a problem with the following code.

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC 
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from html.parser import HTMLParser
from bs4 import BeautifulSoup

results = 20
driver = webdriver.Chrome()
driver.get('https://www.hydroshare.org/search')
element = WebDriverWait(driver,60).until(EC.presence_of_element_located((By.ID, 'items-discovered_wrapper')))

innerHTML = driver.execute_script("return document.body.innerHTML")
soup = BeautifulSoup(innerHTML,'html.parser')
table = soup.find('table',{'id':'items-discovered'})
print(type(table))  # Prints <class 'bs4.element.Tag'>
children = table.findchildren()  # TypeError: 'NoneType' object is not callable

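I don't understand why print(type(table)) reports a Tag object, yet when I call table.findchildren() (or any other method a Tag object should support), the table somehow turns into None. I also tried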

print(table)

which prints the HTML string.

Does anyone know why this happens or how to fix it?

Answer 1

There is no method named findchildren, so trying to call it is an error.

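In most Python code you would get a clearer AttributeError saying that findchildren does not exist. But BS4 nodes have a convenient shortcut: you can write node.spam instead of node.find('spam'). So the code is actually calling find('findchildren'), which returns None, and then you try to call None.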
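The shortcut described above can be reproduced with a small sketch. The HTML string here is a made-up stand-in for the real page, and it assumes a Beautiful Soup 4 release in which unknown attribute names fall back to a tag-name search:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature table; the real page is much larger.
html = "<table id='items-discovered'><tr><td>a</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "items-discovered"})

# An unknown attribute name is treated as a tag-name lookup,
# so this behaves like table.find("findchildren").
missing = table.findchildren
print(missing)  # None

# The same shortcut with a tag that does exist.
first_row = table.tr
print(first_row.name)

# Calling the None result reproduces the error from the question.
try:
    table.findchildren()
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Because the lookup silently returns None instead of raising AttributeError, a misspelled method name only fails at the moment it is called.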

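As for how to fix it... I'm not sure what you meant to call here (usually with an error like this, someone has copied and pasted BS3 code that was already deprecated in 3.0 and no longer exists in 4.x, but that isn't the case here). So all I can suggest is reading the documentation to find the method you actually need. There are several ways to search or iterate over direct children and over all descendants, and I expect one of them is what you want.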
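A few of those options, sketched on a made-up table (the HTML is an assumption, not the real page). Which one fits depends on whether only direct children or every nested tag is wanted:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical two-row table used only for illustration.
html = "<table><tr><td>a</td><td>b</td></tr><tr><td>c</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Direct children only; .children can also yield text nodes, so filter to tags.
direct_children = [c.name for c in table.children if isinstance(c, Tag)]

# Direct child tags via find_all with recursion switched off.
direct_via_find_all = [t.name for t in table.find_all(recursive=False)]

# Every nested tag, in document order.
all_descendants = [d.name for d in table.descendants if isinstance(d, Tag)]

print(direct_children)
print(direct_via_find_all)
print(all_descendants)
```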

Answer 2

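I don't understand why print(type(table)) reports a Tag object

Because table's type is a BS4 Tag.

As for finding the children, the function you want is actually .findChildren(). You just missed the capital 'C'. So what you really need on the last line is: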

children = table.findChildren()

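Edit: As @abarnert pointed out, .findChildren() is technically deprecated in beautifulsoup4, but the function still exists. The newer way to get the same result is to call .find_all() without any arguments. Both functions work and return the same results, so it is better to use children = table.find_all()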
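A quick way to check that the two calls agree, using a made-up paragraph rather than the real page. Newer Beautiful Soup releases may emit a DeprecationWarning for findChildren():

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only.
html = "<p class='story'>Once <a href='#'>one</a> and <b>two</b></p>"
soup = BeautifulSoup(html, "html.parser")
story = soup.find("p", class_="story")

legacy = story.findChildren()  # older camelCase spelling
modern = story.find_all()      # preferred spelling

legacy_names = [t.name for t in legacy]
modern_names = [t.name for t in modern]
print(legacy_names == modern_names)
```

Both calls search recursively by default, so they return all nested tags, not just direct children.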

Answer 3

Replace this line in your code

children = table.findchildren()

with

children = table.findChildren()

and it will work correctly. For example:

findpsubchild=soup.find('p',class_="story")
all_p_subchild=findpsubchild.findChildren()
print(all_p_subchild)

The lowercase name findchildren is looked up as a tag, which yields None, so bs4 cannot call it.
