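I get the error AttributeError: 'NoneType' object has no attribute 'findAll' every time I run the Python script below. I did some research and found posts saying that I may be passing None when I try to find the images, which is why it fails. I still don't have a solution. Any information would help.
Here is the full error: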
Traceback (most recent call last):
  File "D:\Program Files\Parser Python\Test.py", line 33, in <module>
    for img in divImage.findAll('img'):
AttributeError: 'NoneType' object has no attribute 'findAll'
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.common.exceptions import TimeoutException
import os
firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True
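# raw strings keep the backslashes literal ('\f' would otherwise be a form feed)
firefox_capabilities['binary'] = r'C:\Program Files (x86)\Mozilla Firefox\firefox.exe'
# PATH entries must be separated by os.pathsep
os.environ["PATH"] += os.pathsep + r"C:\Python27\Lib\site-packages\selenium-2.53.6-py2.7.egg\selenium"
#binary = FirefoxBinary(r'C:\Program Files (x86)\Mozilla Firefox\firefox.exe')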
driver = webdriver.Firefox(capabilities=firefox_capabilities)
# it takes forever to load the page, therefore we are setting a threshold
driver.set_page_load_timeout(5)
try:
    driver.get("http://readcomiconline.to/Comic/Flashpoint/Issue-1?id=19295&readType=1")
except TimeoutException:
    # never ignore exceptions silently in real world code
    pass
soup2 = BeautifulSoup(driver.page_source, 'html.parser')
divImage = soup2.find('div', {"id": "divImage"})
#divImage = soup2.find('div', {"id": "containerRoot"})
# close the browser
driver.close()
for img in divImage.findAll('img'):
    print img.get('src')
Answer 0 (score: 1)
The error means that divImage is None, which in turn means that no div element with id="divImage" was found in the parsed HTML.
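As a defensive measure, the lookup can be guarded so that a missing container yields an empty result instead of an AttributeError. This is only a minimal sketch: image_sources is a hypothetical helper name, and it assumes soup is a BeautifulSoup object such as soup2 from the question (find_all is the current spelling of the legacy findAll alias).

```python
def image_sources(soup, container_id="divImage"):
    """Return the src of every <img> inside the container, or [] if it is absent.

    `soup` is assumed to be a BeautifulSoup object (hypothetical helper,
    not part of the original script).
    """
    container = soup.find("div", {"id": container_id})
    if container is None:
        # find() returns None when nothing matches, so there is nothing to iterate.
        return []
    # Skip <img> tags whose src attribute is missing or empty.
    return [img.get("src") for img in container.find_all("img") if img.get("src")]
```

Calling image_sources(soup2) in place of the final loop would avoid the crash, but it does not make the images appear; the page still has to finish rendering before its source is parsed.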
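You should first wait for the desired element to be present on the page, and only then get the page source and parse it. This can be done with WebDriverWait: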
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# ...
driver.get("http://readcomiconline.to/Comic/Flashpoint/Issue-1?id=19295&readType=1")
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.ID, "divImage")))
soup2 = BeautifulSoup(driver.page_source, 'html.parser')
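Note that, to wait for all of the images to load, you have to keep scrolling the page down to the footer until every image has loaded. An implementation: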
driver.get("http://readcomiconline.to/Comic/Flashpoint/Issue-1?id=19295&readType=1")
wait.until(EC.presence_of_element_located((By.ID, "divImage")))
footer = driver.find_element_by_id("footer")
while True:
    # scroll to the footer
    driver.execute_script("arguments[0].scrollIntoView();", footer)
    time.sleep(0.5)
    # check if all images are loaded
    if all(img.get_attribute("src") for img in driver.find_elements_by_css_selector("#divImage p img")):
        break
Don't forget to import time.
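One caveat with the while True loop: if an image never receives a src, it spins forever. A possible variant, sketched here under assumptions, bounds the loop with a deadline. scroll_until_loaded, its parameters and the 30-second default are all hypothetical and not part of Selenium; the two callables stand in for the scroll and check steps of the loop.

```python
import time


def scroll_until_loaded(scroll, all_loaded, timeout=30.0, pause=0.5):
    """Call scroll() repeatedly until all_loaded() is true or `timeout` seconds pass.

    Returns True if loading finished, False if the deadline was reached first.
    (Hypothetical helper; time.time() is used so it also runs on Python 2.7.)
    """
    deadline = time.time() + timeout
    while True:
        scroll()            # e.g. scroll the footer into view
        time.sleep(pause)   # give the page a moment to load more images
        if all_loaded():
            return True
        if time.time() >= deadline:
            return False


# Possible usage with the driver and footer from the loop it replaces:
# done = scroll_until_loaded(
#     lambda: driver.execute_script("arguments[0].scrollIntoView();", footer),
#     lambda: all(img.get_attribute("src")
#                 for img in driver.find_elements_by_css_selector("#divImage p img")),
# )
```

Returning a boolean rather than raising leaves it to the caller to decide whether a partial set of images is acceptable.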