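I am using this code to scrape some data from the link https://website.grader.com/results/www.dubizzle.com. The script that produces the tags I want to extract only finishes loading after about 15 seconds, so someone suggested I use Selenium to introduce a delay in the code. That is why I am using this code.
The code is as follows: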
#!/usr/bin/python
import urllib
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
from dateutil.parser import parse
from datetime import timedelta
import MySQLdb
import re
import pdb
import sys
import string
driver = webdriver.Firefox()
driver.get('https://website.grader.com/results/dubizzle.com')
time.sleep(25)
html = driver.page_source
soup = BeautifulSoup(html)
# print soup
Sizeofweb = ""
try:
    # .text already returns the span's string, so calling get_text() on it again would fail
    Sizeofweb = soup.find('span', {'data-reactid': ".0.0.3.0.0.3.$0.1.1.0"}).text
    print Sizeofweb.encode("utf-8")
except StandardError as e:
    converted_date = "Error was {0}".format(e)
    print converted_date
The part of the HTML I am trying to extract is as follows:
Snap: https://www.dropbox.com/s/7dwbaiyizwa36m6/5.PNG?dl=0
<div class="result-value" data-reactid=".0.0.3.0.0.3.$0.1.1">
<span data-reactid=".0.0.3.0.0.3.$0.1.1.0">1.1</span>
<span class="result-value-unit" data-reactid=".0.0.3.0.0.3.$0.1.1.1">MB</span>
</div>
The error I get is:
Traceback (most recent call last):
  File "ahmed.py", line 20, in <module>
    driver = webdriver.Firefox()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py", line 140, in __init__
    self.service.start()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 81, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.
Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.firefox.service.Service object at 0x7f65a1ccbe10>> ignored
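Answer 0 (score: 0)
Either you do not have a current Firefox webdriver (geckodriver) installed, or it is not on your PATH. The code fails before the browser is even launched, at this line: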
driver = webdriver.Firefox()
To fix this, you need to install (or reinstall) the Firefox driver, geckodriver, and add its directory to your PATH.
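A quick way to confirm whether the driver is visible before starting Selenium is to look it up on PATH yourself. The following is only a minimal sketch, written for Python 3 and using only the standard library; the helper name `locate_driver` and its `extra_dirs` parameter are made up for illustration and are not part of Selenium.

```python
import os
import shutil


def locate_driver(name="geckodriver", extra_dirs=()):
    """Return the full path of the driver executable, or None if it is not found.

    `extra_dirs` (a hypothetical convenience) lists directories to search
    before the ones already on PATH.
    """
    dirs = list(extra_dirs) + os.environ.get("PATH", "").split(os.pathsep)
    # Drop empty entries so the current directory is not searched by accident.
    search_path = os.pathsep.join(d for d in dirs if d)
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    path = locate_driver()
    if path is None:
        print("geckodriver was not found on PATH; install it or add its directory to PATH")
    else:
        print("geckodriver found at", path)
```

If this reports that the driver was not found, `webdriver.Firefox()` will most likely fail with the same "'geckodriver' executable needs to be in PATH" message, so the PATH needs fixing first.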