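I am using this code to scrape some data from the link https://website.grader.com/results/www.dubizzle.com. Because the content with the tags I want to extract only appears after the page has been loading for 15 seconds, someone suggested that I use Selenium to introduce a delay in the code. So I am using the code below.
The code is as follows: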
#!/usr/bin/python
import urllib
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
from dateutil.parser import parse
from datetime import timedelta
import MySQLdb
import re
import pdb
import sys
import string
driver = webdriver.Firefox()
driver.get('https://website.grader.com/results/dubizzle.com')
time.sleep(25)
html = driver.page_source
soup = BeautifulSoup(html)
# print soup
Sizeofweb = ""
try:
    # get_text() already returns the text, so it must not be chained after .text
    Sizeofweb = soup.find('span', {'data-reactid': ".0.0.3.0.0.3.$0.1.1.0"}).get_text()
    print Sizeofweb.encode("utf-8")
except StandardError as e:
    converted_date = "Error was {0}".format(e)
    print converted_date
The part of the HTML I am extracting is as follows:
Screenshot: https://www.dropbox.com/s/7dwbaiyizwa36m6/5.PNG?dl=0
<div class="result-value" data-reactid=".0.0.3.0.0.3.$0.1.1">
<span data-reactid=".0.0.3.0.0.3.$0.1.1.0">1.1</span>
<span class="result-value-unit" data-reactid=".0.0.3.0.0.3.$0.1.1.1">MB</span>
</div>
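I installed geckodriver as @Ahn Smith recommended here: I downloaded geckodriver from here, extracted it into the /home directory, and then added it to the path with export PATH=$PATH:/home/geckodriver
Now, when I run the program, it gives this error: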
Traceback (most recent call last):
  File "ahmed.py", line 17, in <module>
    driver = webdriver.Firefox()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py", line 140, in __init__
    self.service.start()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 74, in start
    stdout=self.log_file, stderr=self.log_file)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 20] Not a directory
Answer 0 (score: 1)
There are two ways to point Selenium at the appropriate webdriver. You can pass its location as an argument:
driver = webdriver.Firefox(executable_path='/path/to/geckodriver')
Or you can add the folder that contains the driver to your shell's PATH variable:
$ export PATH=$PATH:/path/to/
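The same PATH change can also be made from inside the script, before the driver is created. The following is only a sketch under a few assumptions: the helper name add_driver_dir_to_path is hypothetical (it is not part of Selenium), /path/to/ is the same placeholder as above, and the code is written for Python 3. It accepts either the folder or the geckodriver file itself and always adds the folder.

```python
import os


def add_driver_dir_to_path(location, environ=None):
    """Append the folder that holds the driver binary to PATH.

    `location` may be the folder or the binary itself; if it is an
    existing file, its parent folder is used instead.
    Hypothetical helper for illustration, not a Selenium API.
    """
    if environ is None:
        environ = os.environ
    # A file on PATH is the mistake to avoid, so fall back to its folder.
    folder = os.path.dirname(location) if os.path.isfile(location) else location
    folder = os.path.abspath(folder)
    # Split the current PATH, dropping empty entries.
    entries = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    if folder not in entries:
        entries.append(folder)
    environ["PATH"] = os.pathsep.join(entries)
    return folder
```

Calling add_driver_dir_to_path('/path/to/geckodriver') before webdriver.Firefox() should have the same effect as the export line, but only for the current Python process and the programs it starts.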
I think your problem is that you added geckodriver itself to the PATH variable, rather than the folder that contains it.
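To see where "Not a directory" comes from: a PATH lookup joins each PATH entry with the program name, so an entry of /home/geckodriver makes the system look for /home/geckodriver/geckodriver, where the first geckodriver is a file and not a folder. The sketch below reproduces that with an empty temporary file standing in for the real binary. It assumes Python 3 on a POSIX system, and the function name lookup_errno is hypothetical.

```python
import errno
import os
import tempfile


def lookup_errno(path_entry, program="geckodriver"):
    """Return the errno raised when looking up `program` under one
    PATH entry, or 0 if the file is found."""
    candidate = os.path.join(path_entry, program)
    try:
        os.stat(candidate)
    except OSError as exc:
        return exc.errno
    return 0


with tempfile.TemporaryDirectory() as folder:
    binary = os.path.join(folder, "geckodriver")
    open(binary, "w").close()  # empty stand-in for the real binary

    # PATH entry is the file itself: <folder>/geckodriver/geckodriver
    file_entry_result = lookup_errno(binary)
    # PATH entry is the containing folder: <folder>/geckodriver
    folder_entry_result = lookup_errno(folder)

print(file_entry_result == errno.ENOTDIR)
print(folder_entry_result == 0)
```

On Linux, errno.ENOTDIR is 20, which matches the [Errno 20] in the traceback; pointing the PATH entry at the folder makes the lookup succeed.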