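I'm scraping some web data with Python, BeautifulSoup and Selenium. I'm also using PyVirtualDisplay so that I don't need a physical display.
It works perfectly from my laptop, but when I run it from the server I get the following error:
httplib.BadStatusLine: ''
The first time, it happened on the second page load. Now it happens every time. What is the problem?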
EDIT
Added the code:
import requests, bs4
import csv
import re
import datetime
import time
import os
from contextlib import closing
from selenium import webdriver
from selenium.webdriver import Firefox # pip install selenium
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from pyvirtualdisplay import Display
# start a virtual X display so Firefox can run on a headless server
display = Display(visible=0, size=(1500, 1200))
display.start()
url_base = "https://www.seek.com.au/jobs?page="
# open web browser and login
binary = FirefoxBinary('/home/firefox/firefox/firefox')
driver = webdriver.Firefox(firefox_binary=binary)
overlap = False
page = 0
# request result pages 1, 2, 3, ... until `overlap` is set
while not overlap:
    page += 1
    driver.get(url_base + str(page))
    ...
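Separately from the error itself: if a run crashes partway through, nothing in the lines shown guarantees that Firefox is shut down, and leftover browser processes keep using memory, which matters on a small server. A minimal sketch, assuming you want the browser cleaned up even when driver.get raises, is a hypothetical quitting() context manager that calls driver.quit(). The imported contextlib.closing would call driver.close() instead, which in Selenium only closes the current window, so quit() is the better fit. FakeDriver is a hypothetical stand-in so the example runs without Firefox; with Selenium you would write `with quitting(webdriver.Firefox(firefox_binary=binary)) as driver:`.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield `driver` and always call driver.quit() on exit, even after an exception."""
    try:
        yield driver
    finally:
        # quit() shuts down the browser session; close() would only close the current window
        driver.quit()


class FakeDriver:
    """Hypothetical stand-in for webdriver.Firefox so this sketch runs without a browser."""

    def __init__(self):
        self.visited = []
        self.quit_called = False

    def get(self, url):
        self.visited.append(url)

    def quit(self):
        self.quit_called = True


url_base = "https://www.seek.com.au/jobs?page="
fake = FakeDriver()
with quitting(fake) as driver:
    # stand-in for the original `while not overlap` loop: fetch pages 1 to 3
    for page in range(1, 4):
        driver.get(url_base + str(page))

print(fake.quit_called)  # True
```

The same try/finally pattern can wrap display.start() and display.stop() so the virtual display is also torn down after a failure.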
Here is the traceback:
Traceback (most recent call last):
  File "manage.py", line 22, in <module>
    execute_from_command_line(sys.argv)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
    utility.execute()
  File "/var/www/matt/env/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 359, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/django/core/management/base.py", line 294, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/django/core/management/base.py", line 345, in execute
    output = self.handle(*args, **options)
  File "/var/www/matt/matt/management/commands/mattv3.py", line 109, in handle
    driver.get(url_base+str(page))
  File "/var/www/matt/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 245, in get
    self.execute(Command.GET, {'url': url})
  File "/var/www/matt/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
    return self._request(command_info[0], url, body=data)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 426, in _request
    resp = self._conn.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1136, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 453, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 417, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''
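The last frames show where the error comes from: Selenium's remote_connection sends each command (here the GET issued by driver.get) as an HTTP request to the browser's driver, and httplib raises BadStatusLine('') when the other end closes the connection without sending any status line at all. A minimal reproduction, assuming Python 3 and only the standard library: in Python 3 httplib is called http.client, and this case raises RemoteDisconnected, a subclass of BadStatusLine. The throwaway local server below is purely illustrative; it reads the request and hangs up without answering.

```python
import http.client
import socket
import threading


def serve_one_and_hang_up(server_sock):
    # Accept one connection, read the request, then close without sending
    # any response bytes, like a driver process that has died mid-session.
    conn, _ = server_sock.accept()
    conn.recv(65536)
    conn.close()


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
worker = threading.Thread(target=serve_one_and_hang_up, args=(server,))
worker.start()

client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
client.request("GET", "/")
error = None
try:
    client.getresponse()
except http.client.BadStatusLine as exc:
    # the status line read back is empty, so http.client raises here
    error = exc
finally:
    client.close()
    worker.join()
    server.close()

print(type(error).__name__)  # RemoteDisconnected
```

In other words, the empty string in BadStatusLine('') means "no reply at all", which points at the local browser or driver process rather than at the website being scraped.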
Answer:
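I was running it on a very small server (512 MB of RAM, 20 GB SSD). I upgraded the server and now it runs fine. If someone can explain the underlying problem, I'd be glad to understand it.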
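A plausible explanation, not confirmed in the thread: Firefox alone can need several hundred MB, so on a 512 MB machine the Linux out-of-memory (OOM) killer may terminate the browser or driver process mid-session. Selenium's client talks to that process over a local HTTP connection (the remote_connection.py frames in the traceback), so when the process dies the connection is closed with no response and httplib reports BadStatusLine(''). Kernel log lines containing "Out of memory" or "Killed process" (visible with dmesg) would confirm it. The sketch below, assuming a Linux kernel 3.14 or later (which provides MemAvailable in /proc/meminfo), checks available memory before starting the browser; the helper names and the 500 MB threshold MIN_FREE_MB are hypothetical, not a measured requirement.

```python
import os

# Hypothetical threshold: an assumption for illustration, not a measured Firefox requirement.
MIN_FREE_MB = 500


def mem_available_mb(meminfo_text):
    """Return MemAvailable from /proc/meminfo-style text, in MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # /proc/meminfo reports values in kB (KiB)
            return kib // 1024
    raise ValueError("MemAvailable not found (kernel older than 3.14?)")


def enough_memory_for_browser(meminfo_text, required_mb=MIN_FREE_MB):
    """True if the reported available memory meets the (assumed) requirement."""
    return mem_available_mb(meminfo_text) >= required_mb


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        text = f.read()
    print("MemAvailable: %d MiB" % mem_available_mb(text))
    if not enough_memory_for_browser(text):
        print("Warning: Firefox may be killed by the OOM killer on this machine.")
```

Calling enough_memory_for_browser() before webdriver.Firefox(...) turns a confusing BadStatusLine halfway through a run into an explicit warning up front.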