How to handle httplib.BadStatusLine: ''

Time: 2017-01-13 09:43:34

Tags: python-2.7 selenium beautifulsoup

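I am using Python, BeautifulSoup, and Selenium to scrape some web data. I am also using PyVirtualDisplay, so I do not need a physical display.

It runs perfectly on my laptop, but when I run it on my server I get the following error: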

httplib.BadStatusLine: ''

I get this the second time the script goes to scrape a page, and it now happens on every run. What is the problem?

Edit

Code added:

import requests, bs4
import csv
import re
import datetime
import time
import os 

from contextlib import closing
from selenium import webdriver
from selenium.webdriver import Firefox # pip install selenium
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from pyvirtualdisplay import Display

display = Display(visible=0, size=(1500, 1200))
display.start()

url_base = "https://www.seek.com.au/jobs?page="

# open web browser and login
binary = FirefoxBinary('/home/firefox/firefox/firefox')
driver = webdriver.Firefox(firefox_binary=binary)

overlap = False
page = 0

while not overlap:
    page += 1
    driver.get(url_base+str(page))

    ...

Here is the traceback:

Traceback (most recent call last):
  File "manage.py", line 22, in <module>
    execute_from_command_line(sys.argv)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
    utility.execute()
  File "/var/www/matt/env/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 359, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/django/core/management/base.py", line 294, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/django/core/management/base.py", line 345, in execute
    output = self.handle(*args, **options)
  File "/var/www/matt/matt/management/commands/mattv3.py", line 109, in handle
    driver.get(url_base+str(page))
  File "/var/www/matt/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 245, in get
    self.execute(Command.GET, {'url': url})
  File "/var/www/matt/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
    return self._request(command_info[0], url, body=data)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 426, in _request
    resp = self._conn.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1136, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 453, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 417, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''

1 answer:

Answer 0: (score: 0)

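I was running it on a very small server (512MB, 20GB SSD). I upgraded to a larger one and it now runs fine. If anyone can explain why this happened, I would love to understand it.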
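
On the explanation asked for above: an empty BadStatusLine means the HTTP connection was closed before any status line arrived. In this traceback the connection is the one between Selenium and the local Firefox driver, not the one to seek.com.au, so the browser process most likely went away in the middle of the request. Running out of memory on a 512MB machine would be consistent with that, though this is an inference and not something the traceback proves. If upgrading is not an option, a minimal sketch of restarting the browser when this happens could look like the following. Note that get_with_restart and make_driver are hypothetical names, not Selenium APIs, and the sketch assumes a Selenium version that, like the one in the traceback, lets the httplib exception propagate to the caller.

```python
try:
    from httplib import BadStatusLine        # Python 2, as in the question
except ImportError:
    from http.client import BadStatusLine    # Python 3 name of the same exception


def get_with_restart(make_driver, url, driver=None, max_attempts=3):
    """Load url, replacing the driver if its connection drops.

    make_driver is a hypothetical zero-argument callable that returns a
    new WebDriver. Returns the driver that loaded the page, which may be
    a different object from the one passed in.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error = None
    for _ in range(max_attempts):
        if driver is None:
            driver = make_driver()
        try:
            driver.get(url)
            return driver
        except BadStatusLine as exc:
            # The browser side closed the connection without answering.
            last_error = exc
            try:
                driver.quit()   # best effort; the process may already be gone
            except Exception:
                pass
            driver = None       # force a fresh browser on the next attempt

    raise last_error
```

In the loop from the question, this would be called as driver = get_with_restart(lambda: webdriver.Firefox(firefox_binary=binary), url_base + str(page), driver). It only works around the symptom; if the browser keeps dying, checking free memory on the server is the more useful next step.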