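I have a web-scraping Django application that uses the Selenium Python library and geckodriver to collect data from https://sellercentral.amazon.com/. It connects through a proxy. The application runs fine on my local Windows machine, but on an Ubuntu server it fails with the following error: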
.py", line 318, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/remote_connection.py", line 472, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/remote_connection.py", line 495, in _request
    self._conn.request(method, parsed_url.path, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 936, in _send_output
    self.send(message_body)
  File "/usr/lib/python3.5/http/client.py", line 908, in send
    self.sock.sendall(data)
BrokenPipeError: [Errno 32] Broken pipe
The Selenium configuration is as follows:
import os
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

PROXY_PORT = '3128'
PROXY_HOST = '178.128.186.103'
fp = webdriver.FirefoxProfile()
fp.set_preference("network.proxy.type", 1)
fp.set_preference("network.proxy.http", PROXY_HOST)
fp.set_preference("network.proxy.http_port", int(PROXY_PORT))
fp.set_preference("network.proxy.https", PROXY_HOST)
fp.set_preference("network.proxy.https_port", int(PROXY_PORT))
fp.set_preference("network.proxy.ssl", PROXY_HOST)
fp.set_preference("network.proxy.ssl_port", int(PROXY_PORT))
fp.set_preference("network.proxy.ftp", PROXY_HOST)
fp.set_preference("network.proxy.ftp_port", int(PROXY_PORT))
fp.set_preference("network.proxy.socks", PROXY_HOST)
fp.set_preference("network.proxy.socks_port", int(PROXY_PORT))
# fp.set_preference("general.useragent.override", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A")
fp.update_preferences()
gecko = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'geckodriver')
options = Options()
options.add_argument('--headless')
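For comparing what the Windows and Ubuntu machines actually configure, the repeated set_preference calls above could be collected into a plain dict first. This is only a sketch: build_proxy_prefs and apply_prefs are hypothetical helper names, not part of Selenium, and the scheme list simply mirrors the preferences set in the snippet above without claiming Firefox honours every one of them.

```python
def build_proxy_prefs(host, port):
    """Return the manual-proxy preferences as a dict (hypothetical helper)."""
    port = int(port)
    prefs = {"network.proxy.type": 1}  # 1 = manual proxy configuration
    # Same schemes as in the snippet above; assumption: one host/port for all.
    for scheme in ("http", "https", "ssl", "ftp", "socks"):
        prefs["network.proxy." + scheme] = host
        prefs["network.proxy.%s_port" % scheme] = port
    return prefs


def apply_prefs(profile, prefs):
    """Copy each preference onto any object exposing set_preference(name, value)."""
    for name, value in sorted(prefs.items()):
        profile.set_preference(name, value)
    return profile


prefs = build_proxy_prefs('178.128.186.103', '3128')
for name in sorted(prefs):
    print(name, '=', prefs[name])
```

With the fp profile from the snippet above, apply_prefs(fp, prefs) would replace the individual calls, and the printed list can be diffed between the two machines.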