I have installed and set up scrapy-splash following the docs. My spider looks like this:
import scrapy
from testencode.items import TestencodeItem
from scrapy_splash import SplashRequest


class Test1Spider(scrapy.Spider):
    name = 'test1'
    allowed_domains = ['lg.com']

    def start_requests(self):
        # Render the page through Splash instead of a plain scrapy.Request
        yield SplashRequest(
            url='https://www.lg.com/it',
            callback=self.parse,
        )

    def parse(self, response):
        i = TestencodeItem()
        i['title'] = response.xpath('//head/title//text()').extract_first()
        return i
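For completeness, the only Splash-related configuration I added to settings.py is the standard block from the scrapy-splash README (reproduced here roughly from memory, with my local Splash URL; nothing in it mentions port 6800):

# settings.py -- standard scrapy-splash setup, per the README
SPLASH_URL = 'http://127.0.0.1:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'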
I started the Splash image (on port 8050), and my browser can reach 127.0.0.1:8050. But when I run the spider, it throws an error saying it cannot connect to port 6800 (which is the scrapyd port):
C:\Users\user1\Documents\Projects\dev\testencode>scrapy crawl test1
Traceback (most recent call last):
File "c:\program files (x86)\python37-32\lib\site-packages\urllib3\connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "c:\program files (x86)\python37-32\lib\site-packages\urllib3\util\connection.py", line 80, in create_connection
raise err
File "c:\program files (x86)\python37-32\lib\site-packages\urllib3\util\connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\program files (x86)\python37-32\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "c:\program files (x86)\python37-32\lib\site-packages\urllib3\connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "c:\program files (x86)\python37-32\lib\http\client.py", line 1229, in request
self._send_request(method, url, body, headers, encode_chunked)
File "c:\program files (x86)\python37-32\lib\http\client.py", line 1275, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "c:\program files (x86)\python37-32\lib\http\client.py", line 1224, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "c:\program files (x86)\python37-32\lib\http\client.py", line 1016, in _send_output
self.send(msg)
File "c:\program files (x86)\python37-32\lib\http\client.py", line 956, in send
self.connect()
File "c:\program files (x86)\python37-32\lib\site-packages\urllib3\connection.py", line 181, in connect
conn = self._new_conn()
File "c:\program files (x86)\python37-32\lib\site-packages\urllib3\connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x05092590>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\program files (x86)\python37-32\lib\site-packages\requests\adapters.py", line 449, in send
timeout=timeout
File "c:\program files (x86)\python37-32\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "c:\program files (x86)\python37-32\lib\site-packages\urllib3\util\retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=6800): Max retries exceeded with url: /schedule.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x05092590>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\program files (x86)\python37-32\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\program files (x86)\python37-32\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Program Files (x86)\Python37-32\Scripts\scrapy.exe\__main__.py", line 9, in <module>
File "c:\program files (x86)\python37-32\lib\site-packages\scrapy\cmdline.py", line 149, in execute
cmd.crawler_process = CrawlerProcess(settings)
File "c:\program files (x86)\python37-32\lib\site-packages\scrapy\crawler.py", line 249, in __init__
super(CrawlerProcess, self).__init__(settings)
File "c:\program files (x86)\python37-32\lib\site-packages\scrapy\crawler.py", line 137, in __init__
self.spider_loader = _get_spider_loader(settings)
File "c:\program files (x86)\python37-32\lib\site-packages\scrapy\crawler.py", line 336, in _get_spider_loader
return loader_cls.from_settings(settings.frozencopy())
File "c:\program files (x86)\python37-32\lib\site-packages\scrapy\spiderloader.py", line 61, in from_settings
return cls(settings)
File "c:\program files (x86)\python37-32\lib\site-packages\scrapy\spiderloader.py", line 25, in __init__
self._load_all_spiders()
File "c:\program files (x86)\python37-32\lib\site-packages\scrapy\spiderloader.py", line 47, in _load_all_spiders
for module in walk_modules(name):
File "c:\program files (x86)\python37-32\lib\site-packages\scrapy\utils\misc.py", line 71, in walk_modules
submod = import_module(fullpath)
File "c:\program files (x86)\python37-32\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "C:\Users\user1\Documents\Projects\dev\testencode\testencode\spiders\atrun.py", line 3, in <module>
taks = scrapyd.schedule('default', 'troler')
File "c:\program files (x86)\python37-32\lib\site-packages\scrapyd_api\wrapper.py", line 188, in schedule
json = self.client.post(url, data=data, timeout=self.timeout)
File "c:\program files (x86)\python37-32\lib\site-packages\requests\sessions.py", line 572, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "c:\program files (x86)\python37-32\lib\site-packages\scrapyd_api\client.py", line 37, in request
response = super(Client, self).request(*args, **kwargs)
File "c:\program files (x86)\python37-32\lib\site-packages\requests\sessions.py", line 524, in request
resp = self.send(prep, **send_kwargs)
File "c:\program files (x86)\python37-32\lib\site-packages\requests\sessions.py", line 637, in send
r = adapter.send(request, **kwargs)
File "c:\program files (x86)\python37-32\lib\site-packages\requests\adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=6800): Max retries exceeded with url: /schedule.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x05092590>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
I tried to fix this by starting scrapyd, but all that did was freeze the cmd window once the spider launched. I'm puzzled, because the official docs never say that scrapyd is required to use Splash, and nothing in my Splash-related settings references port 6800.
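Looking at the traceback again, the last frame before the scrapyd-api calls is testencode\spiders\atrun.py, line 3 -- a leftover script sitting in my spiders folder. Since Scrapy imports every module in that folder while loading spiders, whatever that file does at module level runs at startup. Reconstructing it from the traceback (lines 1-2 are my guess at the usual python-scrapyd-api boilerplate; line 3 is quoted verbatim above):

from scrapyd_api import ScrapydAPI                 # line 1 (assumed)
scrapyd = ScrapydAPI('http://localhost:6800')      # line 2 (assumed; matches the host/port in the error)
taks = scrapyd.schedule('default', 'troler')       # line 3, quoted verbatim in the traceback

If that reconstruction is right, the scrapyd connection attempt happens at import time, not when the spider itself runs -- which would explain why a Splash-only setup is trying to reach port 6800.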