My Scrapy code had been running fine until calling "scrapy crawl file" started failing with a module-not-found error. I don't remember changing anything significant, and the error came out of nowhere.
I reinstalled Scrapy, and now I'm getting a new error:
2019-05-27 17:39:19 [scrapy.core.engine] INFO: Spider opened
Unhandled error in Deferred:
2019-05-27 17:39:19 [twisted] CRITICAL: Unhandled error in Deferred:
Traceback (most recent call last):
  File "c:\users\Me\virtual_workspace\lib\site-packages\scrapy\crawler.py", line 172, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "c:\users\Me\virtual_workspace\lib\site-packages\scrapy\crawler.py", line 176, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "c:\users\Me\virtual_workspace\lib\site-packages\twisted\internet\defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "c:\users\Me\virtual_workspace\lib\site-packages\twisted\internet\defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "c:\users\Me\virtual_workspace\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "c:\users\Me\virtual_workspace\lib\site-packages\scrapy\crawler.py", line 82, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
builtins.ImportError: DLL load failed: The specified module could not be found.
2019-05-27 17:39:19 [twisted] CRITICAL:
Traceback (most recent call last):
  File "c:\users\Me\virtual_workspace\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "c:\users\Me\virtual_workspace\lib\site-packages\scrapy\crawler.py", line 82, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
ImportError: DLL load failed: The specified module could not be found.
I checked the directory, and crawler.py is still there. Some other posts suggested installing pywin32, but I already have it installed, so reinstalling it didn't help. I even copied the base class's constructor into my own constructor, and it still doesn't work. Any help is appreciated.
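To narrow down whether the DLL failure comes from pywin32 itself rather than from Scrapy, I also tried a minimal check like the one below. Note that win32api is just one of the extension modules pywin32 ships; the traceback doesn't say which import actually fails, so the module name here is only a guess on my part:

# Minimal check, run inside the same virtualenv: if pywin32's DLLs are the
# problem, this import should fail with the same "DLL load failed" message.
# (win32api is one of pywin32's extension modules; guessed, since the
# traceback doesn't name the failing import.)
import win32api
print("pywin32 imports fine:", win32api.GetModuleFileName(0))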
My simplified code:
import scrapy
from scrapy_splash import SplashRequest
import requests
import xml.etree.ElementTree as ET
import math
from datetime import date
from collections import deque


class mySpider(scrapy.Spider):
    name = 'myScraper'

    # requests
    def start_requests(self):
        urls = [
            'https://www.msn.com/en-ca/'
        ]
        self.link_queue = deque()
        self.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36 OPR/60.0.3255.109'}
        for url in urls:
            yield SplashRequest(url=url, callback=self.parse, endpoint='render.html', args={'wait': 7}, headers=self.headers)

    # response
    def parse(self, response):
        pass
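For completeness, the project also has the standard scrapy-splash wiring in settings.py. Below is a sketch following the boilerplate from the scrapy-splash README; my actual file may differ slightly, and the SPLASH_URL assumes a local Splash instance on its default port:

# settings.py (sketch; standard wiring from the scrapy-splash README --
# SPLASH_URL assumes a local Splash instance on the default port)
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'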