My problem is that, for some reason, the Scrapy framework cannot seem to find my spider. I have this spider:
import scrapy
from scrapy.http import Request
from propreties.items import htmltableitem
from scrapy.loader import ItemLoader  # needed by parse() below


class SymbolspiderSpider(scrapy.Spider):
    name = "symbolspider"

    def start_requests(self):
        for i in range(0, 10):
            yield Request('https://www.google.com/finance?q=%27&restype=company&noIL=1&num=50&ei=VPBjWJHKK9S7U6_dmvgM&start=' + str(i))

    def parse(self, response):
        l = ItemLoader(item=htmltableitem(), response=response)
        l.add_xpath('htmltable', ".//*[@id='gf-viewc']/div/div[2]/form/table/tbody/child::*")
        return l.load_item()
When I run scrapy crawl symbolspider -o output.csv, I get this error:
Traceback (most recent call last):
  File "/usr/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/lib/python3.5/site-packages/scrapy/cmdline.py", line 142, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/python3.5/site-packages/scrapy/cmdline.py", line 88, in _run_print_help
    func(*a, **kw)
  File "/usr/lib/python3.5/site-packages/scrapy/cmdline.py", line 149, in _run_command
    cmd.run(args, opts)
  File "/usr/lib/python3.5/site-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/usr/lib/python3.5/site-packages/scrapy/crawler.py", line 162, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/usr/lib/python3.5/site-packages/scrapy/crawler.py", line 190, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/usr/lib/python3.5/site-packages/scrapy/crawler.py", line 194, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "/usr/lib/python3.5/site-packages/scrapy/spiderloader.py", line 51, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: symbolspider'
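Interestingly, when I remove the line from propreties.items import htmltableitem, the spider is detected, but the crawl then fails because the item class is no longer defined. What is going on?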
Edit: scrapy list returns:
/usr/lib/python3.5/site-packages/scrapy/spiderloader.py:37: RuntimeWarning:
Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/scrapy/spiderloader.py", line 31, in _load_all_spiders
    for module in walk_modules(name):
  File "/usr/lib/python3.5/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/volt/projects/scrapy/googlefinance/googlefinance/spiders/symbolspider.py", line 4, in <module>
    from propreties.items import htmltableitem
ImportError: No module named 'propreties'
Could not load spiders from module 'googlefinance.spiders'. Check SPIDER_MODULES setting
  warnings.warn(msg, RuntimeWarning)
Output of tree:
├── googlefinance
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── __pycache__
│   │   ├── __init__.cpython-35.pyc
│   │   └── settings.cpython-35.pyc
│   ├── settings.py
│   └── spiders
│       ├── dataspider.py
│       ├── __init__.py
│       ├── __pycache__
│       │   ├── dataspider.cpython-35.pyc
│       │   ├── __init__.cpython-35.pyc
│       │   └── symbolspider.cpython-35.pyc
│       └── symbolspider.py
├── logs
├── output
│   └── htmltables.csv
└── scrapy.cfg
Answer 0 (score: 1)
The import should be
from googlefinance.items import htmltableitem
instead of
from propreties.items import htmltableitem
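It looks like you first created a Scrapy project named propreties and then renamed the directory to googlefinance without making any other changes to the source code. The spider module therefore fails on import, so Scrapy never registers the symbolspider class and reports "Spider not found".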
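To illustrate why a bad import surfaces as "Spider not found", here is a minimal standard-library sketch. It is not Scrapy's code: load_spiders and the demo_* modules are hypothetical names, and the logic only imitates what the traceback from scrapy list suggests, namely that each spider module is imported and an ImportError is turned into a warning rather than a crash.

```python
import importlib
import sys
import tempfile
import warnings
from pathlib import Path


def load_spiders(package_dir, module_names):
    """Import each module and map spider names to classes.

    Simplified stand-in for a spider loader (an assumption, not Scrapy's
    implementation): a module that fails to import only produces a warning,
    so none of its classes are ever registered.
    """
    found = {}
    sys.path.insert(0, str(package_dir))
    try:
        for module_name in module_names:
            try:
                module = importlib.import_module(module_name)
            except ImportError as exc:
                warnings.warn(
                    "Could not load {}: {}".format(module_name, exc),
                    RuntimeWarning,
                )
                continue
            for obj in vars(module).values():
                # Register every class that declares a string `name`.
                if isinstance(obj, type) and isinstance(getattr(obj, "name", None), str):
                    found[obj.name] = obj
    finally:
        sys.path.remove(str(package_dir))
    return found


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    # A module that imports cleanly.
    (tmp_path / "demo_good_spider.py").write_text(
        'class DataSpider:\n    name = "dataspider"\n'
    )
    # A module with the stale import from the question.
    (tmp_path / "demo_broken_spider.py").write_text(
        'from propreties.items import htmltableitem\n'
        'class SymbolSpider:\n    name = "symbolspider"\n'
    )
    importlib.invalidate_caches()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        spiders = load_spiders(tmp_path, ["demo_good_spider", "demo_broken_spider"])

print(sorted(spiders))
print([str(w.message) for w in caught])
```

Only the module that imports cleanly contributes a spider. The class in the broken module is never seen, which matches the behaviour described in the question: deleting the failing import line makes the spider visible again.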
Replace every occurrence of propreties in the code, and check the contents of scrapy.cfg as well.
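One way to find the leftover references is a small search script. The sketch below uses only the standard library; find_stale_references is a hypothetical helper, and the temporary directory is a mock layout that only loosely mirrors the tree in the question (the mock scrapy.cfg assumes the usual [settings] section with a default entry pointing at the settings module).

```python
import os
import tempfile


def find_stale_references(project_root, old_name, extensions=(".py", ".cfg")):
    """Return (path, line_number, line) for every line that mentions old_name."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(project_root):
        # Skip bytecode caches; they are regenerated from the sources.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        for filename in sorted(filenames):
            if not filename.endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if old_name in line:
                        hits.append((path, number, line.rstrip("\n")))
    return hits


# Mock project for demonstration purposes only.
with tempfile.TemporaryDirectory() as root:
    spiders_dir = os.path.join(root, "googlefinance", "spiders")
    os.makedirs(spiders_dir)
    with open(os.path.join(spiders_dir, "symbolspider.py"), "w", encoding="utf-8") as f:
        f.write("import scrapy\nfrom propreties.items import htmltableitem\n")
    with open(os.path.join(root, "scrapy.cfg"), "w", encoding="utf-8") as f:
        f.write("[settings]\ndefault = googlefinance.settings\n")

    hits = find_stale_references(root, "propreties")
    relative_hits = [(os.path.relpath(p, root), n, text) for p, n, text in hits]

for path, number, text in relative_hits:
    print("{}:{}: {}".format(path, number, text))
```

On a real project, calling find_stale_references(".", "propreties") from the directory that contains scrapy.cfg should list every file that still needs editing.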