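Importing items makes my spider impossible to find?

Time: 2016-12-30 11:59:57

Tags: scrapy

My problem is that, for some reason, the Scrapy framework cannot find my spider. I have this spider: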

import scrapy
from scrapy.http import Request
from scrapy.loader import ItemLoader  # needed by parse() below
from propreties.items import htmltableitem


class SymbolspiderSpider(scrapy.Spider):
    name = "symbolspider"

    def start_requests(self):
        for i in range(0,10):
            yield Request( 'https://www.google.com/finance?q=%27&restype=company&noIL=1&num=50&ei=VPBjWJHKK9S7U6_dmvgM&start='+str(i) )

    def parse(self, response):
        l=ItemLoader(item=htmltableitem(), response=response)
        l.add_xpath('htmltable', ".//*[@id='gf-viewc']/div/div[2]/form/table/tbody/child::*")
        return l.load_item()

When I run scrapy crawl symbolspider -o output.csv, I get this error:

Traceback (most recent call last):
  File "/usr/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/lib/python3.5/site-packages/scrapy/cmdline.py", line 142, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/python3.5/site-packages/scrapy/cmdline.py", line 88, in _run_print_help
    func(*a, **kw)
  File "/usr/lib/python3.5/site-packages/scrapy/cmdline.py", line 149, in _run_command
    cmd.run(args, opts)
  File "/usr/lib/python3.5/site-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/usr/lib/python3.5/site-packages/scrapy/crawler.py", line 162, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/usr/lib/python3.5/site-packages/scrapy/crawler.py", line 190, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/usr/lib/python3.5/site-packages/scrapy/crawler.py", line 194, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "/usr/lib/python3.5/site-packages/scrapy/spiderloader.py", line 51, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: symbolspider'

Interestingly, when I remove the line from propreties.items import htmltableitem, the spider is detected, but it then fails because the item class is undefined. What is going on?

Edit: scrapy list returns

/usr/lib/python3.5/site-packages/scrapy/spiderloader.py:37: RuntimeWarning: 
Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/scrapy/spiderloader.py", line 31, in _load_all_spiders
    for module in walk_modules(name):
  File "/usr/lib/python3.5/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/volt/projects/scrapy/googlefinance/googlefinance/spiders/symbolspider.py", line 4, in <module>
    from propreties.items import htmltableitem
ImportError: No module named 'propreties'
Could not load spiders from module 'googlefinance.spiders'. Check SPIDER_MODULES setting
  warnings.warn(msg, RuntimeWarning)

tree

├── googlefinance
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── __pycache__
│   │   ├── __init__.cpython-35.pyc
│   │   └── settings.cpython-35.pyc
│   ├── settings.py
│   └── spiders
│       ├── dataspider.py
│       ├── __init__.py
│       ├── __pycache__
│       │   ├── dataspider.cpython-35.pyc
│       │   ├── __init__.cpython-35.pyc
│       │   └── symbolspider.cpython-35.pyc
│       └── symbolspider.py
├── logs
├── output
│   └── htmltables.csv
└── scrapy.cfg

1 answer:

Answer 0 (score: 1)

It should be

from googlefinance.items import htmltableitem

instead of

from propreties.items import htmltableitem

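You first created the Scrapy project under the name propreties and then renamed the directory to googlefinance without making any other changes to the source code. The stale import raises an ImportError while Scrapy is loading googlefinance.spiders, so that module is skipped with a RuntimeWarning and symbolspider is never registered. That is why scrapy crawl reports "Spider not found", and why the spider becomes visible again once the import line is removed.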
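
To make the link between the ImportError and the "Spider not found" message concrete, here is a minimal sketch of the behaviour visible in the scrapy list output: every module in the spiders package is imported, and a module that fails to import is skipped with a warning, so the spiders defined in it never appear. This is a simplified illustration and not Scrapy's actual code; the function load_spider_names and the way it detects spider classes are assumptions made for the example.

```python
import importlib
import pkgutil
import warnings


def load_spider_names(package_name):
    """Map each class-level `name` string to its class, for all modules in a package.

    Hypothetical helper for illustration only; this is not Scrapy's implementation.
    """
    found = {}
    package = importlib.import_module(package_name)
    for info in pkgutil.iter_modules(package.__path__, package_name + "."):
        try:
            module = importlib.import_module(info.name)
        except ImportError as exc:
            # A single broken import hides every spider defined in that module.
            warnings.warn(
                "Could not load {}: {}".format(info.name, exc), RuntimeWarning
            )
            continue
        for obj in vars(module).values():
            # Treat any class with a string `name` attribute as a spider.
            if isinstance(obj, type) and isinstance(getattr(obj, "name", None), str):
                found[obj.name] = obj
    return found
```

With a package that contains one healthy module and one module starting with from propreties.items import htmltableitem, only the spider from the healthy module ends up in the returned dictionary, and a lookup of the other name fails in the same way as the KeyError in the question.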

Replace every occurrence of propreties in your code, and check the contents of scrapy.cfg as well.
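
If you would rather not hunt for the leftovers by hand, a small script can list them. The sketch below is only one way to do it: find_stale_references and settings_module_from_cfg are hypothetical helper names, and the second one assumes scrapy.cfg has the usual layout generated by scrapy startproject, with a default key in the [settings] section.

```python
import configparser
from pathlib import Path


def find_stale_references(project_root, old_name, suffixes=(".py", ".cfg")):
    """Return (path, line number, line text) for each line that mentions old_name."""
    hits = []
    for path in sorted(Path(project_root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if old_name in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


def settings_module_from_cfg(cfg_path):
    """Return the settings module named in scrapy.cfg, or None if it is absent."""
    parser = configparser.ConfigParser()
    parser.read(cfg_path, encoding="utf-8")
    # Assumes the standard "[settings] default = <project>.settings" entry.
    return parser.get("settings", "default", fallback=None)
```

Calling find_stale_references(".", "propreties") from the directory that contains scrapy.cfg lists every line that still uses the old name, and settings_module_from_cfg("scrapy.cfg") should return googlefinance.settings once the project has been renamed consistently.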