Scrapy scraper won't run

Time: 2014-12-10 16:40:24

Tags: python-2.7 web-scraping scrapy screen-scraping scrapy-spider

I can run Python scripts that use Beautiful Soup and Mechanize, but for some reason Scrapy just doesn't work when I try it. Here is what happens when I test the scraper by following the tutorial:

Project name and bot name = "tutorial"

Below are the items.py and settings.py files I am using.

items.py

import scrapy

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        with open(filename, 'wb') as f:
            f.write(response.body)

settings.py

BOT_NAME = 'tutorial'

SPIDER_MODULES = ['tutorial.spiders']
NEWSPIDER_MODULE = 'tutorial.spiders'

CMD

C:\Users\Turbo>scrapy startproject tutorial
New Scrapy project 'tutorial' created in:
    C:\Users\Turbo\tutorial

You can start your first spider with:
    cd tutorial
    scrapy genspider example example.com

C:\Users\Turbo>cd tutorial

C:\Users\Turbo\tutorial>scrapy crawl dmoz
Traceback (most recent call last):
  File "C:\Python27\Scripts\scrapy-script.py", line 9, in <module>
    load_entry_point('scrapy==0.24.4', 'console_scripts', 'scrapy')()
  File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\commands\crawl.py", line 58, in run
    spider = crawler.spiders.create(spname, **opts.spargs)
  File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\spidermanager.py", line 44, in create
    raise KeyError("Spider not found: %s" % spider_name)
KeyError: 'Spider not found: dmoz'

1 answer:

Answer 0 (score: 0)

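The problem is that you put your spider in items.py, so Scrapy never finds it: settings.py tells it to look for spiders in the tutorial.spiders module.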

Instead, create a package named spiders, create a dmoz.py file inside it, and put your spider there.

See the "Our first Spider" section of the tutorial.
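
The move described above can be sketched as a small standard-library script. This is only an illustration, not part of Scrapy: the helper name place_spider is hypothetical, and it assumes it is given the inner project package directory (the one that holds items.py and settings.py). The spider source is copied from the question. scrapy startproject normally creates the spiders package already, so the helper creates it only if it is missing.

```python
import os

# Spider source copied from the question. It belongs in spiders/dmoz.py,
# not in items.py.
SPIDER_SOURCE = '''\
import scrapy


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        with open(filename, "wb") as f:
            f.write(response.body)
'''


def place_spider(package_dir):
    """Hypothetical helper: write the spider to <package_dir>/spiders/dmoz.py."""
    spiders_dir = os.path.join(package_dir, "spiders")
    if not os.path.isdir(spiders_dir):
        os.makedirs(spiders_dir)

    # __init__.py makes the directory importable as a package, which is
    # what SPIDER_MODULES = ['tutorial.spiders'] in settings.py refers to.
    init_path = os.path.join(spiders_dir, "__init__.py")
    if not os.path.exists(init_path):
        with open(init_path, "w") as f:
            f.write("")

    spider_path = os.path.join(spiders_dir, "dmoz.py")
    with open(spider_path, "w") as f:
        f.write(SPIDER_SOURCE)
    return spider_path
```

For the project in the question, the call would presumably be place_spider(r"C:\Users\Turbo\tutorial\tutorial"); that inner path is an assumption based on the startproject output shown above. After that, remove the spider class from items.py and run scrapy crawl dmoz again from the project directory.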