Scrapy spider does not run

Date: 2016-07-17 11:39:56

Tags: python python-3.x ubuntu scrapy lxml

I am parsing a website and have written this spider:

# -*- coding: utf-8 -*-



from quoka.items import QuokaItem
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.loader.processors import TakeFirst
from scrapy.loader import XPathItemLoader
from scrapy.selector import HtmlXPathSelector

class QuokaLoader(XPathItemLoader):
    default_output_processor = TakeFirst()


class QuokaSpider(CrawlSpider):

    name = "quoka"
    allowed_domains = ["quoka.de"]
    start_urls = ["http://www.quoka.de/immobilien/bueros-gewerbeflaechen/"]

    rules = (
        Rule(LinkExtractor(allow=('kleinanzeigen/cat_27_2710_ct_0_page_')), follow=True),
        Rule(LinkExtractor(allow=('immobilien/bueros-gewerbeflaechen/')), callback='parse_item'),
    )

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)
        l = QuokaLoader(QuokaItem(), hxs)

        l.add_xpath('date',response.xpath("/html/body/div[3]/div[2]/div[1]/main/div[8]/div/div[2]/strong/span/text()").extract())
        l.add_xpath('cost',response.xpath("/html/body/div[3]/div[2]/div[1]/main/div[8]/div/div[3]/div[2]/div[2]/text()").extract())
        # l.add_value('url', response.url)

        return l.load_item()

I run the command: sudo scrapy crawl quoka_spider.py

But I get this cryptic error:

/home/gadzhibala/PycharmProjects/quoka/quoka/spiders/quoka_spider.py:14: ScrapyDeprecationWarning: quoka.spiders.quoka_spider.QuokaLoader inherits from deprecated class scrapy.loader.XPathItemLoader, please inherit from scrapy.loader.ItemLoader. (warning only on first subclass, there may be others)
class QuokaLoader(XPathItemLoader):
2016-07-17 14:07:01 [scrapy] INFO: Scrapy 1.1.1 started (bot: quoka)
2016-07-17 14:07:01 [scrapy] INFO: Overridden settings: {'BOT_NAME': 'quoka', 'SPIDER_MODULES': ['quoka.spiders'], 'ROBOTSTXT_OBEY': True, 'NEWSPIDER_MODULE': 'quoka.spiders'}
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/scrapy/spiderloader.py", line 41, in load
return self._spiders[spider_name]
KeyError: 'quoka_spider.py'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 11, in <module>
sys.exit(execute())
File "/usr/local/lib/python3.5/dist-packages/scrapy/cmdline.py", line 142, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
 File "/usr/local/lib/python3.5/dist-packages/scrapy/cmdline.py", line 88, in _run_print_help
func(*a, **kw)
File "/usr/local/lib/python3.5/dist-packages/scrapy/cmdline.py", line 149, in _run_command
cmd.run(args, opts)
File "/usr/local/lib/python3.5/dist-packages/scrapy/commands/crawl.py", line 57, in run
self.crawler_process.crawl(spname, **opts.spargs)
File "/usr/local/lib/python3.5/dist-packages/scrapy/crawler.py", line 162, in crawl
crawler = self.create_crawler(crawler_or_spidercls)
File "/usr/local/lib/python3.5/dist-packages/scrapy/crawler.py", line 190, in create_crawler
return self._create_crawler(crawler_or_spidercls)
File "/usr/local/lib/python3.5/dist-packages/scrapy/crawler.py", line 194, in _create_crawler
spidercls = self.spider_loader.load(spidercls)
File "/usr/local/lib/python3.5/dist-packages/scrapy/spiderloader.py", line 43, in load
raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: quoka_spider.py'

I am using Ubuntu 16.04 and Python 3.5. I installed Scrapy with pip3 install scrapy. I reinstalled Scrapy, but that did not help. How can I fix this?

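1 answer:

Answer 0 (score: 0)

You should pass the spider's name attribute to scrapy crawl, not the name of the file that contains the spider.
Instead of the following: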

sudo scrapy crawl quoka_spider.py

run:

scrapy crawl quoka
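
To illustrate why the file name fails, here is a minimal sketch of a name-based lookup, modeled loosely on the two lines of spiderloader.py visible in the traceback. It is a simplified stand-in and not Scrapy's actual implementation: SpiderRegistry is a hypothetical class, and the QuokaSpider here is a bare placeholder rather than a real CrawlSpider.

```python
# Simplified model of a name-based spider lookup; NOT Scrapy's real code.


class QuokaSpider:
    # Placeholder spider: only the `name` attribute matters for the lookup.
    name = "quoka"


class SpiderRegistry:
    """Hypothetical stand-in for Scrapy's spider loader."""

    def __init__(self, spider_classes):
        # Index every spider class by its `name` attribute, not by file name.
        self._spiders = {cls.name: cls for cls in spider_classes}

    def load(self, spider_name):
        try:
            return self._spiders[spider_name]
        except KeyError:
            # Mirrors the message format shown in the traceback.
            raise KeyError("Spider not found: {}".format(spider_name))


registry = SpiderRegistry([QuokaSpider])

# Succeeds: "quoka" matches QuokaSpider.name.
found = registry.load("quoka")

# Fails: a file name is not a registered spider name.
try:
    registry.load("quoka_spider.py")
    error_message = None
except KeyError as exc:
    error_message = exc.args[0]

print(found.__name__)
print(error_message)
```

If you are unsure which names are registered in a project, running scrapy list from the project directory prints them, one per line.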