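Scrapy crawled 0 pages (at 0 pages/min)

Time: 2015-06-20 19:16:26

Tags: python scrapy

I have a problem with Scrapy: it does not return any results, and I cannot tell where the problem is.

I am using Python 2.7.3 (Windows 8.1, 64-bit).

I created my project with the command scrapy startproject craigslist_sample

Project directory: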

craigslist_sample/
    scrapy.cfg
    craigslist_sample/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            byub.py

My spider file (byub.py):

import scrapy


class MySpider(scrapy.Spider):
    name = "craig"
    allowed_domains = ["craigslist.org"]
    start_urls = [
        "http://sfbay.craigslist.org/search/sfc/npo",
    ]

    def parse(self, response):
        items = []
        for sel in response.xpath('//p//a[@class="hdrlnk"]'):
            item = CraigslistSampleItem()
            print(sel.xpath('text()').extract())
            print(sel.xpath('@href').extract())

When I press F5 to run it and see my data printed, nothing is displayed.

My __init__.py file, located in the spiders folder:

import sys
sys.path.append("../../craigslist_sample/")

I added the path ../../craigslist_sample/ so that I can use my CraigslistSampleItem class.

My items.py file:

import scrapy


class CraigslistSampleItem(scrapy.Item):
    # define the fields for your item here like:
    title = scrapy.Field()
    link  = scrapy.Field()

My log file:

2015-06-20 22:34:59 [scrapy] INFO: Scrapy 1.0.0 started (bot: craigslist_sample)
2015-06-20 22:34:59 [scrapy] INFO: Optional features available: ssl, http11
2015-06-20 22:34:59 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'craigslist_sample.spiders', 'SPIDER_MODULES': ['craigslist_sample.spiders'], 'LOG_STDOUT': True, 'LOG_FILE': '/tmp/scrapy_output.txt', 'BOT_NAME': 'craigslist_sample'}
2015-06-20 22:35:00 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2015-06-20 22:35:00 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-06-20 22:35:00 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-06-20 22:35:00 [scrapy] INFO: Enabled item pipelines: 
2015-06-20 22:35:00 [scrapy] INFO: Spider opened
2015-06-20 22:35:00 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-06-20 22:35:00 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-06-20 22:35:02 [scrapy] DEBUG: Crawled (200) <GET http://sfbay.craigslist.org/search/sfc/npo> (referer: None)
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5083113578.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5083098605.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5083051162.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5083044559.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5083043239.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5083034151.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082961277.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082936118.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082930994.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082908649.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082826886.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082820427.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082808607.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082796023.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082767892.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082699233.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082685178.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082682792.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082674781.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082565558.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082545852.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082466564.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082457151.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082454103.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082452290.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082452087.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082442715.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082368243.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082367400.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082364446.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082206212.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082176091.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5082142295.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081546128.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081544083.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081349969.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081337282.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081329478.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081325271.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081315033.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081284397.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081272495.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081248716.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081242306.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081198308.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081185072.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081182362.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081039111.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081033894.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5081030919.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5080930010.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5080922969.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5080783300.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5080757424.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5080754908.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5080696793.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5080523544.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5080474373.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079764803.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079655298.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079652979.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079651750.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079617063.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079600458.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079484883.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079458099.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079439949.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079434763.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079423265.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079421733.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079345334.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079272799.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079271027.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079130762.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5079058773.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5078791191.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5078784316.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5078657036.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5078096040.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5078022877.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5078018145.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077960434.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077955778.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077927644.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077906229.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077813126.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077799125.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077795848.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077763673.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077582518.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077522272.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077402309.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077397915.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077350438.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5077123591.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5076362090.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5076361296.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5076341213.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5076299050.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: [u'/sfc/npo/5076222757.html']
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [stdout] INFO: []
2015-06-20 22:35:02 [scrapy] INFO: Closing spider (finished)
2015-06-20 22:35:02 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 232,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 15530,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 6, 20, 20, 35, 2, 312000),
 'log_count/DEBUG': 2,
 'log_count/INFO': 209,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2015, 6, 20, 20, 35, 0, 968000)}
2015-06-20 22:35:02 [scrapy] INFO: Spider closed (finished)
2015-06-20 23:03:37 [scrapy] INFO: Scrapy 1.0.0 started (bot: craigslist_sample)
2015-06-20 23:03:37 [scrapy] INFO: Optional features available: ssl, http11
2015-06-20 23:03:37 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'craigslist_sample.spiders', 'SPIDER_MODULES': ['craigslist_sample.spiders'], 'LOG_STDOUT': True, 'LOG_FILE': '/tmp/scrapy_output.txt', 'BOT_NAME': 'craigslist_sample'}
2015-06-20 23:03:38 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2015-06-20 23:03:38 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-06-20 23:03:38 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-06-20 23:03:38 [scrapy] INFO: Enabled item pipelines: 
2015-06-20 23:03:38 [scrapy] INFO: Spider opened
2015-06-20 23:03:38 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-06-20 23:03:38 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-06-20 23:03:39 [scrapy] DEBUG: Crawled (200) <GET http://sfbay.craigslist.org/search/sfc/npo> (referer: None)
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5083113578.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5083098605.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5083051162.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5083044559.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5083043239.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5083034151.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082961277.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082936118.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082930994.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082908649.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082826886.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082820427.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082808607.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082796023.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082767892.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082699233.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082685178.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082682792.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082674781.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082565558.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082545852.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082466564.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082457151.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082454103.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082452290.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082452087.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082442715.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082368243.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082367400.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082364446.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082206212.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082176091.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5082142295.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081546128.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081544083.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081349969.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081337282.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081329478.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081325271.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081315033.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081284397.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081272495.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081248716.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081242306.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081198308.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081185072.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081182362.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081039111.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081033894.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5081030919.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5080930010.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5080922969.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5080783300.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5080757424.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5080754908.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5080696793.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5080523544.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5080474373.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079764803.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079655298.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079652979.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079651750.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079617063.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079600458.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079484883.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079458099.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079439949.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079434763.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079423265.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079421733.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079345334.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079272799.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079271027.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079130762.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5079058773.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5078791191.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5078784316.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5078657036.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5078096040.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5078022877.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5078018145.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077960434.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077955778.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077927644.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077906229.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077813126.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077799125.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077795848.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077763673.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077582518.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077522272.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077402309.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077397915.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077350438.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5077123591.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5076362090.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5076361296.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5076341213.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5076299050.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: [u'/sfc/npo/5076222757.html']
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [stdout] INFO: []
2015-06-20 23:03:39 [scrapy] INFO: Closing spider (finished)
2015-06-20 23:03:39 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 232,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 15536,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 6, 20, 21, 3, 39, 743000),
 'log_count/DEBUG': 2,
 'log_count/INFO': 209,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2015, 6, 20, 21, 3, 38, 303000)}
2015-06-20 23:03:39 [scrapy] INFO: Spider closed (finished)

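Thanks for your help.

2 answers:

Answer 0 (score: 2)

When you search inside each <p>, the <a> tag you match contains the URL but has no text in it. Check the site's HTML. So the <a> tag you are selecting is the wrong one.

I search for the <a> tag with class="hdrlnk" instead, which contains both the URL and the text.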

for sel in response.xpath('//p//a[@class="hdrlnk"]'):
    print sel.xpath('text()').extract()
    print sel.xpath('@href').extract()

Output:

[u'Resident Services Coordinator']
[u'/sfc/npo/5083113578.html']
[u'Resident Services Coordinator']
[u'/sfc/npo/5083098605.html']
[u'General Manager - 939/951 Eddy']
[u'/sfc/npo/5083051162.html']
[u'General Manager - 430 Turk']
[u'/sfc/npo/5083044559.html']
....
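
To make the difference between the two selectors concrete, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree rather than Scrapy. The markup in SAMPLE is a hypothetical simplification of a listing row (the question does not show the real Craigslist HTML), and ElementTree supports only a subset of XPath, so this illustrates the idea and does not reproduce Scrapy's selector behavior exactly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup: the first <a> in each <p> wraps
# no text, while the <a class="hdrlnk"> carries both the URL and the title.
SAMPLE = """
<div>
  <p class="row">
    <a href="/sfc/npo/5083113578.html" class="i"></a>
    <span><a href="/sfc/npo/5083113578.html" class="hdrlnk">Resident Services Coordinator</a></span>
  </p>
  <p class="row">
    <a href="/sfc/npo/5083051162.html" class="i"></a>
    <span><a href="/sfc/npo/5083051162.html" class="hdrlnk">General Manager - 939/951 Eddy</a></span>
  </p>
</div>
"""

root = ET.fromstring(SAMPLE)

# First <a> directly under each <p>: it has an href but no text.
first_links = [(p.find("a").text, p.find("a").get("href")) for p in root.iter("p")]

# <a class="hdrlnk"> anywhere under a <p>: it has both text and href.
title_links = [
    (a.text, a.get("href")) for a in root.findall(".//p//a[@class='hdrlnk']")
]

print(first_links)
print(title_links)
```

In this sketch the first list holds None for every title, which corresponds to the empty [] lines in the question's log, while the second list pairs each title with its link.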

Update

  1. I created the same project (scrapy startproject craigslist_sample).

  2. I removed everything from the file spiders/__init__.py.

  3. I updated the spider file byub.py:

    import scrapy
    from craigslist_sample.items import CraigslistSampleItem
    
    class MySpider(scrapy.Spider):
        name = "craig"
        allowed_domains = ["craigslist.org"]
        start_urls = [
            "http://sfbay.craigslist.org/search/sfc/npo",
        ]
    
        def parse(self, response):
            items = []
            for sel in response.xpath('//p//a[@class="hdrlnk"]'):
                item = CraigslistSampleItem()
                item['title'] = sel.xpath('text()').extract()
                item['link'] = sel.xpath('@href').extract()
                items.append(item)
            return items
    

    Log:

    2015-06-20 22:01:09 [scrapy] DEBUG: Scraped from <200 http://sfbay.craigslist.org/search/sfc/npo>
    {'link': [u'/sfc/npo/5083113578.html'],
     'title': [u'Resident Services Coordinator']}
    2015-06-20 22:01:09 [scrapy] DEBUG: Scraped from <200 http://sfbay.craigslist.org/search/sfc/npo>
    {'link': [u'/sfc/npo/5083098605.html'],
     'title': [u'Resident Services Coordinator']}
    2015-06-20 22:01:09 [scrapy] DEBUG: Scraped from <200 http://sfbay.craigslist.org/search/sfc/npo>
    {'link': [u'/sfc/npo/5083051162.html'],
     'title': [u'General Manager - 939/951 Eddy']}
    ...
    

    To run the crawler, I run scrapy crawl --logfile logs craig from the same folder where scrapy.cfg is located.

    To run the crawler with a different log level: scrapy crawl --logfile logs -L DEBUG craig

Answer 1 (score: 0)

The last line of your spider, return items, is indented too far. It needs to be:

for sel in response.xpath('//p'):
    item = CraigslistSampleItem()
    # ...

return items
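
The effect of that indentation can be seen without Scrapy at all. The following is a minimal sketch with hypothetical function names, using plain dicts as stand-ins for CraigslistSampleItem:

```python
def collect_return_inside_loop(rows):
    """Buggy variant: `return` is indented into the loop body."""
    items = []
    for title, link in rows:
        items.append({"title": title, "link": link})
        return items  # exits during the first iteration
    return items  # only reached when rows is empty


def collect_return_after_loop(rows):
    """Fixed variant: `return` sits at the same level as the `for`."""
    items = []
    for title, link in rows:
        items.append({"title": title, "link": link})
    return items


rows = [
    ("Resident Services Coordinator", "/sfc/npo/5083113578.html"),
    ("General Manager - 430 Turk", "/sfc/npo/5083044559.html"),
]
print(len(collect_return_inside_loop(rows)))
print(len(collect_return_after_loop(rows)))
```

With the return inside the loop, only the first row is ever collected; moving it out one level lets the loop finish before the list is handed back.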

You can also try scrapy crawl craig -o results.json to write the items to a file named results.json.
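
Once the file exists, it can be read back with the standard json module. The sketch below is only an illustration: it writes a hand-made sample to a temporary results.json, modeled on the items in the other answer's log (it is not real crawl output), and then flattens the one-element lists that extract() stores in each field.

```python
import json
import os
import tempfile

# Hand-written stand-in for what the export might contain, modeled on the
# items in the other answer's log (an assumption, not real crawl output).
sample = [
    {"title": ["Resident Services Coordinator"], "link": ["/sfc/npo/5083113578.html"]},
    {"title": ["General Manager - 939/951 Eddy"], "link": ["/sfc/npo/5083051162.html"]},
]

path = os.path.join(tempfile.mkdtemp(), "results.json")
with open(path, "w") as fh:
    json.dump(sample, fh)

# Load the exported items back from disk.
with open(path) as fh:
    items = json.load(fh)

# extract() stored each field as a list, so keep the first element
# (or None when the list is empty).
flat = [
    {key: (values[0] if values else None) for key, values in item.items()}
    for item in items
]
print(flat)
```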