Error in scrapy crawler

Time: 2013-01-20 15:59:30

Tags: python scrapy

Here is the error message:

2013-01-20 22:45:02+0700 [scrapy] INFO: Scrapy 0.16.3 started (bot: scrapybot)

2013-01-20 22:45:02+0700 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState

2013-01-20 22:45:02+0700 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats

2013-01-20 22:45:02+0700 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware

2013-01-20 22:45:02+0700 [scrapy] DEBUG: Enabled item pipelines: 

2013-01-20 22:45:02+0700 [test] INFO: Spider opened

2013-01-20 22:45:02+0700 [test] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

2013-01-20 22:45:02+0700 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023

2013-01-20 22:45:02+0700 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080

2013-01-20 22:45:07+0700 [test] DEBUG: Crawled (200) <GET https://api.instagram.com/v1/tags/finnishgirl/media/recent?client_id=b59fbe4563944b6c88cced13495c0f49&callback=jQuery15208793520946055651_1358691536717&_=1358691537498> (referer: None)

2013-01-20 22:45:07+0700 [scrapy] INFO: 18

2013-01-20 22:45:07+0700 [scrapy] INFO: 18

2013-01-20 22:45:21+0700 [test] DEBUG: Crawled (200) <GET https://api.instagram.com/v1/tags/finnishgirl/media/recent?callback=jQuery15208793520946055651_1358691536717&_=1358691537498&client_id=b59fbe4563944b6c88cced13495c0f49&max_tag_id=1358724742769> (referer: https://api.instagram.com/v1/tags/finnishgirl/media/recent?client_id=b59fbe4563944b6c88cced13495c0f49&callback=jQuery15208793520946055651_1358691536717&_=1358691537498)

2013-01-20 22:45:21+0700 [-] ERROR: 2013-01-20 22:45:21+0700 [-] ERROR: 2013-01-20 22:45:21+0700 [-] ERROR: 2013-01-20 22:45:21+0700 [-] ERROR: ...... 2013-01-20 22:45:21+0700 [-] ERROR: 2013-01-20 22:45:21+0700 [scrapy] INFO: 18
2013-01-20 22:45:21+0700 [-] ERROR: 2013-01-20 22:45:21+0700 [-] ERROR: 2013-01-20 22:45:21+0700 [-] ERROR: 2013-01-20 22:45:21+0700 [-] ERROR: ...... 2013-01-20 22:45:21+0700 [-] E......

I don't know anything about this kind of error.

Here is my code:

from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import Request
from scrapy import log
import json
import re

class Spider(CrawlSpider):
    name = "test"
    count = 0

    def start_requests(self):
        return [Request('https://api.instagram.com/v1/tags/finnishgirl/media/recent?client_id=b59fbe4563944b6c88cced13495c0f49&callback=jQuery15208793520946055651_1358691536717&_=1358691537498', callback=self.parse_basic)]

    def parse_basic(self, response):
        if self.count == 2:
            return
        self.count = self.count + 1
        log.start()
        body = response.body
        # strip the JSONP wrapper "jQuery...(" ... ")" to get plain JSON
        body = re.sub(r'jQuery[0-9_]+\(', '', body)
        body = body[:len(body) - 1]
        body = json.loads(body)
        next_url = body['pagination']['next_url']
        count = len(body['data'])
        log.msg(str(count), level=log.INFO)
        f = open('test.'+str(self.count), 'w')
        f.write(next_url)
        f.close()
        return [Request(next_url, callback=self.parse_basic)]
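The body handling in parse_basic removes a JSONP wrapper: because the URL carries callback=jQuery..., the API answers with jQuery...( ... ) around the JSON. That step can be checked on its own, outside Scrapy. The sketch below is a hypothetical helper, not part of the original code: the name parse_page, the regular expression and the sample body are assumptions (the sample is made up, not real Instagram output), and it is written for Python 3 rather than the Python 2 used with Scrapy 0.16.

```python
import json
import re

# Hypothetical pattern: "<callback name>( ...json... )" with an optional trailing ";".
JSONP_RE = re.compile(r'^\s*[\w$.]+\((.*)\)\s*;?\s*$', re.DOTALL)


def parse_page(body):
    """Return (next_url, item_count) from a JSONP-wrapped (or plain JSON) page."""
    match = JSONP_RE.match(body)
    payload = match.group(1) if match else body  # fall back to plain JSON
    data = json.loads(payload)
    # .get() yields None instead of raising KeyError when there is no next_url
    next_url = data.get("pagination", {}).get("next_url")
    return next_url, len(data.get("data", []))


# Made-up sample body in the shape the spider expects (not real API output).
sample = ('jQuery15208793520946055651_1358691536717('
          '{"pagination": {"next_url": "https://example.com/next"}, "data": [1, 2, 3]})')
print(parse_page(sample))  # ('https://example.com/next', 3)
```

Matching the whole wrapper with one anchored pattern, instead of deleting the prefix and then the last character, also tolerates a trailing semicolon or whitespace after the closing parenthesis.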

1 Answer:

Answer 0 (score: 0)

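I found my mistake. It was caused by the log.start() call in the parse_basic method: the Request returned by the return statement is sent back to parse_basic (its callback), so log.start() runs again for the next response => that is what makes the error happen.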
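The key point of the answer is that everything inside parse_basic runs once per response, because each Request it returns names parse_basic as its callback again. The sketch below models that loop without Scrapy. The Request namedtuple, the tiny crawl() loop and the fake PAGES are hypothetical stand-ins, and Python's standard logging module stands in for scrapy.log: attaching a handler inside the callback plays the role of calling log.start() there. It only illustrates why setup code in a callback repeats; it does not reproduce Scrapy's exact log output shown above.

```python
import io
import logging
from collections import namedtuple

# Hypothetical stand-in for scrapy.http.Request: a URL plus the callback to run.
Request = namedtuple("Request", ["url", "callback"])

# Fake paginated "API": each page points to the next, the last has no next_url.
PAGES = {
    "page1": {"next_url": "page2", "count": 18},
    "page2": {"next_url": "page3", "count": 18},
    "page3": {"next_url": None, "count": 18},
}


def crawl(start_requests):
    """Minimal engine loop: feed each response back to its request's callback."""
    queue = list(start_requests)
    while queue:
        req = queue.pop(0)
        queue.extend(req.callback(PAGES[req.url]) or [])


def run(start_logging_in_callback):
    stream = io.StringIO()
    logger = logging.getLogger("spider-demo")
    logger.handlers = []        # start every demo run with no handlers
    logger.propagate = False    # keep output out of the root logger
    logger.setLevel(logging.INFO)

    def start_logging():
        # Analogue of log.start(): attach one more output handler.
        logger.addHandler(logging.StreamHandler(stream))

    def parse_basic(page):
        if start_logging_in_callback:
            start_logging()     # buggy: runs again for every response
        logger.info(str(page["count"]))
        if page["next_url"]:
            return [Request(page["next_url"], parse_basic)]
        return []

    if not start_logging_in_callback:
        start_logging()         # fixed: set up logging once, before crawling
    crawl([Request("page1", parse_basic)])
    return stream.getvalue().splitlines()


print(len(run(start_logging_in_callback=True)))   # 1 + 2 + 3 handlers -> 6
print(len(run(start_logging_in_callback=False)))  # one handler -> 3
```

With the setup moved out of the callback, each page is logged exactly once; in the spider itself that corresponds to deleting the log.start() line from parse_basic and keeping only log.msg(...).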