Scrapy crawler is unstable: sometimes it works, sometimes it doesn't

Time: 2017-02-06 06:00:30

Tags: python mongodb scrapy web-crawler pipeline

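My scraper sometimes works (it crawls and scrapes), but other times it only crawls and scrapes nothing, even though I haven't changed anything in the code :/ I don't understand why. There is no error code or anything. When it doesn't scrape, the output looks like this: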

2017-02-05 23:52:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/s/srs=9187220011&rh=n%3A283155> (referer: None)
2017-02-05 23:52:00 [scrapy.core.engine] INFO: Closing spider (finished)
2017-02-05 23:52:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 712,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 3964,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 2, 6, 5, 52, 0, 552000),
 'log_count/DEBUG': 7,
 'log_count/INFO': 7,
 'log_count/WARNING': 1,
 'response_received_count': 2,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2017, 2, 6, 5, 51, 59, 328000)}

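I'm trying to scrape this website and store the results in MongoDB using a MongoDB pipeline. It does actually work, but sometimes it doesn't, which is very strange. I suspect it may be a pipeline problem, but I'm not sure. Any suggestions? How can I check what went wrong? I am connected to MongoDB: mongod is running while I do this.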
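One way to narrow this down is to look at what the 200 response actually contains. The stats above show no `item_scraped_count` or `item_dropped_count` entry, which suggests the spider yielded no items at all, and only 3964 response bytes across two responses, which is small for a search results page. The site may therefore have returned a robot-check or otherwise stripped-down page; that is an assumption, not something the log confirms. A minimal sketch of such a check, where `diagnose_response`, the size threshold, and the marker strings are all hypothetical and not part of Scrapy:

```python
# Hypothetical helper: none of these names come from Scrapy itself.
BLOCK_MARKERS = ("captcha", "robot check")  # assumed marker strings
MIN_EXPECTED_CHARS = 10000  # assumed lower bound for a real results page


def diagnose_response(status, body, items_found):
    """Return a short guess at why a crawled page produced no items."""
    text = body.lower()
    if status != 200:
        return "non-200 status: %d" % status
    if any(marker in text for marker in BLOCK_MARKERS):
        return "page looks like a bot check"
    if len(body) < MIN_EXPECTED_CHARS:
        return "page is suspiciously small (%d chars)" % len(body)
    if items_found == 0:
        return "page looks normal but selectors matched nothing"
    return "ok"


# Example with a made-up page body.
print(diagnose_response(200, "<html>Robot Check: enter the CAPTCHA</html>", 0))
```

Inside a spider callback this could be called as `diagnose_response(response.status, response.text, len(items))`, with the result written out through `self.logger.warning(...)`, so that a run that scrapes nothing leaves a hint in the log.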

Here is my MongoDB pipeline:

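import logging

import pymongo
from scrapy.exceptions import DropItem

logger = logging.getLogger(__name__)


class MongoDBPipeline(object):

    def __init__(self, server, port, db_name, collection_name):
        connection = pymongo.MongoClient(server, port)
        self.collection = connection[db_name][collection_name]

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection details from the project settings.
        s = crawler.settings
        return cls(s['MONGODB_SERVER'], s.getint('MONGODB_PORT'),
                   s['MONGODB_DB'], s['MONGODB_COLLECTION'])

    def process_item(self, item, spider):
        # Iterating an item yields field names, so test the value instead.
        for field in item:
            if not item[field]:
                raise DropItem("Missing {0}!".format(field))
        # insert_one replaces the deprecated Collection.insert.
        self.collection.insert_one(dict(item))
        logger.debug("link added to MongoDB database!")
        return item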
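One detail in `process_item` that may be worth checking: iterating over a dict-like item yields its field names rather than its values, so an emptiness test has to look at `item[name]`. A minimal sketch with a plain dict standing in for a Scrapy item (the `empty_fields` helper and the sample fields are hypothetical):

```python
def empty_fields(item):
    """Return the names of fields whose values are empty."""
    return [name for name in item if not item[name]]


item = {"title": "Some book", "link": ""}  # hypothetical item contents

# Testing the keys: every field name is a non-empty string, so nothing is flagged.
flagged_by_key = [name for name in item if not name]

# Testing the values: the empty "link" field is reported.
flagged_by_value = empty_fields(item)

print(flagged_by_key)    # []
print(flagged_by_value)  # ['link']
```

A check on the keys would not by itself explain a run that scrapes nothing, since it lets every item through rather than dropping it.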

0 answers:

No answers yet