Data not being inserted into the database with Scrapy

Date: 2016-06-25 15:42:27

Tags: python mysql scrapy

I am using Scrapy to fetch data and I want to store it in a MySQL database. To do that, I copied and adapted most of the code from here.

It all looks fine, but I cannot get it to work on my end. The code raises no errors and the data is scraped correctly, yet my database stays empty. Here is my pipeline code:

from datetime import datetime
from hashlib import md5

from twisted.enterprise import adbapi
from twisted.python import log


class MySQLPipeline(object):

    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        dbargs = dict(
            host=settings['MYSQL_HOST'],
            db=settings['MYSQL_DBNAME'],
            user=settings['MYSQL_USER'],
            passwd=settings['MYSQL_PASSWD'],
            charset='utf8',
            use_unicode=True,
        )
        dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)
        return cls(dbpool)

    def process_item(self, item, spider):
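        # runInteraction runs _do_upsert in a worker thread and returns a Deferred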
        d = self.dbpool.runInteraction(self._do_upsert, item, spider)
        d.addErrback(self._handle_error, item, spider)
        d.addBoth(lambda _: item)
        return d

    def _do_upsert(self, conn, item, spider):
        guid = self._get_guid(item)
        now = datetime.utcnow().replace(microsecond=0).isoformat(' ')

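        # check whether a row with this guid already exists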
        conn.execute("""SELECT EXISTS(
            SELECT 1 FROM produits WHERE guid = %s
        )""", (guid, ))
        ret = conn.fetchone()

        if ret:
            conn.execute("""
                UPDATE produits
                SET product=%s, link=%s, price=%s, description=%s, image_urls=%s, image=%s, brand=%s, couleur=%s, gamme=%s, largeur=%s, profondeur=%s, hauteur=%s, longueur=%s, diametre=%s, updated=%s
                WHERE guid=%s
            """, (item['product'], item['link'], item['price'], item['description'],
                  item['image_urls'], item['image'], item['brand'], item['couleur'],
                  item['gamme'], item['largeur'], item['profondeur'], item['hauteur'],
                  item['longueur'], item['diametre'], now, guid))
            spider.log("Item updated in db: %s %r" % (guid, item))
            self.dbpool.commit()
        else:
            conn.execute("""
                INSERT INTO produits (product, link, price, description, image_urls, image, brand, couleur, gamme, largeur, prodonfeur, hauteur, longueur, diametre, updated, guid)
                VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
            """, (item['product'], item['link'], item['price'], item['description'],
                  item['image_urls'], item['image'], item['brand'], item['couleur'],
                  item['gamme'], item['largeur'], item['profondeur'], item['hauteur'],
                  item['longueur'], item['diametre'], now, guid))
            spider.log("Item stored in db: %s %r" % (guid, item))
            self.dbpool.commit()


    def _handle_error(self, failure, item, spider):
        log.err(failure)

    def _get_guid(self, item):
        return md5(item['link']).hexdigest()
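
To check the database side independently of Scrapy, a minimal standalone script along these lines can help; the host, database, user and password below are placeholders that should match the MYSQL_* settings, and the produits table name is taken from the pipeline above.

import MySQLdb

# Placeholder connection values -- use the same ones as the MYSQL_* settings.
conn = MySQLdb.connect(
    host='localhost',
    db='comparito',
    user='root',
    passwd='secret',
    charset='utf8',
    use_unicode=True,
)
cursor = conn.cursor()
# Count the rows currently in the table the pipeline writes to.
cursor.execute("SELECT COUNT(*) FROM produits")
print(cursor.fetchone()[0])
conn.close()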

And here is my settings code:

ITEM_PIPELINES = {
    'Comparito.pipelines.ImagesPipeline': 1,
    'Comparito.pipelines.MySQLPipeline': 300,
}
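
For reference, from_settings above reads these keys from settings.py; the values shown here are placeholders, not the real ones.

MYSQL_HOST = 'localhost'    # placeholder
MYSQL_DBNAME = 'comparito'  # placeholder
MYSQL_USER = 'root'         # placeholder
MYSQL_PASSWD = 'secret'     # placeholder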

If someone could help me understand why it does not insert anything into my database, that would be great.

Thank you

0 Answers:

No answers yet