How do I save scraped data to a PostgreSQL database (asyncpg)?

Asked: 2018-08-12 18:19:33

Tags: python python-3.x web-scraping scrapy asyncpg

The code in my pipeline file doesn't work. Help me fill the table. I can't find any information about using asyncpg with Scrapy. Thanks!

import asyncio
import asyncpg
import datetime
from .items import FilmItem


class FilmPipeline(object):

    async def __init__(self):  # broken: __init__ cannot be an async coroutine
        self.connection = await asyncpg.connect('postgresql://postgres@localhost/movies')

    async def process_item(self, item, spider):
        await self.connection.execute('''INSERT INTO afisha(title)
                VALUES($1)''', item.get('title'))
        await self.connection.close()  # this would close the connection after the very first item
        return item

    asyncio.get_event_loop().run_until_complete(process_item(self, item, spider))  # runs at class-definition time; self, item, spider are undefined here

1 Answer:

Answer 0 (score: 0)

Not exactly an answer to the original question, more of an alternative approach.

Scrapy is built on top of Twisted, so if using asyncpg is not a hard requirement, you can use psycopg2 instead and return a Deferred from the process_item method, so that the write operation is handled asynchronously.

import psycopg2
from twisted.internet import defer, reactor


class FilmPipeline:
    def __init__(self):
        self.connection = psycopg2.connect('postgresql://postgres@localhost/movies')

    def process_item(self, item, spider):
        # Wrap the item in a Deferred so Scrapy waits for write_item
        # to run before passing the item along the pipeline.
        dfd = defer.Deferred()
        dfd.addCallback(self.write_item)
        reactor.callLater(0, dfd.callback, item)
        return dfd

    def write_item(self, item):
        # Query parameters must be passed as a sequence, hence the tuple.
        with self.connection.cursor() as cursor:
            cursor.execute("INSERT INTO afisha(title) VALUES(%s)", (item.get('title'),))
        self.connection.commit()  # psycopg2 does not autocommit by default
        return item

    def close_spider(self, spider):
        self.connection.close()

Source: https://doc.scrapy.org/en/latest/topics/item-pipeline.html#take-screenshot-of-item