How do I save scraped data to a PostgreSQL database (asyncpg)?

Asked: 2018-08-12 18:19:33

Tags: python python-3.x web-scraping scrapy asyncpg

The code in my pipeline file doesn't work. Help me fill the table. I can't find any information about using asyncpg with Scrapy. Thanks!

import asyncio
import asyncpg
import datetime
from .items import FilmItem


class FilmPipeline(object):

    async def __init__(self):  # broken: __init__ cannot be an async coroutine
        self.connection = await asyncpg.connect('postgresql://postgres@localhost/movies')

    async def process_item(self, item, spider):
        await self.connection.execute('''INSERT INTO afisha(title)
                VALUES($1)''', item.get('title'))
        await self.connection.close()  # this would close the connection after the very first item
        return item

    asyncio.get_event_loop().run_until_complete(process_item(self, item, spider))  # runs at class-definition time; self, item, spider are undefined here

1 Answer:

Answer 0 (score: 0)

Not exactly an answer to the original question, more of an alternative approach.

Scrapy is built on top of Twisted, so if using asyncpg is not a hard requirement, you can use psycopg2 instead and return a Deferred from the process_item method, so that the write operation is handled asynchronously.

import psycopg2
from twisted.internet import defer, reactor


class FilmPipeline:
    def __init__(self):
        self.connection = psycopg2.connect('postgresql://postgres@localhost/movies')

    def process_item(self, item, spider):
        # Wrap the item in a Deferred so Scrapy waits for write_item
        # to run before passing the item along the pipeline.
        dfd = defer.Deferred()
        dfd.addCallback(self.write_item)
        reactor.callLater(0, dfd.callback, item)
        return dfd

    def write_item(self, item):
        # Query parameters must be passed as a sequence, hence the tuple.
        with self.connection.cursor() as cursor:
            cursor.execute("INSERT INTO afisha(title) VALUES(%s)", (item.get('title'),))
        self.connection.commit()  # psycopg2 does not autocommit by default
        return item

    def close_spider(self, spider):
        self.connection.close()

Source: https://doc.scrapy.org/en/latest/topics/item-pipeline.html#take-screenshot-of-item