Question

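I'm trying to use SQLAlchemy to write the data scraped by Scrapy directly into a PostgreSQL database. I've managed to set up the connection, but nothing gets written, and every item sent to the database produces this error: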

DETAIL: Key (hash_id)=(2122600700) already exists.
[SQL: 'UPDATE spider SET hash_id=%(hash_id)s'] [parameters: {'hash_id': 2122600700}]

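This can't be right: I checked the database (it contains only 60 items) and tried scraping items with different primary keys (hash_ids). I must be missing something about how SQLAlchemy and Scrapy handle items. These are my pipelines: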

pipeline.py

from .models import db_connect, load_table  # assumes models.py is in the same package


class PgPipeline(object):
    def __init__(self):
        """
        Initializes database connection.
        Reflects the spider table.
        """
        engine = db_connect()
        self.spiderDB = load_table(engine)
        self.conn = engine.connect()

    def process_item(self, item, spider):
        """Save listings in the database.

        This method is called for every item pipeline component.
        """
        # Note: there is no .where() clause, so this UPDATE targets every row in the table
        stmt = self.spiderDB.update().values(item)
        self.conn.execute(stmt)
        return item

    def close_spider(self, spider):
        self.conn.close()

models.py

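from sqlalchemy import MetaData, Table, create_engine
from sqlalchemy.engine.url import URL

import settings  # assumed: the project's settings.py, which defines DATABASE

metadata = MetaData()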


def db_connect():
    """
    Performs database connection using database settings from settings.py.
    Returns sqlalchemy engine instance
    """
    return create_engine(URL(**settings.DATABASE))


def load_table(engine):
    """
    Reflects the spider table in the DB
    """
    return Table('spider', metadata, autoload=True, autoload_with=engine)

I really hope one of you can help me, because I've been scratching my head over this for a while!
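The "already exists" error follows from the SQL in the log: update().values(item) without .where() produces an UPDATE with no WHERE clause, so it tries to give every row in the table the same hash_id. As soon as the table holds more than one row, the primary key check fails regardless of which hash_id the new item carries. The sketch below reproduces this with Python's built-in sqlite3 module standing in for PostgreSQL (an assumption made so it runs without a database server); the table layout and the update_without_where helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption: only the
# primary-key behaviour matters here, not the specific database engine).
conn = sqlite3.connect(":memory:")
# "BIGINT PRIMARY KEY" is not a rowid alias in SQLite, so it acts as an
# ordinary unique primary key constraint, like hash_id in the question.
conn.execute("CREATE TABLE spider (hash_id BIGINT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO spider (hash_id, title) VALUES (?, ?)",
    [(1, "first"), (2, "second")],
)


def update_without_where(conn, hash_id):
    """Hypothetical helper mimicking table.update().values(item): no WHERE clause."""
    try:
        # Every row is targeted, so two rows end up with the same hash_id.
        conn.execute("UPDATE spider SET hash_id = ?", (hash_id,))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"IntegrityError: {exc}"


# Reports an IntegrityError instead of "ok"; the failed statement is rolled back.
print(update_without_where(conn, 2122600700))
```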

Answer 1

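I was using update() incorrectly: without a where() clause it produces the UPDATE spider SET hash_id=... statement shown in the error, which tries to give every row in the table the same hash_id and therefore violates the primary key. Switching to insert() worked like a charm. In addition, adding the on_conflict clause suggested in the documentation solved my problem. It works now!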
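The fix described above, inserting one new row per item and skipping duplicate keys, can be sketched as follows. This is a hypothetical stand-in for PgPipeline, not the poster's actual code: it uses sqlite3 instead of PostgreSQL and assumes SQLite 3.24 or later, where INSERT ... ON CONFLICT (hash_id) DO NOTHING behaves like the PostgreSQL statement. It also assumes item keys are trusted column names, because they are interpolated into the SQL text.

```python
import sqlite3


class SqlitePipeline:
    """Hypothetical stand-in for PgPipeline that inserts instead of updating.

    Assumptions: sqlite3 replaces PostgreSQL, SQLite >= 3.24 (ON CONFLICT
    support), and items are dicts whose keys match the table's column names.
    """

    def __init__(self, conn):
        self.conn = conn

    def process_item(self, item, spider=None):
        # Column names come from the item keys, so they must be trusted values;
        # the values themselves are passed as named parameters.
        columns = ", ".join(item)
        placeholders = ", ".join(f":{key}" for key in item)
        # One new row per item; a duplicate hash_id is skipped instead of raising.
        self.conn.execute(
            f"INSERT INTO spider ({columns}) VALUES ({placeholders}) "
            "ON CONFLICT (hash_id) DO NOTHING",
            item,
        )
        self.conn.commit()
        return item


# Usage: the third item repeats hash_id 1 and is silently skipped.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE spider (hash_id BIGINT PRIMARY KEY, title TEXT)")
pipeline = SqlitePipeline(demo_conn)
for demo_item in (
    {"hash_id": 1, "title": "a"},
    {"hash_id": 2, "title": "b"},
    {"hash_id": 1, "title": "dup"},
):
    pipeline.process_item(demo_item)
rows = demo_conn.execute("SELECT hash_id, title FROM spider ORDER BY hash_id").fetchall()
print(rows)  # → [(1, 'a'), (2, 'b')]
```

With SQLAlchemy and PostgreSQL, the corresponding statement can be built with sqlalchemy.dialects.postgresql.insert(self.spiderDB).values(dict(item)).on_conflict_do_nothing(index_elements=['hash_id']); on_conflict_do_update() is the variant to use if existing rows should be overwritten rather than skipped.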

SQLAlchemy + Scrapy: primary key incorrectly reported as already existing