Preventing duplicate entries in a pre-existing SQLite table with SQLAlchemy

Date: 2016-12-14 14:48:58

Tags: python sqlite sqlalchemy scrapy

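I have a pre-existing SQLite table that I access through SQLAlchemy. I realized that some 'case' numbers occur more than once. If I understand correctly, SQLite does not seem to let you add a unique constraint to a table after it has been created. I removed the existing duplicates with the following: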

DELETE FROM judgements
WHERE id NOT IN
(
    SELECT MIN(id)
    FROM judgements
    GROUP BY "case"
);

I then decided to use SQLAlchemy to prevent further duplicates from being added. I'm using Scrapy and have a pipeline component that looks like this:

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Text
from sqlalchemy.exc import IntegrityError
# "settings" (not shown in the question) must provide SETTINGS_PATH


class DynamicSQLlitePipeline(object):

    def __init__(self, table_name):
        db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
        _engine = create_engine(db_path)
        _connection = _engine.connect()
        _metadata = MetaData()
        _stack_items = Table(table_name, _metadata,
                             Column("id", Integer, primary_key=True),
                             Column("case", Text, unique=True),
                             # .... (remaining columns elided in the original)
                             )
        _metadata.create_all(_engine)
        self.connection = _connection
        self.stack_items = _stack_items

    def process_item(self, item, spider):
        try:
            ins_query = self.stack_items.insert().values(
                case=item['case'],
                # .... (remaining values elided in the original)
            )
            self.connection.execute(ins_query)
        except IntegrityError:
            print('THIS IS A DUP')
        return item

The only change I made was adding unique=True to the 'case' column. But when I test it, duplicates are still added. How can I make this work?

1 Answer

Answer 0 (score: 1)

The snippet below works on my side with Python 2.7, SQLAlchemy 1.0.9 and SQLite 3.15.2.

from sqlalchemy import create_engine, MetaData, Column, Integer, Table, Text
from sqlalchemy.exc import IntegrityError


class DynamicSQLlitePipeline(object):

    def __init__(self, table_name):
        db_path = "sqlite:///data.db"
        _engine = create_engine(db_path)
        _connection = _engine.connect()
        _metadata = MetaData()
        _stack_items = Table(table_name, _metadata,
                             Column("id", Integer, primary_key=True),
                             Column("case", Text, unique=True),)
        _metadata.create_all(_engine)
        self.connection = _connection
        self.stack_items = _stack_items

    def process_item(self, item):

        try:
            ins_query = self.stack_items.insert().values(case=item['case'])
            self.connection.execute(ins_query)
        except IntegrityError:
            print('THIS IS A DUP')
        return item

if __name__ == '__main__':

    d = DynamicSQLlitePipeline("pipeline")
    item = {
        'case': 'sdjwaichjkneirjpewjcmelkdfpoewrjlkxncdsd'
    }
    print(d.process_item(item))

Output of the second run:

THIS IS A DUP
{'case': 'sdjwaichjkneirjpewjcmelkdfpoewrjlkxncdsd'}

The logic of my code is not much different from yours. My only guess is that the difference lies in the versions.
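There is another difference worth checking besides versions. `MetaData.create_all()` first checks whether each table already exists and skips the ones that do. It never alters an existing table. In the test above, the `pipeline` table was most likely created fresh by `create_all()` on the first run, so it received the UNIQUE constraint. In the question, the table already existed, so `unique=True` only changes the Python-side `Table` object and the constraint never reaches SQLite. The sketch below compares three cases. It assumes an in-memory database standing in for `data.db`. The table name `judgements`, the index name `uq_judgements_case` and the sample value are illustrative.

```python
from sqlalchemy import (Column, Index, Integer, MetaData, Table, Text,
                        create_engine, text)
from sqlalchemy.exc import IntegrityError


def make_table(metadata):
    # Same shape as the pipeline's table; "case" is declared unique=True.
    return Table("judgements", metadata,
                 Column("id", Integer, primary_key=True),
                 Column("case", Text, unique=True))


def successful_inserts(engine, table, value="2016-123", attempts=2):
    """Insert the same "case" value several times; count the inserts that succeed."""
    ok = 0
    for _ in range(attempts):
        try:
            with engine.begin() as conn:  # commits on success, rolls back on error
                conn.execute(table.insert().values(case=value))
            ok += 1
        except IntegrityError:
            pass  # a UNIQUE constraint or unique index rejected the duplicate
    return ok


def run(pre_existing, add_unique_index=False):
    # Each call gets its own private in-memory SQLite database.
    engine = create_engine("sqlite://")
    if pre_existing:
        # Simulate the old table, created earlier without any UNIQUE constraint.
        with engine.begin() as conn:
            conn.execute(text(
                'CREATE TABLE judgements (id INTEGER PRIMARY KEY, "case" TEXT)'))
    metadata = MetaData()
    table = make_table(metadata)
    # create_all() skips tables that already exist; it never ALTERs them.
    metadata.create_all(engine)
    if add_unique_index:
        # Enforce uniqueness on the existing table with a separate unique index.
        Index("uq_judgements_case", table.c.case, unique=True).create(engine)
    return successful_inserts(engine, table)


if __name__ == "__main__":
    print(run(pre_existing=False))                        # fresh table
    print(run(pre_existing=True))                         # old table, unique=True has no effect
    print(run(pre_existing=True, add_unique_index=True))  # old table plus unique index
```

Under these assumptions, the three calls should print 1, 2 and 1. Only the freshly created table and the existing table with an explicit unique index reject the second insert. Note that the index can only be created once the existing duplicates have been removed.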