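I have a pre-existing SQLite table that I access through SQLAlchemy. I realized that it contains some duplicate "case" numbers. If I understand correctly, SQLite does not let you add a unique constraint to a table after the table has been created. After removing the duplicates with the following: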
DELETE FROM mytable
WHERE id NOT IN
(
    SELECT MIN(id)
    FROM judgements
    GROUP BY "case"
);
I decided to use SQLAlchemy to prevent further duplicates from being added. I am using Scrapy and have a pipeline component that looks like this:
class DynamicSQLlitePipeline(object):

    def __init__(self, table_name):
        db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
        _engine = create_engine(db_path)
        _connection = _engine.connect()
        _metadata = MetaData()
        _stack_items = Table(table_name, _metadata,
                             Column("id", Integer, primary_key=True),
                             Column("case", Text, unique=True),
                             ....)
        _metadata.create_all(_engine)
        self.connection = _connection
        self.stack_items = _stack_items

    def process_item(self, item, spider):
        try:
            ins_query = self.stack_items.insert().values(
                case=item['case'],
                ....
            )
            self.connection.execute(ins_query)
        except IntegrityError:
            print('THIS IS A DUP')
        return item
The only change I made was adding unique=True to the 'case' column. However, when testing, duplicates are still being added. How can I get this working?
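For reference, the deduplicate-then-constrain idea described in the question can be sketched with Python's built-in sqlite3 module. This is only a minimal sketch under stated assumptions, not the asker's actual setup: the table name `judgements`, the index name `ux_judgements_case`, the helper names, and the sample rows are all hypothetical, and an in-memory database stands in for data.db. SQLite's ALTER TABLE cannot add a UNIQUE constraint, but SQLite does accept CREATE UNIQUE INDEX on an existing table once the duplicates are gone.

```python
import sqlite3


def dedupe_and_constrain(conn):
    """Keep the lowest-id row per case, then enforce uniqueness going forward."""
    # Delete every row whose id is not the smallest id for its "case" value.
    conn.execute(
        'DELETE FROM judgements '
        'WHERE id NOT IN (SELECT MIN(id) FROM judgements GROUP BY "case")'
    )
    # A unique index can be added to an existing table;
    # this statement fails if duplicates are still present.
    conn.execute(
        'CREATE UNIQUE INDEX IF NOT EXISTS ux_judgements_case '
        'ON judgements ("case")'
    )
    conn.commit()


def insert_case(conn, case):
    """Insert a case; return False instead of raising when it is a duplicate."""
    try:
        conn.execute('INSERT INTO judgements ("case") VALUES (?)', (case,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


# Hypothetical pre-existing table without a unique constraint, holding one duplicate.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE judgements (id INTEGER PRIMARY KEY, "case" TEXT)')
conn.executemany(
    'INSERT INTO judgements ("case") VALUES (?)',
    [("A-1",), ("A-1",), ("B-2",)],
)
dedupe_and_constrain(conn)
remaining = conn.execute('SELECT id, "case" FROM judgements ORDER BY id').fetchall()
```

Once the index exists, the database itself rejects a repeated "case" value, so an `except IntegrityError` branch like the one in the pipeline above would be reached.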
Answer 0 (score: 1)
The code snippet below works on my side with Python 2.7, SQLAlchemy 1.0.9, and SQLite 3.15.2.
from sqlalchemy import create_engine, MetaData, Column, Integer, Table, Text
from sqlalchemy.exc import IntegrityError


class DynamicSQLlitePipeline(object):

    def __init__(self, table_name):
        db_path = "sqlite:///data.db"
        _engine = create_engine(db_path)
        _connection = _engine.connect()
        _metadata = MetaData()
        _stack_items = Table(table_name, _metadata,
                             Column("id", Integer, primary_key=True),
                             Column("case", Text, unique=True),)
        # Creates the table (including the unique constraint) if it is missing.
        _metadata.create_all(_engine)
        self.connection = _connection
        self.stack_items = _stack_items

    def process_item(self, item):
        try:
            ins_query = self.stack_items.insert().values(case=item['case'])
            self.connection.execute(ins_query)
        except IntegrityError:
            # Raised when the unique constraint on "case" rejects the row.
            print('THIS IS A DUP')
        return item


if __name__ == '__main__':
    d = DynamicSQLlitePipeline("pipeline")
    item = {
        'case': 'sdjwaichjkneirjpewjcmelkdfpoewrjlkxncdsd'
    }
    print(d.process_item(item))
The output of the second run is as follows:
THIS IS A DUP
{'case': 'sdjwaichjkneirjpewjcmelkdfpoewrjlkxncdsd'}
The logic of my code is not much different from yours. My guess is that the only difference may be the versions.
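Besides versions, one other difference may be worth ruling out; this is an assumption, not something confirmed in the thread. By default, `MetaData.create_all` checks whether each table already exists and skips the ones that do, so `unique=True` on a `Table` definition only takes effect when SQLAlchemy actually creates the table. The test above uses a fresh table named "pipeline", while the question describes a pre-existing table. The sketch below uses the built-in sqlite3 module to check whether a table really carries a unique index. The helper `has_unique_index` and the table names `old_items` and `new_items` are hypothetical, and `CREATE TABLE IF NOT EXISTS` stands in for the skip-if-exists behaviour.

```python
import sqlite3


def has_unique_index(conn, table, column):
    """Return True if `table` has a unique index made up of exactly `column`."""
    # PRAGMA arguments cannot be bound as parameters, so only pass trusted names.
    # PRAGMA index_list rows start with: (seq, name, unique, ...)
    for row in conn.execute('PRAGMA index_list("%s")' % table).fetchall():
        index_name, is_unique = row[1], row[2]
        if not is_unique:
            continue
        # PRAGMA index_info rows: (seqno, cid, name)
        cols = [r[2] for r in conn.execute('PRAGMA index_info("%s")' % index_name)]
        if cols == [column]:
            return True
    return False


conn = sqlite3.connect(":memory:")
# Stand-in for the pre-existing table: created earlier without a unique constraint.
conn.execute('CREATE TABLE old_items (id INTEGER PRIMARY KEY, "case" TEXT)')
# Re-declaring it with UNIQUE changes nothing because the table already exists.
conn.execute(
    'CREATE TABLE IF NOT EXISTS old_items (id INTEGER PRIMARY KEY, "case" TEXT UNIQUE)'
)
# A table that is created fresh with UNIQUE does get the constraint.
conn.execute(
    'CREATE TABLE IF NOT EXISTS new_items (id INTEGER PRIMARY KEY, "case" TEXT UNIQUE)'
)

old_is_unique = has_unique_index(conn, "old_items", "case")
new_is_unique = has_unique_index(conn, "new_items", "case")
print(old_is_unique, new_is_unique)
```

If the check comes back False for the real table, the constraint was never applied to it, which would explain duplicates still being accepted regardless of library versions.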