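I have a record that I want to exist in the database: if it is not there, insert it; if it already exists (the primary key exists), update its fields to the current state. This is commonly called an upsert.
The following incomplete snippet demonstrates what works, but it seems excessively clunky (especially with more columns). What is the better/best way?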
import sqlalchemy.orm.exc
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Template(Base):
    __tablename__ = 'templates'
    id = Column(Integer, primary_key=True)
    name = Column(String(80), unique=True, index=True)
    template = Column(String(80), unique=True)
    description = Column(String(200))

    def __init__(self, name, template, desc):
        self.name = name
        self.template = template
        self.description = desc


def UpsertDefaultTemplate():
    sess = Session()  # Session is assumed to be a configured sessionmaker
    desired_default = Template("default", "AABBCC", "This is the default template")
    try:
        q = sess.query(Template).filter_by(name=desired_default.name)
        existing_default = q.one()
    except sqlalchemy.orm.exc.NoResultFound:
        # default does not exist yet, so add it...
        sess.add(desired_default)
    else:
        # default already exists. Make sure the values are what we want...
        assert isinstance(existing_default, Template)
        existing_default.name = desired_default.name
        existing_default.template = desired_default.template
        existing_default.description = desired_default.description
    sess.flush()
Is there a better or less verbose way of doing this? Something like this would be great:
sess.upsert_this(desired_default, unique_key = "name")
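Although the unique_key kwarg is obviously unnecessary (the ORM should be able to figure this out easily), I added it because SQLAlchemy tends to work only with the primary key. For example, I have been looking at whether Session.merge would be applicable, but it works only on the primary key, which in this case is an autoincrementing id and not terribly useful for this purpose.
A sample use case is simply starting up a server application that may have upgraded its default expected data. That is, there are no concurrency concerns for this upsert.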
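Answer 0 (score: 43)
SQLAlchemy does have a "save-or-update" behavior, which in recent versions is built into session.add but was previously the separate session.save_or_update call. This is not an "upsert", but it may be good enough for your needs.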
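It is good that you are asking about a class with multiple unique keys; I believe this is precisely the reason there is no single correct way to do this. The primary key is also a unique key. If there were no unique constraints, only the primary key, the problem would be simple enough: if nothing exists with the given ID, or if the ID is None, create a new record; otherwise update all other fields in the existing record with that primary key.
However, when there are additional unique constraints, that simple approach has logical problems. If you want to "upsert" an object, and the primary key of your object matches an existing record, but another unique column matches a different record, what do you do? Similarly, if the primary key matches no existing record, but another unique column does match an existing record, then what? There may be a correct answer for your particular situation, but in general I would argue there is no single correct answer.
That is the reason there is no built-in "upsert" operation. The application must define what this means in each particular case.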
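One way for the application to spell out that definition is to name the deciding unique column explicitly, much like the unique_key kwarg the question wished for. The following is a minimal sketch under assumptions: upsert_by is a hypothetical helper (not part of SQLAlchemy), the Template model is a trimmed copy of the one in the question, an in-memory SQLite database stands in for the real one, and SQLAlchemy 1.4 or later is assumed. It is a query-then-write approach, so it offers no protection against concurrent writers.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Template(Base):
    __tablename__ = "templates"
    id = Column(Integer, primary_key=True)
    name = Column(String(80), unique=True, index=True)
    template = Column(String(80), unique=True)
    description = Column(String(200))


def upsert_by(session, model, key, **values):
    """Hypothetical helper: insert or update the row whose `key` column matches."""
    obj = session.query(model).filter_by(**{key: values[key]}).one_or_none()
    if obj is None:
        # No row with this key yet, so create one.
        obj = model(**values)
        session.add(obj)
    else:
        # Row exists: bring every supplied field up to date.
        for attr, value in values.items():
            setattr(obj, attr, value)
    session.flush()
    return obj


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    first = upsert_by(session, Template, "name", name="default",
                      template="AABBCC", description="old text")
    second = upsert_by(session, Template, "name", name="default",
                       template="AABBCC",
                       description="This is the default template")
    session.commit()
    row_count = session.query(Template).count()
    final_description = second.description
    same_row = first.id == second.id
```

The caller decides which unique column wins; a conflict on a different unique column (here, template) would still surface as an integrity error rather than being resolved silently.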
Answer 1 (score: 18)
SQLAlchemy now supports ON CONFLICT with two methods, on_conflict_do_update() and on_conflict_do_nothing().
Copied from the documentation:
from sqlalchemy.dialects.postgresql import insert
stmt = insert(my_table).values(user_email='a@b.com', data='inserted data')
stmt = stmt.on_conflict_do_update(
    index_elements=[my_table.c.user_email],
    index_where=my_table.c.user_email.like('%@gmail.com'),
    set_=dict(data=stmt.excluded.data)
)
conn.execute(stmt)
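The PostgreSQL snippet above needs a live server and an existing my_table. As a rough, runnable analogue, the sketch below uses the SQLite dialect, which exposes the same two methods in SQLAlchemy 1.4 and later (SQLite 3.24+ is required for ON CONFLICT). The table definition and the upsert_email helper are assumptions made for illustration only.

```python
from sqlalchemy import Column, MetaData, String, Table, create_engine, select
from sqlalchemy.dialects.sqlite import insert

metadata = MetaData()
# Assumed table shape: the e-mail address is the conflict target.
my_table = Table(
    "my_table", metadata,
    Column("user_email", String, primary_key=True),
    Column("data", String),
)

engine = create_engine("sqlite://")  # in-memory database
metadata.create_all(engine)


def upsert_email(conn, email, data):
    """Insert the row, or overwrite `data` if the e-mail already exists."""
    stmt = insert(my_table).values(user_email=email, data=data)
    stmt = stmt.on_conflict_do_update(
        index_elements=[my_table.c.user_email],
        set_={"data": stmt.excluded.data},  # value from the rejected INSERT
    )
    conn.execute(stmt)


with engine.begin() as conn:
    upsert_email(conn, "a@b.com", "inserted data")
    upsert_email(conn, "a@b.com", "updated data")
    # on_conflict_do_nothing leaves the existing row untouched.
    conn.execute(
        insert(my_table)
        .values(user_email="a@b.com", data="ignored")
        .on_conflict_do_nothing(index_elements=[my_table.c.user_email])
    )
    result = conn.execute(select(my_table.c.user_email, my_table.c.data))
    rows = [tuple(r) for r in result]
```

After these three statements the table holds a single row whose data column comes from the second call.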
Answer 2 (score: 8)
I use a "look before you leap" approach:
# first get the object from the database if it exists
# we're guaranteed to only get one or zero results
# because we're filtering by primary key
switch_command = session.query(Switch_Command).\
    filter(Switch_Command.switch_id == switch.id).\
    filter(Switch_Command.command_id == command.id).first()
# If we didn't get anything, make one
if not switch_command:
    switch_command = Switch_Command(switch_id=switch.id, command_id=command.id)
# update the stuff we care about
switch_command.output = 'Hooray!'
switch_command.lastseen = datetime.datetime.utcnow()
session.add(switch_command)
# This will generate either an INSERT or UPDATE
# depending on whether we have a new object or not
session.commit()
The advantage is that this is database-neutral and, I think, clear to read. The disadvantage is a potential race condition in a scenario like the following:
we query the database for a switch_command and do not find one
we create a switch_command
another process or thread creates a switch_command with the same primary key as ours
we try to commit our switch_command
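One possible way to soften that race condition is to treat an integrity error on commit as a sign that someone else inserted the row first, then roll back and repeat the lookup. This is only a sketch under assumptions: SwitchCommand is a cut-down, hypothetical stand-in for the model above, save_output is an invented helper name, SQLAlchemy 1.4+ is assumed for session.get, and an in-memory SQLite database is used so the block runs on its own.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class SwitchCommand(Base):
    # Hypothetical, simplified version of the Switch_Command model.
    __tablename__ = "switch_command"
    switch_id = Column(Integer, primary_key=True)
    command_id = Column(Integer, primary_key=True)
    output = Column(String)


def save_output(session, switch_id, command_id, output, retries=1):
    """Look before you leap, but retry if another writer got there first."""
    for _ in range(retries + 1):
        obj = session.get(SwitchCommand, (switch_id, command_id))
        if obj is None:
            obj = SwitchCommand(switch_id=switch_id, command_id=command_id)
            session.add(obj)
        obj.output = output
        try:
            session.commit()
            return obj
        except IntegrityError:
            # Someone inserted the same key between our query and commit.
            # Roll back; the next loop iteration will find and update the row.
            session.rollback()
    raise RuntimeError("could not save row after retrying")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    save_output(session, 1, 2, "Hooray!")
    saved = save_output(session, 1, 2, "Hooray again!")
    final_output = saved.output
    row_count = session.query(SwitchCommand).count()
```

Note that session.rollback() discards every pending change in the session, not just this row, so the pattern fits best when the upsert is the only work in the transaction.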
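Answer 3 (score: 5)
Nowadays, SQLAlchemy provides two helpful functions, on_conflict_do_nothing and on_conflict_do_update. These functions are useful but require you to switch from the ORM interface to the lower-level one, SQLAlchemy Core.
Although these two functions make upserting with SQLAlchemy's syntax not that difficult, they are far from providing a complete out-of-the-box solution for upserting.
My common use case is to upsert a large batch of rows in a single SQL query/session execution. I usually run into two problems with upserting:
For example, the higher-level ORM functionality we have gotten used to is missing. You cannot use ORM objects; instead you have to provide the ForeignKey values at insertion time.
I use the following function, which I wrote, to handle both of these issues: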
from sqlalchemy import UniqueConstraint, inspect
from sqlalchemy.dialects import postgresql


def upsert(session, model, rows):
    table = model.__table__
    stmt = postgresql.insert(table)
    primary_keys = [key.name for key in inspect(table).primary_key]
    # On conflict, overwrite every non-primary-key column with the incoming value.
    update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}

    if not update_dict:
        raise ValueError("insert_or_update resulted in an empty update_dict")

    stmt = stmt.on_conflict_do_update(index_elements=primary_keys,
                                      set_=update_dict)

    seen = set()
    # Map each foreign-key column name to the column it references.
    foreign_keys = {col.name: list(col.foreign_keys)[0].column for col in table.columns if col.foreign_keys}
    unique_constraints = [c for c in table.constraints if isinstance(c, UniqueConstraint)]

    def handle_foreignkeys_constraints(row):
        # Replace a related ORM object (keyed by the referenced table's name)
        # with the value of the column the foreign key points at.
        for c_name, c_value in foreign_keys.items():
            foreign_obj = row.pop(c_value.table.name, None)
            row[c_name] = getattr(foreign_obj, c_value.name) if foreign_obj else None

        # Drop rows that repeat a unique-constraint value already seen in this batch.
        for const in unique_constraints:
            unique = tuple([const] + [row[col.name] for col in const.columns])
            if unique in seen:
                return None
            seen.add(unique)

        return row

    rows = list(filter(None, (handle_foreignkeys_constraints(row) for row in rows)))
    session.execute(stmt, rows)
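The batch de-duplication step matters because PostgreSQL rejects an INSERT ... ON CONFLICT DO UPDATE that would touch the same row twice in one statement. The idea can be seen in isolation in the plain-Python sketch below; drop_duplicate_rows and its arguments are hypothetical names, and unlike the function above it only records a row's keys once the whole row has been accepted.

```python
def drop_duplicate_rows(rows, unique_column_sets):
    """Keep the first row for each unique-key value; drop later repeats.

    rows: list of dicts, one per row to insert.
    unique_column_sets: iterable of column-name tuples, one per unique constraint.
    """
    seen = set()
    kept = []
    for row in rows:
        # One (constraint, values) key per unique constraint.
        keys = [(cols, tuple(row[c] for c in cols)) for cols in unique_column_sets]
        if any(key in seen for key in keys):
            continue  # collides with an earlier row in this batch
        seen.update(keys)
        kept.append(row)
    return kept


batch = [
    {"name": "a", "code": 1},
    {"name": "a", "code": 2},  # repeats name "a", so it is dropped
    {"name": "b", "code": 2},  # code 2 was never accepted, so this is kept
    {"name": "c", "code": 3},
]
deduplicated = drop_duplicate_rows(batch, [("name",), ("code",)])
```

Keeping the first occurrence is a design choice; keeping the last one instead would only require iterating over the batch in reverse.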
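Answer 4 (score: 1)
This works for me with sqlite3 and postgres, although it might fail with combined primary key constraints and will most likely fail with additional unique constraints.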
# Fragment from inside a method; assumes these imports:
#   from sqlalchemy import and_, insert, update
#   from sqlalchemy.exc import IntegrityError
try:
    t = self._meta.tables[data['table']]
except KeyError:
    self._log.error('table "%s" unknown', data['table'])
    return

try:
    q = insert(t).values(data['values'])
    self._log.debug(q)
    self._db.execute(q)
except IntegrityError:
    # The row already exists: update it by primary key instead.
    self._log.warning('integrity error')
    where_clause = [c == data['values'][c.name] for c in t.c if c.primary_key]
    update_dict = {c.name: data['values'][c.name] for c in t.c if not c.primary_key}
    q = update(t).values(update_dict).where(and_(*where_clause))
    self._log.debug(q)
    self._db.execute(q)
except Exception as e:
    self._log.error('%s: %s', t.name, e)
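Answer 5 (score: 1)
The following works for me with a Redshift database and also works with combined primary key constraints.
Source: this
Only a few modifications were needed, to create the SQLAlchemy engine in the function start_engine().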
import os

from sqlalchemy import Column, Date, Integer, MetaData, create_engine
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


def start_engine():
    engine = create_engine(os.getenv('SQLALCHEMY_URI',
                                     'postgresql://localhost:5432/upsert'))
    meta = MetaData()
    meta.reflect(bind=engine)
    return engine


class DigitalSpend(Base):
    __tablename__ = 'digital_spend'
    report_date = Column(Date, nullable=False)
    day = Column(Date, nullable=False, primary_key=True)
    impressions = Column(Integer)
    conversions = Column(Integer)

    def __repr__(self):
        return str([getattr(self, c.name, None) for c in self.__table__.c])


def compile_query(query):
    compiler = query.compile if not hasattr(query, 'statement') else query.statement.compile
    return compiler(dialect=postgresql.dialect())


def upsert(session, model, rows, as_of_date_col='report_date', no_update_cols=()):
    table = model.__table__
    stmt = insert(table).values(rows)

    # Update every column except the primary key and the excluded ones.
    update_cols = [c.name for c in table.c
                   if c not in list(table.primary_key.columns)
                   and c.name not in no_update_cols]

    on_conflict_stmt = stmt.on_conflict_do_update(
        index_elements=table.primary_key.columns,
        set_={k: getattr(stmt.excluded, k) for k in update_cols},
        # where= (rather than index_where=) makes the update conditional:
        # only overwrite when the incoming row has a newer as-of date.
        where=(getattr(model, as_of_date_col) < getattr(stmt.excluded, as_of_date_col))
    )

    print(compile_query(on_conflict_stmt))
    session.execute(on_conflict_stmt)


engine = start_engine()
session = sessionmaker(bind=engine)()
# initial_rows is assumed to be a list of row dicts defined elsewhere.
upsert(session, DigitalSpend, initial_rows, no_update_cols=['conversions'])
session.commit()
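The "only update when the incoming row is newer" rule can be tried without a PostgreSQL or Redshift server. The sketch below is an illustration under assumptions: it uses the SQLite dialect (SQLAlchemy 1.4+, SQLite 3.24+), a reduced digital_spend table, and hypothetical helper names upsert_if_newer and current_impressions. The condition is passed through the where argument of on_conflict_do_update, which restricts the DO UPDATE part of the statement.

```python
import datetime

from sqlalchemy import Column, Date, Integer, MetaData, Table, create_engine, select
from sqlalchemy.dialects.sqlite import insert

metadata = MetaData()
# Reduced, assumed version of the digital_spend table.
digital_spend = Table(
    "digital_spend", metadata,
    Column("day", Date, primary_key=True),
    Column("report_date", Date, nullable=False),
    Column("impressions", Integer),
)

engine = create_engine("sqlite://")  # in-memory database
metadata.create_all(engine)


def upsert_if_newer(conn, rows):
    """Insert rows; on a key conflict, update only if report_date is newer."""
    stmt = insert(digital_spend).values(rows)
    stmt = stmt.on_conflict_do_update(
        index_elements=[digital_spend.c.day],
        set_={
            "report_date": stmt.excluded.report_date,
            "impressions": stmt.excluded.impressions,
        },
        where=digital_spend.c.report_date < stmt.excluded.report_date,
    )
    conn.execute(stmt)


def current_impressions(conn):
    return conn.execute(select(digital_spend.c.impressions)).scalar_one()


day = datetime.date(2020, 1, 1)
history = []
with engine.begin() as conn:
    # First report for this day: plain insert.
    upsert_if_newer(conn, [{"day": day, "report_date": datetime.date(2020, 1, 2),
                            "impressions": 10}])
    history.append(current_impressions(conn))
    # Older report: conflicts, but the where clause blocks the update.
    upsert_if_newer(conn, [{"day": day, "report_date": datetime.date(2020, 1, 1),
                            "impressions": 5}])
    history.append(current_impressions(conn))
    # Newer report: conflicts and overwrites the stored row.
    upsert_if_newer(conn, [{"day": day, "report_date": datetime.date(2020, 1, 3),
                            "impressions": 7}])
    history.append(current_impressions(conn))
```

A stale report is silently ignored rather than raising an error, which suits idempotent reloads of overlapping report files.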
Answer 6 (score: 0)
This allows access to the underlying models based on string names:
def get_class_by_tablename(tablename):
    """Return class reference mapped to table.

    https://stackoverflow.com/questions/11668355/sqlalchemy-get-model-from-table-name-this-may-imply-appending-some-function-to
    :param tablename: String with name of table.
    :return: Class reference or None.
    """
    # Base._decl_class_registry exists in SQLAlchemy versions before 1.4;
    # later versions expose the registry as Base.registry._class_registry.
    for c in Base._decl_class_registry.values():
        if hasattr(c, '__tablename__') and c.__tablename__ == tablename:
            return c


sqla_tbl = get_class_by_tablename(table_name)


def handle_upsert(session, record_dict, table):
    """Handle updates when there are primary key conflicts."""
    try:
        session.add(table(**record_dict))
        session.flush()  # emit the INSERT now so a conflict is raised here
    except Exception:
        # Here we'll assume the error is caused by an integrity error.
        # We do this because the error classes are passed from the
        # underlying package (pyodbc / sqlite); SQLAlchemy doesn't mask
        # them with its own code. This should be updated to have
        # explicit error handling for each new db engine.
        session.rollback()
        # Query for the conflicting record, then change its values based on the dict.
        c_tbl_primary_keys = [i.name for i in table.__table__.primary_key]  # list of primary key column names
        c_tbl_cols = dict(table.__table__.columns)  # name -> Column object crosswalk
        c_query_dict = {k: record_dict[k] for k in c_tbl_primary_keys if k in record_dict}  # primary key -> value
        c_oo_query_dict = {c_tbl_cols[k]: v for (k, v) in c_query_dict.items()}  # Column object -> query value
        c_target_record = session.query(table).filter(*[k == v for (k, v) in c_oo_query_dict.items()]).first()
        # Apply the new data values to the existing record.
        for k, v in record_dict.items():
            setattr(c_target_record, k, v)
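Answer 7 (score: 0)
There are multiple answers, and here comes yet another answer (YAA). The other answers are not that readable because of the metaprogramming involved. Here is an example that:
Uses the SQLAlchemy ORM
Shows how to use on_conflict_do_nothing
Shows how to use on_conflict_do_update
Uses the table primary key as the constraint
A longer example is in the original question this code relates to.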
import datetime

import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy import text
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.orm import Session

# Base and Pair are assumed to be defined elsewhere in the application.


class PairState(Base):

    __tablename__ = "pair_state"

    # This table has 1-to-1 relationship with Pair
    pair_id = sa.Column(sa.ForeignKey("pair.id"), nullable=False, primary_key=True, unique=True)
    pair = orm.relationship(Pair,
                            backref=orm.backref("pair_state",
                                                lazy="dynamic",
                                                cascade="all, delete-orphan",
                                                single_parent=True))

    # First raw event in data stream
    first_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # Last raw event in data stream
    last_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # The last hypertable entry added
    last_interval_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    @staticmethod
    def create_first_event_if_not_exist(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Sets the first event value if it does not exist yet."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, first_event_at=ts).
            on_conflict_do_nothing()
        )

    @staticmethod
    def update_last_event(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_event_at for a named pair."""
        # Based on the original example of https://stackoverflow.com/a/49917004/315168
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_event_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_event_at": ts})
        )

    @staticmethod
    def update_last_interval(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_interval_at for a named pair."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_interval_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_interval_at": ts})
        )