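How do I do an upsert with SQLAlchemy?

Asked: 2011-08-23 18:46:34

Tags: python sqlalchemy upsert

I have a record that I want to exist in the database: if it is not there yet it should be inserted, and if it is already there (the primary key exists) I want its fields updated to the current state. This is often called an upsert.

The following incomplete code snippet demonstrates what will work, but it seems excessively clunky (especially if there were many more columns). What is the better/best way?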

Base = declarative_base()
class Template(Base):
    __tablename__ = 'templates'
    id = Column(Integer, primary_key = True)
    name = Column(String(80), unique = True, index = True)
    template = Column(String(80), unique = True)
    description = Column(String(200))
    def __init__(self, Name, Template, Desc):
        self.name = Name
        self.template = Template
        self.description = Desc

def UpsertDefaultTemplate():
    sess = Session()
    desired_default = Template("default", "AABBCC", "This is the default template")
    try:
        q = sess.query(Template).filter_by(name = desired_default.name)
        existing_default = q.one()
    except sqlalchemy.orm.exc.NoResultFound:
        #default does not exist yet, so add it...
        sess.add(desired_default)
    else:
        #default already exists.  Make sure the values are what we want...
        assert isinstance(existing_default, Template)
        existing_default.name = desired_default.name
        existing_default.template = desired_default.template
        existing_default.description = desired_default.description
    sess.flush()

Is there a better or less verbose way of doing this? Something like this would be great:

sess.upsert_this(desired_default, unique_key = "name")

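Although the unique_key kwarg is obviously unnecessary (the ORM should be able to figure this out easily), I added it because SQLAlchemy tends to work only with the primary key. For example, I have been looking at whether Session.merge would be applicable, but it works only on the primary key, which in this case is an autoincrementing id that is not terribly useful for this purpose.

A sample use case is simply starting up a server application that may have upgraded its default expected data. That is, there are no concurrency concerns for this upsert.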
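The hypothetical sess.upsert_this(...) call can be approximated with a small helper. This is only a sketch under assumptions: upsert_by is an invented name, the model uses the default declarative constructor (keyword arguments) rather than the custom __init__ above, SQLAlchemy 1.4 or later is assumed, and an in-memory SQLite database stands in for the real one. Like the original snippet, it is a query-then-write and is not safe under concurrent writers.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Template(Base):
    __tablename__ = "templates"
    id = Column(Integer, primary_key=True)
    name = Column(String(80), unique=True, index=True)
    template = Column(String(80), unique=True)
    description = Column(String(200))


def upsert_by(session, model, unique_key, **values):
    """Hypothetical helper: update the row matching unique_key, else insert one."""
    lookup = {unique_key: values[unique_key]}
    obj = session.query(model).filter_by(**lookup).one_or_none()
    if obj is None:
        # No row with this unique value yet: stage an INSERT.
        obj = model(**values)
        session.add(obj)
    else:
        # Row exists: copy every supplied value onto it (flushed as an UPDATE).
        for column_name, value in values.items():
            setattr(obj, column_name, value)
    session.flush()
    return obj


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as sess:
    upsert_by(sess, Template, "name", name="default", template="AABBCC",
              description="This is the default template")
    upsert_by(sess, Template, "name", name="default", template="DDEEFF",
              description="Upgraded default template")
    sess.commit()
    stored = [(t.name, t.template, t.description) for t in sess.query(Template)]
```

The second call finds the existing row by name and updates it, so the table still holds a single row.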

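8 answers:

Answer 0: (score: 43)

SQLAlchemy does have a "save-or-update" behavior, which in recent versions is built into session.add but previously was the separate session.saveorupdate call. This is not an "upsert", but it may be good enough for your needs.

It is good that you are asking about a class with multiple unique keys; I believe this is precisely the reason there is no single correct way to do this. The primary key is also a unique key. If there were no unique constraints, only the primary key, it would be simple enough: if nothing exists with the given ID, or if the ID is None, create a new record; otherwise update all other fields in the existing record with that primary key.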
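The primary-key-only case described in the previous paragraph is roughly what Session.merge already does. A minimal sketch, assuming SQLAlchemy 1.4 or later, an in-memory SQLite database, and a cut-down Template model without extra unique constraints:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Template(Base):
    __tablename__ = "templates"
    id = Column(Integer, primary_key=True)
    name = Column(String(80))
    description = Column(String(200))


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    # No row with id=1 exists yet, so merge() stages an INSERT.
    session.merge(Template(id=1, name="default", description="first version"))
    session.commit()
    # A row with id=1 now exists, so merge() copies the new state onto it (UPDATE).
    session.merge(Template(id=1, name="default", description="second version"))
    session.commit()
    rows = [(t.id, t.name, t.description) for t in session.query(Template)]
```

merge() looks the object up by primary key only, which is why it does not help when the key you care about is a different unique column.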

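But when there are additional unique constraints, that simple approach runs into logical problems. If you want to "upsert" an object and its primary key matches an existing record, but another unique column matches a different record, what should happen? Likewise, if the primary key matches no existing record but another unique column does match one, then what? There may be a correct answer for your particular situation, but in general I would argue there is no single correct answer.

That is why there is no built-in "upsert" operation. The application must define what it means in each particular case.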
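The ambiguity is easy to reproduce. The sketch below uses the standard-library sqlite3 module rather than SQLAlchemy, with an illustrative table, and shows what one particular policy (SQLite's INSERT OR REPLACE) does when the new row collides with one existing row on the primary key and with another on a unique column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE templates ("
    "id INTEGER PRIMARY KEY, name TEXT UNIQUE, description TEXT)"
)
conn.executemany(
    "INSERT INTO templates VALUES (?, ?, ?)",
    [(1, "a", "first"), (2, "b", "second")],
)

# id=1 collides with the first row, name='b' collides with the second row.
# REPLACE resolves this by deleting every conflicting row before inserting.
conn.execute("INSERT OR REPLACE INTO templates VALUES (?, ?, ?)", (1, "b", "new"))

rows = conn.execute(
    "SELECT id, name, description FROM templates ORDER BY id"
).fetchall()
```

Two rows went in and only one is left afterwards. Whether silently dropping the second record is acceptable depends entirely on the application, which is the point made above.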

Answer 1: (score: 18)

SQLAlchemy now supports ON CONFLICT with two methods, on_conflict_do_update() and on_conflict_do_nothing().

Copied from the documentation:

from sqlalchemy.dialects.postgresql import insert

stmt = insert(my_table).values(user_email='a@b.com', data='inserted data')
stmt = stmt.on_conflict_do_update(
    index_elements=[my_table.c.user_email],
    index_where=my_table.c.user_email.like('%@gmail.com'),
    set_=dict(data=stmt.excluded.data)
    )
conn.execute(stmt)

http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html?highlight=conflict#insert-on-conflict-upsert
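For trying this out without a PostgreSQL server, the SQLite dialect offers the same two methods. The following is a sketch rather than part of the quoted documentation: it assumes SQLAlchemy 1.4 or later and SQLite 3.24 or later, the table definition is made up to match the column names above, and the partial-index condition (index_where) is left out.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select
from sqlalchemy.dialects.sqlite import insert

metadata = MetaData()
my_table = Table(
    "my_table",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("user_email", String, unique=True),
    Column("data", String),
)

engine = create_engine("sqlite://")  # in-memory database, for illustration only
metadata.create_all(engine)


def upsert_email(conn, email, data):
    """Insert a row, or overwrite `data` if user_email already exists."""
    stmt = insert(my_table).values(user_email=email, data=data)
    stmt = stmt.on_conflict_do_update(
        index_elements=[my_table.c.user_email],
        # `excluded` refers to the row that was proposed for insertion.
        set_={"data": stmt.excluded.data},
    )
    conn.execute(stmt)


with engine.begin() as conn:
    upsert_email(conn, "a@b.com", "inserted data")
    upsert_email(conn, "a@b.com", "updated data")
    rows = [
        tuple(r)
        for r in conn.execute(select(my_table.c.user_email, my_table.c.data))
    ]
```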

Answer 2: (score: 8)

I use a "look before you leap" approach:

# first get the object from the database if it exists
# we're guaranteed to only get one or zero results
# because we're filtering by primary key
switch_command = session.query(Switch_Command).\
    filter(Switch_Command.switch_id == switch.id).\
    filter(Switch_Command.command_id == command.id).first()

# If we didn't get anything, make one
if not switch_command:
    switch_command = Switch_Command(switch_id=switch.id, command_id=command.id)

# update the stuff we care about
switch_command.output = 'Hooray!'
switch_command.lastseen = datetime.datetime.utcnow()

session.add(switch_command)
# This will generate either an INSERT or UPDATE
# depending on whether we have a new object or not
session.commit()

The advantage is that this is database-neutral and I think it is clear to read. The disadvantage is that there is a potential race condition in a scenario like the following:

  • we query the database for a switch_command and don't find one
  • we create a switch_command
  • another process or thread creates a switch_command with the same primary key as ours
  • we try to commit our switch_command
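One common way to cope with that race is to treat the failed commit as a signal that the row now exists and retry as an update. The sketch below is an assumption-laden illustration, not part of the original answer: save_output and the simplified SwitchCommand model are invented names, SQLAlchemy 1.4 or later is assumed, and the fallback branch is not exercised by the single-process demo at the end.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class SwitchCommand(Base):
    __tablename__ = "switch_command"
    switch_id = Column(Integer, primary_key=True)
    command_id = Column(Integer, primary_key=True)
    output = Column(String)


def save_output(session, switch_id, command_id, output):
    """Look before you leap, with a retry if another writer got there first."""
    key = (switch_id, command_id)
    obj = session.get(SwitchCommand, key)
    if obj is None:
        obj = SwitchCommand(switch_id=switch_id, command_id=command_id)
        session.add(obj)
    obj.output = output
    try:
        session.commit()
    except IntegrityError:
        # Someone inserted the same key between our SELECT and our COMMIT:
        # discard the failed INSERT, reload the winner's row and update it.
        session.rollback()
        obj = session.get(SwitchCommand, key)
        obj.output = output
        session.commit()
    return obj


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    save_output(session, 1, 2, "first run")
    save_output(session, 1, 2, "Hooray!")
    rows = [
        (c.switch_id, c.command_id, c.output)
        for c in session.query(SwitchCommand)
    ]
```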

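Answer 3: (score: 5)

Nowadays SQLAlchemy provides two helpful functions, on_conflict_do_nothing and on_conflict_do_update. These functions are useful, but they require you to switch from the ORM interface to the lower-level one, SQLAlchemy Core.

Although these two functions make upserting with SQLAlchemy's syntax not that difficult, they are far from providing a complete out-of-the-box solution for upserting.

My common use case is to upsert a big chunk of rows in a single SQL query/session execution. I usually run into two problems with upserting:

For example, the higher-level ORM functionality we have become used to is missing. You cannot use ORM objects; instead you have to provide the ForeignKey values at insertion time.

I'm using the following function, which I wrote, to handle both of these issues: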

from sqlalchemy import UniqueConstraint, inspect
from sqlalchemy.dialects import postgresql

def upsert(session, model, rows):
    table = model.__table__
    stmt = postgresql.insert(table)
    primary_keys = [key.name for key in inspect(table).primary_key]
    update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}

    if not update_dict:
        raise ValueError("insert_or_update resulted in an empty update_dict")

    stmt = stmt.on_conflict_do_update(index_elements=primary_keys,
                                      set_=update_dict)

    seen = set()
    foreign_keys = {col.name: list(col.foreign_keys)[0].column for col in table.columns if col.foreign_keys}
    unique_constraints = [c for c in table.constraints if isinstance(c, UniqueConstraint)]
    def handle_foreignkeys_constraints(row):
        for c_name, c_value in foreign_keys.items():
            foreign_obj = row.pop(c_value.table.name, None)
            row[c_name] = getattr(foreign_obj, c_value.name) if foreign_obj else None

        for const in unique_constraints:
            unique = tuple([const,] + [row[col.name] for col in const.columns])
            if unique in seen:
                return None
            seen.add(unique)

        return row

    rows = list(filter(None, (handle_foreignkeys_constraints(row) for row in rows)))
    session.execute(stmt, rows)
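The `seen` bookkeeping in the function above matters because PostgreSQL rejects a single INSERT ... ON CONFLICT DO UPDATE statement that would touch the same row twice. The de-duplication idea can be looked at in isolation with plain Python. This is a simplified variation, not the author's exact logic: drop_in_batch_duplicates and its arguments are hypothetical, and unique constraints are passed as tuples of column names instead of being read from the table.

```python
def drop_in_batch_duplicates(rows, unique_column_sets):
    """Keep only the first row for each value of every unique column set."""
    seen = set()
    kept = []
    for row in rows:
        # One key per constraint: (column names, values of those columns).
        keys = [
            (cols, tuple(row[col] for col in cols))
            for cols in unique_column_sets
        ]
        if any(key in seen for key in keys):
            continue  # collides with an earlier row in the same batch
        seen.update(keys)
        kept.append(row)
    return kept


batch = [
    {"name": "a", "email": "x"},
    {"name": "a", "email": "y"},  # duplicate name
    {"name": "b", "email": "x"},  # duplicate email
    {"name": "c", "email": "z"},
]
deduped = drop_in_batch_duplicates(batch, [("name",), ("email",)])
```

Only the first and last rows survive, so the remaining batch can be sent in one statement.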

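Answer 4: (score: 1)

This works for me with sqlite3 and postgres. It might fail with combined primary key constraints, though, and will most likely fail with additional unique constraints.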

    try:
        t = self._meta.tables[data['table']]
    except KeyError:
        self._log.error('table "%s" unknown', data['table'])
        return

    try:
        q = insert(t, values=data['values'])
        self._log.debug(q)
        self._db.execute(q)
    except IntegrityError:
        self._log.warning('integrity error')
        where_clause = [c.__eq__(data['values'][c.name]) for c in t.c if c.primary_key]
        update_dict = {c.name: data['values'][c.name] for c in t.c if not c.primary_key}
        q = update(t, values=update_dict).where(*where_clause)
        self._log.debug(q)
        self._db.execute(q)
    except Exception as e:
        self._log.error('%s: %s', t.name, e)
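The snippet above depends on surrounding class state (self._meta, self._db, self._log). A self-contained sketch of the same insert-first, update-on-IntegrityError idea follows, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The items table and the insert_or_update name are made up for illustration.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, and_, create_engine, select
from sqlalchemy.exc import IntegrityError

metadata = MetaData()
items = Table(
    "items",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)

engine = create_engine("sqlite://")  # in-memory database, for illustration only
metadata.create_all(engine)


def insert_or_update(table, values):
    """Try a plain INSERT; on a constraint violation, UPDATE by primary key."""
    try:
        with engine.begin() as conn:
            conn.execute(table.insert().values(**values))
    except IntegrityError:
        # Match the existing row on every primary key column ...
        pk_match = and_(*[c == values[c.name] for c in table.c if c.primary_key])
        # ... and overwrite the supplied non-key columns.
        changes = {
            c.name: values[c.name]
            for c in table.c
            if not c.primary_key and c.name in values
        }
        with engine.begin() as conn:
            conn.execute(table.update().where(pk_match).values(**changes))


insert_or_update(items, {"id": 1, "name": "first"})
insert_or_update(items, {"id": 1, "name": "second"})

with engine.connect() as conn:
    rows = [tuple(r) for r in conn.execute(select(items.c.id, items.c.name))]
```

As the answer warns, an IntegrityError raised by a different unique constraint would send this down the update path with the wrong row in mind, so it is only suitable when the primary key is the sole constraint.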

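Answer 5: (score: 1)

The following works for me with a Redshift database and also works for combined primary key constraints.

Source: this

Only a few modifications were needed to create the SQLAlchemy engine in the function def start_engine()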

import os

from sqlalchemy import Column, Integer, Date, MetaData
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.dialects import postgresql

Base = declarative_base()

def start_engine():
    engine = create_engine(os.getenv('SQLALCHEMY_URI', 
    'postgresql://localhost:5432/upsert'))
    connect = engine.connect()
    meta = MetaData(bind=engine)
    meta.reflect(bind=engine)
    return engine


class DigitalSpend(Base):
    __tablename__ = 'digital_spend'
    report_date = Column(Date, nullable=False)
    day = Column(Date, nullable=False, primary_key=True)
    impressions = Column(Integer)
    conversions = Column(Integer)

    def __repr__(self):
        return str([getattr(self, c.name, None) for c in self.__table__.c])


def compile_query(query):
    compiler = (query.compile if not hasattr(query, 'statement')
                else query.statement.compile)
    return compiler(dialect=postgresql.dialect())


def upsert(session, model, rows, as_of_date_col='report_date', no_update_cols=[]):
    table = model.__table__

    stmt = insert(table).values(rows)

    update_cols = [c.name for c in table.c
                   if c not in list(table.primary_key.columns)
                   and c.name not in no_update_cols]

    on_conflict_stmt = stmt.on_conflict_do_update(
        index_elements=table.primary_key.columns,
        set_={k: getattr(stmt.excluded, k) for k in update_cols},
        index_where=(getattr(model, as_of_date_col) < getattr(stmt.excluded, as_of_date_col))
        )

    print(compile_query(on_conflict_stmt))
    session.execute(on_conflict_stmt)


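# start_engine() returns an Engine, so build a Session from it before upserting
engine = start_engine()
session = sessionmaker(bind=engine)()
upsert(session, DigitalSpend, initial_rows, no_update_cols=['conversions'])
session.commit()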
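The "only overwrite when the incoming report is newer" condition can be tried locally against SQLite. The sketch below is an interpretation, not the author's code: it assumes that this is the intent of the as_of_date_col comparison, passes the condition through the `where` argument of on_conflict_do_update (the filter on the DO UPDATE clause) instead of index_where, and assumes SQLAlchemy 1.4 or later with SQLite 3.24 or later. The table is a reduced version of digital_spend and upsert_if_newer is a made-up name.

```python
import datetime

from sqlalchemy import Column, Date, Integer, MetaData, Table, create_engine, select
from sqlalchemy.dialects.sqlite import insert

metadata = MetaData()
digital_spend = Table(
    "digital_spend",
    metadata,
    Column("day", Date, primary_key=True),
    Column("report_date", Date, nullable=False),
    Column("impressions", Integer),
)

engine = create_engine("sqlite://")  # in-memory database, for illustration only
metadata.create_all(engine)


def upsert_if_newer(conn, rows):
    """Insert rows; overwrite an existing day only if the new report is newer."""
    stmt = insert(digital_spend).values(rows)
    stmt = stmt.on_conflict_do_update(
        index_elements=[digital_spend.c.day],
        set_={
            "report_date": stmt.excluded.report_date,
            "impressions": stmt.excluded.impressions,
        },
        # Skip the update when the stored report is as new as, or newer than, the incoming one.
        where=digital_spend.c.report_date < stmt.excluded.report_date,
    )
    conn.execute(stmt)


def impressions_for(conn, day):
    """Return the stored impressions value for one day."""
    return conn.execute(
        select(digital_spend.c.impressions).where(digital_spend.c.day == day)
    ).scalar()
```

A row for a given day is replaced only by a later report_date; an older report leaves the stored values untouched.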

Answer 6: (score: 0)

This allows access to the underlying models based on string names

def get_class_by_tablename(tablename):
  """Return class reference mapped to table.
  https://stackoverflow.com/questions/11668355/sqlalchemy-get-model-from-table-name-this-may-imply-appending-some-function-to
  :param tablename: String with name of table.
  :return: Class reference or None.
  """
  for c in Base._decl_class_registry.values():
    if hasattr(c, '__tablename__') and c.__tablename__ == tablename:
      return c


sqla_tbl = get_class_by_tablename(table_name)

def handle_upsert(record_dict, table):
    """
    handles updates when there are primary key conflicts

    """
    try:
        self.active_session().add(table(**record_dict))
    except:
        # Here we'll assume the error is caused by an integrity error
        # We do this because the error classes are passed from the
        # underlying package (pyodbc / sqllite) SQLAlchemy doesn't mask
        # them with it's own code - this should be updated to have
        # explicit error handling for each new db engine

        # <update>add explicit error handling for each db engine</update> 
        active_session.rollback()
        # Query for conflic class, use update method to change values based on dict
        c_tbl_primary_keys = [i.name for i in table.__table__.primary_key] # List of primary key col names
        c_tbl_cols = dict(sqla_tbl.__table__.columns) # String:Col Object crosswalk

        c_query_dict = {k:record_dict[k] for k in c_tbl_primary_keys if k in record_dict} # sub-dict from data of primary key:values
        c_oo_query_dict = {c_tbl_cols[k]:v for (k,v) in c_query_dict.items()} # col-object:query value for primary key cols

        c_target_record = session.query(sqla_tbl).filter(*[k==v for (k,v) in c_oo_query_dict.items()]).first()

        # apply new data values to the existing record
        for k, v in record_dict.items():
            setattr(c_target_record, k, v)
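The lookup helper relies on Base._decl_class_registry, a private attribute that is not available in SQLAlchemy 1.4 and later. A possible replacement, sketched under the assumption of SQLAlchemy 1.4 or later, walks the public Base.registry.mappers collection instead. Here the base class is passed in explicitly and the two models exist only for the demonstration.

```python
from sqlalchemy import Column, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Template(Base):
    __tablename__ = "templates"
    id = Column(Integer, primary_key=True)


class Pair(Base):
    __tablename__ = "pair"
    id = Column(Integer, primary_key=True)


def get_class_by_tablename(base, tablename):
    """Return the mapped class whose __tablename__ matches, or None."""
    for mapper in base.registry.mappers:
        cls = mapper.class_
        if getattr(cls, "__tablename__", None) == tablename:
            return cls
    return None


found = get_class_by_tablename(Base, "templates")
missing = get_class_by_tablename(Base, "no_such_table")
```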

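Answer 7: (score: 0)

There are multiple answers already, and here comes yet another answer (YAA). The other answers are not that readable because of the metaprogramming involved. Here is an example that

  • Uses the SQLAlchemy ORM

  • Shows how to create a row if there are zero rows, using on_conflict_do_nothing

  • Shows how to update the existing row (if any) without creating a new row, using on_conflict_do_update

  • Uses the table primary key as the constraint

A longer example is in the original question that this code relates to.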


import datetime

import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy import text
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.orm import Session

class PairState(Base):

    __tablename__ = "pair_state"

    # This table has 1-to-1 relationship with Pair
    pair_id = sa.Column(sa.ForeignKey("pair.id"), nullable=False, primary_key=True, unique=True)
    pair = orm.relationship(Pair,
                        backref=orm.backref("pair_state",
                                        lazy="dynamic",
                                        cascade="all, delete-orphan",
                                        single_parent=True, ), )


    # First raw event in data stream
    first_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # Last raw event in data stream
    last_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # The last hypertable entry added
    last_interval_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    @staticmethod
    def create_first_event_if_not_exist(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Sets the first event value if not exist yet."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, first_event_at=ts).
            on_conflict_do_nothing()
        )

    @staticmethod
    def update_last_event(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the the column last_event_at for a named pair."""
        # Based on the original example of https://stackoverflow.com/a/49917004/315168
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_event_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_event_at": ts})
        )

    @staticmethod
    def update_last_interval(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the the column last_interval_at for a named pair."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_interval_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_interval_at": ts})
        )
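The class above depends on a PostgreSQL server and on Base and Pair definitions that are not shown. The same two patterns can be exercised locally with the SQLite dialect. This is a sketch under assumptions: SQLAlchemy 1.4 or later, SQLite 3.24 or later, a reduced PairState model with integer timestamps, and index_elements in place of the constraint argument, which the SQLite dialect does not accept.

```python
from sqlalchemy import Column, Integer, create_engine, select
from sqlalchemy.dialects.sqlite import insert
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class PairState(Base):
    __tablename__ = "pair_state"
    pair_id = Column(Integer, primary_key=True)
    first_event_at = Column(Integer)
    last_event_at = Column(Integer)


table = PairState.__table__


def create_first_event_if_not_exist(dbsession, pair_id, ts):
    """Insert the row once; later calls for the same pair change nothing."""
    dbsession.execute(
        insert(table)
        .values(pair_id=pair_id, first_event_at=ts)
        .on_conflict_do_nothing(index_elements=[table.c.pair_id])
    )


def update_last_event(dbsession, pair_id, ts):
    """Insert the row, or overwrite last_event_at if the pair already exists."""
    dbsession.execute(
        insert(table)
        .values(pair_id=pair_id, last_event_at=ts)
        .on_conflict_do_update(
            index_elements=[table.c.pair_id],
            set_={"last_event_at": ts},
        )
    )


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as dbsession:
    create_first_event_if_not_exist(dbsession, 1, 100)
    create_first_event_if_not_exist(dbsession, 1, 200)  # ignored, row exists
    update_last_event(dbsession, 1, 300)
    update_last_event(dbsession, 1, 400)  # overwrites 300
    dbsession.commit()
    rows = [
        tuple(r)
        for r in dbsession.execute(
            select(table.c.pair_id, table.c.first_event_at, table.c.last_event_at)
        )
    ]
```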