Question

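I'm writing a database migration tool built on SQLAlchemy's expression language.

My source database may be either UTF8 or SQL_ASCII. My target database will always be UTF8.

I'm using the psycopg2 driver with SQLAlchemy 0.6.6.

My general migration process looks like this: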

for t in target_tables:
    log.info("Migrating data from %s", t.fullname)
    source = self.source_md.tables[self.source_schema + "." + t.name]
    for row in source.select().execute():
        with sql_logging(logging.INFO):
            conn.execute(t.insert(), row)

If I don't set anything encoding-related on the engine, I get this while iterating over the select() result:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

If I set use_native_unicode=True, encoding='utf-8' on the engine, I get this when I try to insert the new rows:

sqlalchemy.exc.DataError: (DataError) invalid byte sequence for encoding "UTF8": 0xeb6d20
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
 'INSERT INTO project_ghtests_survey000005.employees (first_name, employee_id) VALUES (%(first_name)s, %(employee_id)s)' {'first_name': 'Art\xebm', 'employee_id': '1234'}
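
To illustrate why the server rejects that parameter, here is a minimal sketch that checks the bytes shown in the error against UTF-8. The helper name is_valid_utf8 is hypothetical, and the sketch is written for Python 3 even though the question uses Python 2.7; it only inspects the bytes and does not touch the database.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Bytes copied from the 'first_name' parameter in the failing INSERT.
raw_name = b"Art\xebm"

# 0xeb announces a three-byte UTF-8 sequence, but the next byte ('m')
# is not a continuation byte, so the value is not valid UTF-8.
rejected = not is_valid_utf8(raw_name)
print(rejected)
```

A check like this could be run over a sample of source rows to see which columns hold bytes that a UTF8 target would refuse.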

Update

For quick reference, here is the software stack in use:

source_db encoding: SQL_ASCII
target_db encoding: UTF8
python 2.7
sqlalchemy 0.6.6
psycopg2 2.2.2
PostgreSQL 8.2 server

Answer 1

It turns out the solution was to set the connection's client_encoding to 'latin1'.

I did this with a PoolListener like this:

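from contextlib import closing

from sqlalchemy.interfaces import PoolListener


class EncodingListener(PoolListener):

    def connect(self, dbapi_con, con_record):
        # Ask the server which client_encoding this new connection uses.
        with closing(dbapi_con.cursor()) as cur:
            cur.execute('show client_encoding')
            encoding = cur.fetchone()[0]

        # A connection that already uses UTF8 needs no change.
        if encoding.upper() == 'UTF8':
            return

        # Otherwise have the raw bytes interpreted as latin1.
        dbapi_con.set_client_encoding('latin1')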
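
As a rough picture of what treating the source bytes as latin1 amounts to, the sketch below reinterprets the value from the question as latin1 and re-encodes it as UTF-8. The function latin1_bytes_to_utf8 is a hypothetical helper for illustration (Python 3 syntax), not part of psycopg2 or SQLAlchemy, and the real conversion happens inside the driver and server rather than in code like this.

```python
def latin1_bytes_to_utf8(raw):
    """Reinterpret raw bytes as latin1 text and re-encode them as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


# The value that the UTF8 target rejected in the question.
raw_name = b"Art\xebm"
converted = latin1_bytes_to_utf8(raw_name)
print(converted)

# latin1 assigns a character to every byte value, so decoding never fails.
all_bytes_decoded = bytes(range(256)).decode("latin-1")

# Caveat: bytes that were already UTF-8 are silently double-encoded.
already_utf8 = "Art\u00ebm".encode("utf-8")
double_encoded = latin1_bytes_to_utf8(already_utf8)
print(double_encoded != already_utf8)
```

Because latin1 never raises a decoding error, this approach only gives correct text when the SQL_ASCII data really was written as latin1; rows that were stored as UTF-8 would come out garbled rather than fail.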

Answer 2

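Since UTF-8 is backward compatible with ASCII, why use SQL_ASCII instead of UTF8?

I think your encoding problem is more likely latin1 (or something similar) to UTF8, not ASCII to UTF8.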

How to copy data from a SQL_ASCII to a UTF8 postgresql database with SQLAlchemy?
