How to query a Unicode database using ASCII characters

Time: 2017-03-11 18:54:37

Tags: python postgresql unicode

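I am currently running a query against my PostgreSQL database that ignores the German characters (umlauts). However, I do not want to lose these characters; I would like the query output to contain the German characters, or at least their equivalents (e.g. ä = ae). I am running Python 2.7.12.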

When I set the error handler of encode to replace or xmlcharrefreplace, I get the following error:

psycopg2.ProgrammingError: syntax error at or near "?"
LINE 1: ?SELECT

Code snippet:

# -*- coding: utf-8 -*-

    connection_str = r'postgresql://' + user + ':' + password + '@' + host + '/' + database

    def query_db(conn, sql):
        with conn.cursor() as curs:
            curs.execute(sql)
            rows = curs.fetchall()

        print("fetched %s rows from db" % len(rows))

        return rows

    with psycopg2.connect(connection_str) as conn:
        for filename in files:
            # Read SQL
            sql = u""

            f = codecs.open(os.path.join(SQL_LOC, filename), "r", "utf-8")

            for line in f:
                sql += line.encode('ascii', 'replace').replace('\r\n', ' ')

            rows = query_db(conn, sql)

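How can I pass the query as a unicode object that keeps the German characters? I also tried decoding the query as utf-8, but then I get the following error: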

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

2 Answers:

Answer 0: (score: 0)

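Here is a solution that returns the characters in their encoded form. You will be able to re-encode the result later, and the query does not raise an error: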

SELECT convert_from(BYTEA 'foo ᚠ bar'::bytea, 'latin-1');
+----------------+
| convert_from   |
|----------------|
| foo á<U+009A>  bar                |
+----------------+
SELECT 1
Time: 0.011s
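
What `convert_from(..., 'latin-1')` does in the query above can be reproduced roughly in plain Python: the UTF-8 bytes of the string are reinterpreted as Latin-1, one character per byte. The sketch below is only an illustration (the variable names are not from the answer), and it also shows why the result can be re-encoded later without loss.

```python
# -*- coding: utf-8 -*-
# Illustration only: mimic convert_from(bytea, 'latin-1') in Python.
text = u"foo \u16a0 bar"            # U+16A0 is the rune used in the query above

raw = text.encode("utf-8")          # the UTF-8 bytes of the literal
as_latin1 = raw.decode("latin-1")   # every byte becomes one Latin-1 character

# Latin-1 maps all 256 byte values, so the step can be undone without loss.
restored = as_latin1.encode("latin-1").decode("utf-8")
```

Here `as_latin1` holds the three characters á, U+009A and a no-break space in place of the rune, which matches the output shown above.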

Answer 1: (score: 0)

You only need conn.set_client_encoding("utf-8"); after that you can execute unicode strings, and both the SQL and the results are encoded and decoded on the fly:

$ cat psycopg2-unicode.py
import sys
import os
import psycopg2
import csv

with psycopg2.connect("") as conn:
    conn.set_client_encoding("utf-8")
    for filename in sys.argv[1:]:
        # Read the whole SQL file as text; the with block closes it again
        with open(filename, "r", encoding="utf-8") as file:
            sql = file.read()
        with conn.cursor() as cursor:
            cursor.execute(sql)
            try:
                rows = cursor.fetchall()
            except psycopg2.ProgrammingError as err:
                # No results
                continue
            with open(filename+".out", "w", encoding="utf-8", newline="") as outfile:
                csv.writer(outfile, dialect="excel-tab").writerows(rows)

$ cat sql0.sql
create temporary table t(v) as
    select 'The quick brown fox jumps over the lazy dog.'
    union all
    select 'Zwölf große Boxkämpfer jagen Viktor quer über den Sylter Deich.'
    union all
    select 'Любя, съешь щипцы, — вздохнёт мэр, — кайф жгуч.'
    union all
    select 'Mężny bądź, chroń pułk twój i sześć flag.'
;

$ cat sql1.sql
select * from t;

$ python3 psycopg2-unicode.py sql0.sql sql1.sql

$ cat sql1.sql.out 
The quick brown fox jumps over the lazy dog.
Zwölf große Boxkämpfer jagen Viktor quer über den Sylter Deich.
Любя, съешь щипцы, — вздохнёт мэр, — кайф жгуч.
Mężny bądź, chroń pułk twój i sześć flag.
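
As for the errors in the question: `encode('ascii', 'replace')` turns every non-ASCII character into `?`. That would explain the `?SELECT` syntax error if the SQL file starts with a byte order mark, which is an assumption here because the question does not show the file. A minimal sketch of that effect, and of decoding with the `utf-8-sig` codec so that a leading BOM is dropped:

```python
# -*- coding: utf-8 -*-
import codecs

# Hypothetical file content: a UTF-8 BOM followed by a query with an umlaut.
data = codecs.BOM_UTF8 + u"SELECT 'Zw\u00f6lf'".encode("utf-8")

# Decoding as plain utf-8 keeps the BOM as U+FEFF at the start of the text.
with_bom = data.decode("utf-8")
# What the question's loop does: every non-ASCII character becomes "?".
lossy = with_bom.encode("ascii", "replace")

# utf-8-sig strips a leading BOM and leaves the umlaut intact.
clean = data.decode("utf-8-sig")
```

The same codec name can be passed when opening the file, e.g. `codecs.open(path, "r", "utf-8-sig")`.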

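The Python 2 version of this program is a bit more complicated, because we need to tell the driver that we want values returned as unicode objects. The csv module I use for the output does not support unicode either, so a workaround is needed. Here it is: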

$ cat psycopg2-unicode2.py
from __future__ import print_function

import sys
import os
import csv
import codecs

import psycopg2
import psycopg2.extensions
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)

with psycopg2.connect("") as conn:
    conn.set_client_encoding("utf-8")
    for filename in sys.argv[1:]:
        file = codecs.open(filename, "r", encoding="utf-8")
        sql = file.read()
        with conn.cursor() as cursor:
            cursor.execute(sql)
            try:
                rows = cursor.fetchall()
            except psycopg2.ProgrammingError as err:
                # No results from SQL
                continue
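            with open(filename+".out", "wb") as outfile:
                # Create the writer once instead of once per row
                writer = csv.writer(outfile, dialect="excel-tab")
                for row in rows:
                    # The Python 2 csv module expects byte strings
                    row_utf8 = [v.encode('utf-8') for v in row]
                    writer.writerow(row_utf8)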
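
If ASCII equivalents such as ä = ae are wanted in the output after all, the conversion can be applied to the fetched unicode values rather than to the SQL text. The following is a minimal sketch with a hand-written mapping; `UMLAUT_MAP` and `transliterate` are hypothetical names, not part of psycopg2, and the mapping covers only the German letters.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper: replace German umlauts and sharp s with ASCII digraphs.
UMLAUT_MAP = {
    u"\u00e4": u"ae", u"\u00f6": u"oe", u"\u00fc": u"ue",
    u"\u00c4": u"Ae", u"\u00d6": u"Oe", u"\u00dc": u"Ue",
    u"\u00df": u"ss",
}


def transliterate(value):
    """Return value with each mapped character replaced; others are kept."""
    return u"".join(UMLAUT_MAP.get(ch, ch) for ch in value)


print(transliterate(u"Zw\u00f6lf gro\u00dfe Boxk\u00e4mpfer"))
# prints: Zwoelf grosse Boxkaempfer
```

Characters outside the mapping, such as the Cyrillic and Polish letters in the sample data, pass through unchanged.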