Question

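A simple question, but I can't work out a solution.

I'm trying to retrieve multibyte characters from a UTF-8 encoded Postgres database and then return them, but I'm running into encoding problems.

Here is my DB: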

   Name    |  Owner   | Encoding |   Collate   |    Ctype    |     Access privileges
-----------+----------+----------+-------------+-------------+---------------------------
 articles  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |

The data in the table:

                         docid                         |     unigram
-------------------------------------------------------+-----------------
 en_2014-02-09_5eb67dc1927248d7926cdaf72559b57a7f9c017 | Haluk Bürümekçi

'unigram' contains some multibyte characters. Here is my simplified Python:

import traceback

import psycopg2

def test():
    con = psycopg2.connect(params)
    cur = con.cursor()

    cur.execute("SELECT docid, unigram FROM test")

    row = cur.fetchone()

    try:
        print unicode(row[1])
    except Exception, E:
        traceback.print_exc()

This results in:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)

I have tried many different things, including:

row[1].decode(sys.getdefaultencoding()).encode('utf-8')
row[1].decode('utf-8')
row[1].encode('utf-8')
unicode(row[1])
str(row[1])

All of these, and many more iterations of similar attempts, still result in a UnicodeDecodeError. Does anyone know what I'm doing wrong?

Answer 1

Use unicode(row[1], 'utf-8'). This constructs a unicode string by decoding the byte string in row[1] with the utf-8 codec :)
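To illustrate the decoding step, here is a minimal sketch that uses a hard-coded byte string as a stand-in for row[1] (an assumption; no database connection is involved). The variable names are made up for the example. It is written with .decode() so that it runs on both Python 2 and Python 3; on Python 2, unicode(raw, 'utf-8') should be the equivalent spelling.

```python
# -*- coding: utf-8 -*-
# Stand-in for row[1]: the UTF-8 bytes of the name from the question.
# (Hard-coded assumption; a real run would get this value from the cursor.)
raw = b"Haluk B\xc3\xbcr\xc3\xbcmek\xc3\xa7i"

# Decoding as ASCII fails on the first non-ASCII byte. On Python 2 with the
# default encoding left at ASCII, unicode(row[1]) attempts this implicitly.
try:
    raw.decode("ascii")
    ascii_error_position = None
except UnicodeDecodeError as exc:
    ascii_error_position = exc.start  # index of the offending byte

# Naming the codec explicitly turns the bytes back into text.
text = raw.decode("utf-8")
```

The failing position here lines up with the "position 7" in the traceback from the question, since "Haluk B" is seven ASCII bytes and the first byte of "ü" in UTF-8 is 0xc3.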
