Question

I have been working on ways to reduce text to ASCII, so that ā -> a, ñ -> n, and so on.

unidecode has been fantastic for this.

# -*- coding: utf-8 -*-
from unidecode import unidecode
print(unidecode(u"ā, ī, ū, ś, ñ"))
print(unidecode(u"Estado de São Paulo"))

Produces:

a, i, u, s, n
Estado de Sao Paulo

However, I can't reproduce this result with data read from an input file.

Contents of test.txt:

ā, ī, ū, ś, ñ
Estado de São Paulo

# -*- coding: utf-8 -*-
from unidecode import unidecode
with open("test.txt", 'r') as inf:
    for line in inf:
        print unidecode(line.strip())

Produces:

A, A<<, A<<, A, A+-
Estado de SAPSo Paulo

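and

RuntimeWarning: Argument is not an unicode object. Passing an encoded string will likely have unexpected results.

Question: how can I read these lines in as unicode so that I can pass them to unidecode?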

Answer 1

import codecs
with codecs.open("test.txt", 'r', 'utf-8') as inf:
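
A fuller sketch of the same idea, assuming test.txt is UTF-8 encoded. The helper name read_unicode_lines and the temporary sample file are only there to make the example self-contained; they are not from the original post.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile


def read_unicode_lines(path, encoding="utf-8"):
    """Return the stripped lines of `path`, decoded to unicode text."""
    with codecs.open(path, "r", encoding) as inf:
        return [line.strip() for line in inf]


# Write a small UTF-8 sample file that stands in for test.txt.
sample = u"ā, ī, ū, ś, ñ\nEstado de São Paulo\n"
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(sample.encode("utf-8"))

lines = read_unicode_lines(sample_path)
os.remove(sample_path)

for line in lines:
    # Each line is decoded text (unicode in Python 2, str in Python 3),
    # so it is the kind of object unidecode expects.
    print(repr(line))
```

Each element of lines can then be passed to unidecode(line) in place of the raw byte string that the plain open() call in the question produced.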

Answer 2

import codecs
with codecs.open('test.txt', encoding='whicheveronethefilewasencodedwith') as f:
    ...

The codecs module provides a function for opening files with automatic Unicode encoding/decoding.
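
The encoding passed to codecs.open has to match the one the file was written in. As a rough illustration (not part of the original answer), the sketch below decodes the same UTF-8 bytes with the right codec and with Latin-1. Treating each byte as its own character, as the Latin-1 decode does, is most likely what produced the "SAPSo" in the question's output.

```python
# -*- coding: utf-8 -*-
# The bytes a UTF-8 file would contain for the word "São".
raw = u"São".encode("utf-8")

# Decoding with the matching codec recovers the original three characters.
as_utf8 = raw.decode("utf-8")

# Decoding with the wrong codec maps every byte to its own character,
# so the two-byte sequence for "ã" turns into two unrelated characters.
as_latin1 = raw.decode("latin-1")

print(repr(as_utf8))
print(repr(as_latin1))
print(len(as_utf8))
print(len(as_latin1))
```

The correctly decoded string has three characters and the mis-decoded one has four, which is why the transliteration grows extra letters.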