Python 3 CSV reader doesn't recognize Cyrillic text

Time: 2018-12-20 17:37:16

Tags: python-3.x

I have the following Python code:

import csv
from urllib import request

url_base = "https://translate.google.com"
url_params_list = "/#view=home&op=translate&sl=ru&tl=en&text="

with open('top5000russianlemmasraw.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        url = url_base + url_params_list + request.quote(row[0].encode('cp1251'))

        print(url)

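The file top5000russianlemmasraw.csv is a list of words in Cyrillic script.

The problem with this code is that the Cyrillic text is imported as a string of question marks, e.g. '????', which is then converted into URL encoding as a string like '%3F%3F%3F%3F'. I'm not sure how to get Python to import the Cyrillic characters so that they don't show up as question marks. Any help would be appreciated.

1 answer:

Answer 0: (score: 0)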

By default, open() uses the encoding returned by locale.getpreferredencoding(). You can override this with the encoding keyword argument:

# ...
with open('top5000russianlemmasraw.csv', encoding='cp1251') as csv_file:
    # ...
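
For completeness, here is a minimal sketch of how the override might fit into the whole loop. It assumes the file really is encoded as cp1251 and that the word is in the first column. The names build_urls, URL_BASE and URL_PARAMS are made up for this example, and the throwaway sample file only exists so the snippet can run on its own. Note that it calls urllib.parse.quote on the str directly, which percent-encodes as UTF-8 by default; that this is what the target site expects is an assumption, not something taken from the question.

```python
import csv
import os
import tempfile
from urllib.parse import quote

URL_BASE = "https://translate.google.com"
URL_PARAMS = "/#view=home&op=translate&sl=ru&tl=en&text="


def build_urls(path, encoding="cp1251"):
    """Return one URL per non-empty row, using the first column as the word."""
    urls = []
    # Pass the encoding explicitly instead of relying on the locale default.
    # newline="" is what the csv module documentation recommends for readers.
    with open(path, encoding=encoding, newline="") as csv_file:
        for row in csv.reader(csv_file, delimiter=","):
            if not row:
                continue  # skip blank lines
            # quote() percent-encodes the str as UTF-8 by default (assumed OK here).
            urls.append(URL_BASE + URL_PARAMS + quote(row[0]))
    return urls


# Demo on a temporary cp1251 file standing in for the real CSV.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", encoding="cp1251", newline="") as f:
        f.write("дом,1\r\nгод,2\r\n")
    demo_urls = build_urls(sample_path)

for url in demo_urls:
    print(url)
```

If the real file turns out to be UTF-8 rather than cp1251, only the encoding argument should need to change.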

Alternatively, you can open the file in binary mode, read it as a blob of bytes, and decode that yourself:

with open('top5000russianlemmasraw.csv', 'rb') as csv_file:
    blob = csv_file.read()
    text = blob.decode('cp1251')
    # ...
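
As a rough sketch of where the decoded text could go next: csv.reader accepts any iterable of lines, so the decoded string can be wrapped in io.StringIO and parsed from there. The helper name rows_from_bytes and the sample bytes are invented for this example, and cp1251 is again assumed to be the file's actual encoding.

```python
import csv
import io


def rows_from_bytes(blob, encoding="cp1251"):
    """Decode raw bytes and parse the result as comma-separated rows."""
    # decode() raises UnicodeDecodeError if the bytes are not valid in `encoding`.
    text = blob.decode(encoding)
    # newline="" leaves line endings untouched so csv.reader can handle them itself.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=","))


# Stand-in for the bytes returned by csv_file.read() in binary mode.
sample_blob = "дом,1\r\nгод,2\r\n".encode("cp1251")
rows = rows_from_bytes(sample_blob)
print(rows)
```

Decoding by hand like this is mostly useful when the bytes come from somewhere other than a local file, or when you want to try more than one candidate encoding before parsing.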