I have the following Python code:
import csv
from urllib import request
url_base = "https://translate.google.com"
url_params_list = "/#view=home&op=translate&sl=ru&tl=en&text="
with open('top5000russianlemmasraw.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        url = url_base + url_params_list + request.quote(row[0].encode('cp1251'))
        print(url)
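The file top5000russianlemmasraw.csv is a list of words written in Cyrillic.
The problem with this code is that the Cyrillic text is imported as strings of question marks, e.g. '????', which are then converted into URL-encoded strings such as '%3F%3F%3F%3F'. I am not sure how to get Python to import Cyrillic text so that it does not show up as question marks. Any help would be appreciated.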
Answer 0 (score: 0)
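The built-in open() defaults to the encoding returned by locale.getpreferredencoding() when a file is opened in text mode. You can override this with the encoding keyword argument: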
# ...
with open('top5000russianlemmasraw.csv', encoding='cp1251') as csv_file:
    # ...
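To check what open() falls back to on a given machine, and to confirm that an explicit encoding round-trips Cyrillic text, a minimal sketch follows. It assumes the file really is cp1251-encoded, as the answer suggests. The file name sample.csv and its contents are hypothetical stand-ins for the real CSV.

```python
import locale
import os
import tempfile

# The encoding open() falls back to in text mode when none is given.
# This is platform dependent, so the printed value varies between machines.
print(locale.getpreferredencoding(False))

# Hypothetical sample data standing in for the real CSV file.
sample = "привет,hello\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")

    # Write the sample as cp1251 bytes, the encoding assumed in the answer.
    with open(path, "wb") as f:
        f.write(sample.encode("cp1251"))

    # Passing encoding= explicitly makes the result independent of the locale.
    with open(path, encoding="cp1251", newline="") as f:
        decoded = f.read()

# True when the text survived the round trip unchanged.
print(decoded == sample)
```

If the first line printed is not cp1251, relying on the default would decode the file with the wrong codec, which is why passing the encoding explicitly is the safer choice.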
Alternatively, you can open the file in binary mode and decode the bytes yourself:
with open('top5000russianlemmasraw.csv', 'rb') as csv_file:
    blob = csv_file.read()
    text = blob.decode('cp1251')
    # ...
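Putting the pieces together, one possible end-to-end version of the question's loop is sketched below. It rests on two assumptions: the CSV is cp1251-encoded, and the target site expects UTF-8 percent-encoding, which is what urllib.parse.quote produces for a str by default (the question's code quoted cp1251 bytes instead). The helper name build_urls and the two-row sample file are hypothetical and used only for illustration.

```python
import csv
import os
import tempfile
from urllib.parse import quote

URL_BASE = "https://translate.google.com"
URL_PARAMS = "/#view=home&op=translate&sl=ru&tl=en&text="


def build_urls(path, encoding="cp1251"):
    """Return one URL per CSV row, using the first column as the text."""
    urls = []
    # newline="" is what the csv module documentation recommends for files.
    with open(path, encoding=encoding, newline="") as csv_file:
        for row in csv.reader(csv_file, delimiter=","):
            if not row:
                continue  # skip blank lines
            # quote() percent-encodes a str as UTF-8 by default.
            urls.append(URL_BASE + URL_PARAMS + quote(row[0]))
    return urls


# Hypothetical two-row sample standing in for top5000russianlemmasraw.csv.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    with open(sample_path, "wb") as f:
        f.write("и,1\nв,2\n".encode("cp1251"))
    urls = build_urls(sample_path)

for url in urls:
    print(url)
```

Because the text is decoded correctly on the way in, the URLs contain percent-encoded Cyrillic bytes rather than %3F. If the receiving service did expect cp1251 bytes, quote(row[0], encoding='cp1251') would be the variant to try.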