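How do I open a document in Unicode format?
I have txt files that contain foreign characters. I need to open each file and run the unidecode function on it word by word. I get an error saying: TypeError: 'module' object is not callable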
import os
import re
import unidecode

def splitToWords(stringOfWords):
    # Split on punctuation, whitespace and double quotes.
    retVal = re.split(r'; |;|, |,|\*|\n|\. |\.|-| |"', stringOfWords)
    # Drop empty strings and lowercase the remaining words.
    retVal = [val.lower() for val in retVal if val]
    return retVal
....
with open(file, "r") as f:
    file_content = f.read()
    file_content = splitToWords(file_content)
    for word in file_content:
        word = unidecode.unidecode(word)
f.close()  # redundant: the with block has already closed the file
Answer 0: (score: 1)
Hi, please check the code below. Is this what you want?
unicodestring = "u there"
# Despite its name, this variable holds the UTF-16 encoded bytes.
utf8tostring = unicodestring.encode("utf-16")
print(utf8tostring)
The code comes from the following site: https://www.safaribooksonline.com/library/view/python-cookbook-2nd/0596007973/ch01s22.html
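To make the encode step a bit more concrete, here is a minimal sketch of a round trip between text and bytes. The sample string and the choice of UTF-8 are my own assumptions for illustration; they are not from the question or the cookbook recipe.

```python
# Sample text with two non-ASCII characters (hypothetical example data).
text = u"na\u00efve caf\u00e9"

# encode() turns text into bytes in the chosen encoding...
encoded = text.encode("utf-8")

# ...and decode() with the same encoding turns the bytes back into text.
decoded = encoded.decode("utf-8")

print(decoded == text)
```

Decoding with a different encoding than the one used to encode usually gives garbage or raises UnicodeDecodeError, so the two have to match.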
Answer 1: (score: 0)
You can try something like this:
# you have to import the unidecode function first
from unidecode import unidecode

with open(file) as f:
    for line in f:
        # this splits a line into words and transliterates each one.
        # you don't have to close() the file, "with open()" does that for you.
        decoded_words = [unidecode(word) for word in line.split()]
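Some background on the error: "TypeError: 'module' object is not callable" is what Python raises when the module object itself is called. After a plain `import unidecode`, the name `unidecode` refers to the module, so presumably somewhere in the real code it was called directly; either import the function as above or call `unidecode.unidecode(...)`. Here is a rough sketch that also opens the file with an explicit encoding and collects the words from all lines. The name `transliterate_words`, the UTF-8 default and the stdlib fallback are my assumptions, not something from the question. The fallback only strips accents and is not equivalent to the real unidecode package.

```python
import io
import unicodedata

try:
    # Third-party package: the function has the same name as the module.
    from unidecode import unidecode
except ImportError:
    # Assumption: crude stand-in so the sketch runs without the package.
    # It drops accents and any character with no ASCII decomposition.
    def unidecode(text):
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")


def transliterate_words(path, encoding="utf-8"):
    """Read a text file and return its words transliterated to ASCII."""
    words = []
    # io.open decodes the bytes to text using the given encoding.
    with io.open(path, encoding=encoding) as f:
        for line in f:
            words.extend(unidecode(word) for word in line.split())
    return words
```

Usage would be something like `words = transliterate_words("input.txt")`, where "input.txt" is a placeholder path. If the file is not UTF-8, pass the encoding it was actually saved in.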