Python unidecode function: opening a list/document

Date: 2015-09-08 09:20:28

Tags: python text unicode

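How do I open a document as Unicode? I have .txt files that contain foreign characters, and I need to go through them word by word, passing each word to the unidecode function. I get an error saying: TypeError: 'module' object is not callable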
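That error means a module object is being called as if it were a function. After `import unidecode`, the name `unidecode` refers to the module, so `unidecode(word)` most likely produces this TypeError, while `unidecode.unidecode(word)` or `from unidecode import unidecode` works. Below is a minimal sketch of the same mistake. It assumes the standard-library `copy` module as a stand-in, because that module also defines a function with its own name, and using it lets the sketch run without unidecode installed. `describe_call` is a hypothetical helper.

```python
import copy  # stand-in for unidecode: a stdlib module containing a same-named function


def describe_call(target, arg):
    """Call target(arg); return the result, or the TypeError message if the call fails."""
    try:
        return "ok: %r" % (target(arg),)
    except TypeError as exc:
        return "TypeError: %s" % exc


# Wrong: after `import copy`, the name `copy` is the module object itself.
wrong = describe_call(copy, [1, 2])

# Fix 1: call the function that lives inside the module (like unidecode.unidecode).
right_qualified = describe_call(copy.copy, [1, 2])

# Fix 2: import the function itself (like `from unidecode import unidecode`).
from copy import copy
right_direct = describe_call(copy, [1, 2])

print(wrong)
print(right_qualified)
print(right_direct)
```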

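import os
import re
import unidecode

def splitToWords(stringOfWords):
    retVal = re.split(r'; |;|, |,|\*|\n|\. |\.|-| |"', stringOfWords)
    # drop the empty strings left behind by consecutive delimiters
    retVal = [val for val in retVal if val != '']
    # assign the comprehension's result, otherwise the lowercased words are lost
    retVal = [val.lower() for val in retVal]
    return retVal
....
    with open(file, "r") as f:
        file_content = f.read()
        file_content = splitToWords(file_content)
        for word in file_content:
            # note: this only rebinds the loop variable; the result is not stored
            word = unidecode.unidecode(word)
        f.close()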
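The splitting step of splitToWords can also be written more compactly. The following is a minimal sketch that assumes the same delimiter set. It filters out empty strings and lowercases in a single comprehension whose result is returned, so the lowercased words are kept. `split_to_words` and `DELIMITERS` are hypothetical names.

```python
import re

# Same delimiters as splitToWords: "; ", ";", ", ", ",", "*", newline, ". ", ".", "-", space, '"'
DELIMITERS = re.compile(r'; |;|, |,|\*|\n|\. |\.|-| |"')


def split_to_words(text):
    """Split text on the delimiters above, drop empty pieces and lowercase each word."""
    return [word.lower() for word in DELIMITERS.split(text) if word]


print(split_to_words('Hello, World. "Foo-Bar";baz\n'))
# ['hello', 'world', 'foo', 'bar', 'baz']
```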

2 answers:

Answer 0 (score: 1)

Hi, please check the code below. Is this what you want?

unicodestring = "u there"
utf8tostring = unicodestring.encode("utf-8")  # text -> UTF-8 encoded bytes
print(utf8tostring)

The code is from the following site: https://www.safaribooksonline.com/library/view/python-cookbook-2nd/0596007973/ch01s22.html
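In Python 3 terms, `encode` turns text (`str`) into `bytes` and `decode` reverses it. Neither step transliterates foreign characters to ASCII, which is what unidecode does. The following is a small round-trip sketch that assumes Python 3 and the sample word "café":

```python
text = u"caf\u00e9"  # "café": 4 characters, the last one non-ASCII

utf8_bytes = text.encode("utf-8")    # é becomes two bytes, 0xC3 0xA9
utf16_bytes = text.encode("utf-16")  # 2-byte BOM plus 2 bytes per character here

print(len(text), len(utf8_bytes), len(utf16_bytes))  # 4 5 10
print(utf8_bytes.decode("utf-8") == text)            # True
```

Reading a file with an explicit encoding performs the same `decode` step automatically for every line.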

Answer 1 (score: 0)

You can try something like this:

import io

# you have to import the unidecode function itself, not just the module
from unidecode import unidecode

with io.open(file, encoding="utf-8") as f:  # read the file as UTF-8 text
    for line in f:
        # this splits a line into words and transliterates each word to ASCII.
        # you don't have to close() the file; "with" does that for you.
        decoded_words = [unidecode(word) for word in line.split()]
        print(decoded_words)
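The following is a self-contained sketch of the whole task. It reads a UTF-8 file word by word and transliterates each word. It assumes the file is UTF-8 encoded and that the third-party Unidecode package provides `from unidecode import unidecode`. If that package is missing, the sketch falls back to a swapped-in technique: `unicodedata` NFKD decomposition that only strips accents. That fallback is an assumption of this sketch and not part of unidecode. `ascii_words` is a hypothetical helper name.

```python
import io
import os
import tempfile
import unicodedata

try:
    from unidecode import unidecode  # third-party package: pip install Unidecode
except ImportError:
    # Assumed fallback for this sketch only: strip accents via NFKD decomposition.
    # Much weaker than unidecode, e.g. it drops "ß" instead of spelling it "ss".
    def unidecode(text):
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")


def ascii_words(path, encoding="utf-8"):
    """Yield every whitespace-separated word of a text file, transliterated to ASCII."""
    with io.open(path, encoding=encoding) as f:  # bytes are decoded to text while reading
        for line in f:
            for word in line.split():
                yield unidecode(word)


# Usage: write a small UTF-8 sample file, then read it back word by word.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(u"Caf\u00e9 na\u00efve\n\u00c5ngstr\u00f6m\n".encode("utf-8"))

words = list(ascii_words(tmp.name))
os.remove(tmp.name)
print(words)  # ['Cafe', 'naive', 'Angstrom']
```

A generator keeps memory use flat for large files, since only one line is held at a time.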