Question

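My goal is to read the lines of a file and replace all special characters, such as French accented characters (à, é, ç, ...), with their plain equivalents (a, e, c, ...).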

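I am using Python 3. The example in the gensim documentation works on a simple literal (for example, deaccent("àéç")), but it does not work on the lines I read from a file. Currently my code only gives me "àéç" instead of "aec".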

from gensim.utils import deaccent

def getTextFromFile(filename):
    with open(filename) as file:
        text = [line.rstrip() for line in file.readlines()]
    file.close()
    for line in text:
        print(deaccent(line))
    return text

My file contains: àéç

I want to get: aec

Answer 1

As far as I can tell, it works fine:

Python 3.7.0 (default, Aug 22 2018, 20:50:05) 
Type "copyright", "credits" or "license" for more information.
IPython 4.1.2 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
In [1]: from gensim.utils import deaccent
In [2]: deaccent('àéç')
Out[2]: 'aec'
In [3]: astr = 'àéç'
In [4]: dstr = deaccent(astr)
In [5]: print(dstr)
aec

If you want getTextFromFile() to return the text without accents, do not return the original text; return the results of the deaccent() calls instead.
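A minimal sketch of that change, under a few assumptions that are not in the question: the file is UTF-8 encoded (so the encoding is passed to open() explicitly), the function is renamed get_text_from_file for illustration, and a standard-library stand-in for deaccent is used only when gensim is not installed. The stand-in strips combining marks and is not guaranteed to match gensim in every case.

```python
import os
import tempfile
import unicodedata

try:
    from gensim.utils import deaccent
except ImportError:
    # Assumption: stdlib stand-in used only when gensim is unavailable.
    # It decomposes characters and drops the combining accent marks.
    def deaccent(text):
        decomposed = unicodedata.normalize("NFD", text)
        stripped = "".join(
            ch for ch in decomposed if unicodedata.category(ch) != "Mn"
        )
        return unicodedata.normalize("NFC", stripped)


def get_text_from_file(filename, encoding="utf-8"):
    """Return the lines of `filename` with accents removed."""
    # The `with` block closes the file, so no explicit close() is needed.
    with open(filename, encoding=encoding) as file:
        # Build the result from the deaccent() output, not the raw lines.
        return [deaccent(line.rstrip()) for line in file]


# Usage: write a small UTF-8 file, then read it back without accents.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("àéç\n")
result = get_text_from_file(tmp.name)
os.remove(tmp.name)
print(result)
```

If the lines still come back accented or garbled, the file may be in a different encoding than the one passed to open(); that is a guess about the cause, since the question does not say how the file was saved.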

How do I correctly use the deaccent method in gensim?
