Question

The code can be downloaded here: https://github.com/kelrien/pyretrieval/

Whenever I run example.py, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "example.py", line 21, in <module>
    docs.append(proc.process(line.decode("utf-8")))
  File "pyretrieval\processor.py", line 61, in process
    tokens = self.tokenize(string)
  File "pyretrieval\processor.py", line 47, in tokenize
    temp = temp.replace(char, self.replace_characters[char])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 0: ordinal not in range(128)

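As you can see, the error occurs when trying to replace the German umlauts I specified. If I don't use the replace_characters dict and just ignore those umlauts, I don't get the error.

I have already tried a lot of things: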

Using the codecs module
In different

Answer 1

I found the solution. I had to write the characters I want to replace as unicode strings (in processor.py).
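For anyone hitting the same traceback, here is a minimal sketch of the idea, not the actual code from processor.py. The mapping REPLACE_CHARACTERS and the helper replace_chars are hypothetical names made up for illustration. As far as I understand it, Python 2 implicitly decodes a byte-string key with the ASCII codec when it is passed to replace on a unicode string, which is where the 0xfc error comes from; with unicode keys no implicit decoding happens.

```python
# Keys and values are unicode strings (u"..."), so replace() never has to
# decode a byte string implicitly. This mapping is an assumption for
# illustration, not the actual dict from processor.py.
REPLACE_CHARACTERS = {
    u"\u00e4": u"ae",  # a-umlaut
    u"\u00f6": u"oe",  # o-umlaut
    u"\u00fc": u"ue",  # u-umlaut
    u"\u00df": u"ss",  # sharp s
}


def replace_chars(text, mapping=REPLACE_CHARACTERS):
    """Replace every key of `mapping` found in the unicode string `text`."""
    for char, replacement in mapping.items():
        text = text.replace(char, replacement)
    return text


# What Python 2 attempts behind the scenes when a byte-string key such as
# "\xfc" is mixed with a unicode string: decode it with the ASCII codec.
try:
    b"\xfc".decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = str(exc)

# A UTF-8 encoded line, decoded first as example.py does before processing.
line = b"f\xc3\xbcr Gr\xc3\xb6\xc3\x9fe"
result = replace_chars(line.decode("utf-8"))
print(result)
```

The sketch uses only escapes and u"..." literals, so it should not depend on the encoding of the source file itself.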

I have pushed the necessary changes to GitHub: https://github.com/kelrien/pyretrieval

Encoding error when replacing unwanted characters in a utf-8 encoded file
