Question

I am trying to read in "CDPQ17CEO.txt", which is encoded in UTF-8 (see this image: Notepad++ Encoding).

Here is the `_read_in` function (in the `Letter` class):

```
class Letter(object):

    def __init__(self, file_path, company_name, author_name=None, author_type=None):
        self.letter = self._read_in(file_path)
        self.company = company_name
        self.author = author_name
        self.type = author_type

    def _read_in(self, file_path):
        # Read the file as UTF-8, skipping undecodable bytes; the with-block
        # closes the file once all lines have been stripped.
        with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
            f_stripped = [line.strip() for line in f]
        return ' '.join(f_stripped)
```

Here is the function call:

```
# Raw string so the backslashes in the Windows path are kept literally.
full_file = r'Q:\My Documents\OTPP\letters\CDPQ17CEO.txt'
letter_dict[name] = px.Letter(full_file, name, author_type=author_type)
```

Here is the error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1936: character maps to <undefined>

Why doesn't `errors='ignore'` do its job?

If I open the text document, convert it to ANSI, re-save it, and re-run, then it does work, but I would rather avoid doing this for every document I need to read.

Thanks!

Answer 1

Problem and solution:

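The px module was not actually being imported, although it appeared to be.
The problem was solved by adding the module's path to the Python path (`sys.path`):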
```
import sys
sys.path.append('foo')
```
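
The error message itself is consistent with this diagnosis. A `'charmap'` codec error comes from a single-byte code page such as cp1252 (a common Windows default), not from the UTF-8 decoder, so the `open(..., encoding='utf-8', errors='ignore')` call shown in the question was probably not the code that ran. Below is a minimal sketch that reproduces the difference and shows one way to check which file a module was loaded from. The sample text is made up for illustration, and `json` stands in for `px`, which is not available here.

```python
import json  # stand-in for px; any imported module exposes __file__

# Hypothetical sample: a right double quotation mark (U+201D) encoded as
# UTF-8 is the bytes E2 80 9D, so it contains the 0x9d byte from the error.
raw = 'He said \u201dhi\u201d'.encode('utf-8')

# Decoding as UTF-8 succeeds; errors='ignore' would drop any invalid bytes.
utf8_text = raw.decode('utf-8', errors='ignore')

# Decoding as cp1252 (a charmap codec) fails on 0x9d, which is undefined
# in that code page. This is the kind of error quoted in the question.
try:
    raw.decode('cp1252')
    charmap_error = None
except UnicodeDecodeError as exc:
    charmap_error = str(exc)

print(ascii(utf8_text))
print(charmap_error)

# To see which file Python actually imported, inspect the module's __file__.
print(json.__file__)
```

If `px.__file__` points at an unexpected location, or at an older copy of the module that opens the file without `encoding='utf-8'`, that would explain why `errors='ignore'` appeared to have no effect.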

Text file encoded in UTF-8, Python gives UnicodeDecodeError, errors='ignore' not working
