UnicodeDecodeError when using a specific Python library (gender-detector)

Asked: 2017-12-22 14:06:04

Tags: python python-unicode

I need to guess gender from first names for some analysis, and after some research I found this Python library on GitHub: malev/gender-detector

I followed the instructions, with a few adjustments (for example, the README says to use import gender_detector as gd, but I had to use from gender_detector import gender_detector as gd instead).

Then I ran into this: the library ships with 4 datasets, 'us', 'uk', 'ar' and 'uy', but it only works when I use 'us' or 'uk'.

See the example below:

from gender_detector import gender_detector as gd
detector = gd.GenderDetector('us')
detector2 = gd.GenderDetector('ar')

detector.guess('Marcos')
Out[25]: 'male'

detector2.guess('Marcos')
Traceback (most recent call last):

File "", line 1, in 
detector2.guess('Marcos')

File "/home/cpneto/anaconda3/lib/python3.6/site-packages/gender_detector/gender_detector.py", line 25, in guess
initial_position = self.index(name[0])

File "/home/cpneto/anaconda3/lib/python3.6/site-packages/gender_detector/index.py", line 19, in __call__
self._generate_index()

File "/home/cpneto/anaconda3/lib/python3.6/site-packages/gender_detector/index.py", line 25, in _generate_index
total = file.readline() # Omit headers line

File "/home/cpneto/anaconda3/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 1078: invalid continuation byte

I suspect this is a Python 2 vs. Python 3 compatibility issue, but I'm not sure, and I have no idea how to fix it.

Any suggestions?

1 Answer:

Answer 0: (score: 0)

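The library assumes that its 'ar' data file is UTF-8 encoded, but it is not (hence the "can't decode byte 0xf1 in position 1078" error). You need to either convert that file to UTF-8 or find a way to pass the file's actual encoding to the library.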
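
As a rough illustration of the first option, here is a minimal sketch that re-encodes a file to UTF-8 in place. It assumes the data file is Latin-1 (ISO-8859-1), which is only a guess: byte 0xf1 is "ñ" in that encoding, which would be plausible for a list of Spanish names, but the real encoding should be checked. The function name convert_to_utf8 and the path in the usage comment are hypothetical and not part of the library.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="latin-1"):
    """Re-encode a text file in place from source_encoding to UTF-8.

    Returns True if the file was rewritten, False if it was already
    valid UTF-8 and therefore left untouched.
    """
    file_path = Path(path)
    raw = file_path.read_bytes()

    try:
        raw.decode("utf-8")
        return False  # already decodable as UTF-8, nothing to do
    except UnicodeDecodeError:
        pass

    # Assumption: the bytes are in source_encoding (Latin-1 by default).
    text = raw.decode(source_encoding)
    file_path.write_bytes(text.encode("utf-8"))
    return True


# Hypothetical usage; locate the real 'ar' data file inside your
# site-packages/gender_detector directory and back it up first:
# convert_to_utf8("path/to/the/ar/data/file.csv")
```

After the conversion, Python 3 should be able to read the file with its default UTF-8 decoding. Python 2 read such files as raw bytes without decoding them, which may be why the library appeared to work there.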