How can I convert a large binary file into a pickle dictionary in Python?

Asked: 2018-06-19 21:15:27

Tags: python-3.x pickle fasttext

I am trying to convert a large binary file containing Arabic words with 300-dimensional vectors into a pickle dictionary.

What I have written so far is:

import pickle
ArabicDict = {}
with open('cc.ar.300.bin', encoding='utf-8') as lex:
    for token in lex:
         for line in lex.readlines():
             data = line.split()
             ArabicDict[data[0]] = float(data[1])

pickle.dump(ArabicDict,open("ArabicDictionary.p","wb"))

The error I get is:

Traceback (most recent call last):
  File "E:\Dataset", line 4, in <module>
    for token in lex:
  File "E:\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte
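The traceback indicates that `cc.ar.300.bin` is a binary fastText model, not UTF-8 text, so it cannot be iterated line by line. A separate issue in the posted code is that it stores only `data[1]`, a single float, rather than the full 300-dimensional vector. The following sketch shows the word-to-vector pickling the question seems to intend, assuming the embeddings were available in fastText's plain-text `.vec` format instead; the two sample words and 3-dimensional vectors here are made up for illustration:

```python
import io
import pickle

# Hypothetical sample in fastText's text (.vec) format: the first line
# is "<vocab_size> <dimensions>", then one word per line followed by
# its vector components, all space-separated.
sample = io.StringIO(
    "2 3\n"
    "kitab 0.1 0.2 0.3\n"
    "qalam 0.4 0.5 0.6\n"
)

vectors = {}
sample.readline()  # skip the "count dim" header line
for line in sample:
    parts = line.rstrip().split(" ")
    # Map the word to its full vector, not just the first component.
    vectors[parts[0]] = [float(x) for x in parts[1:]]

with open("ArabicDictionary.p", "wb") as out:
    pickle.dump(vectors, out)

# Round-trip check: load the pickle back and inspect one entry.
with open("ArabicDictionary.p", "rb") as inp:
    restored = pickle.load(inp)
```

For the actual `.bin` file, the binary model format would have to be read with a library that understands it, such as the official `fasttext` package (`fasttext.load_model("cc.ar.300.bin")`) or gensim's `load_facebook_vectors`; no amount of encoding changes will make line-by-line text iteration work on it.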

0 Answers:

No answers yet.