I am doing text analysis with TextBlob's NaiveBayesClassifier on topics I chose.
The dataset is large (about 3000 entries).
Although I can get a result, I have no way to save it for future use without calling the function again and waiting several hours until processing finishes.
I tried pickling it as follows:
from textblob.classifiers import NaiveBayesClassifier  # classifier comes from TextBlob

ab = NaiveBayesClassifier(data)
import pickle
object = ab
file = open('f.obj','w') #tried to use 'a' in place of 'w' ie. append
pickle.dump(object,file)
I got an error, as shown below:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\pickle.py", line 1370, in dump
Pickler(file, protocol).dump(obj)
File "C:\Python27\lib\pickle.py", line 224, in dump
self.save(obj)
File "C:\Python27\lib\pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "C:\Python27\lib\pickle.py", line 419, in save_reduce
save(state)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
File "C:\Python27\lib\pickle.py", line 663, in _batch_setitems
save(v)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 600, in save_list
self._batch_appends(iter(obj))
File "C:\Python27\lib\pickle.py", line 615, in _batch_appends
save(x)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 562, in save_tuple
save(element)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
File "C:\Python27\lib\pickle.py", line 662, in _batch_setitems
save(k)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 501, in save_unicode
self.memoize(obj)
File "C:\Python27\lib\pickle.py", line 247, in memoize
self.memo[id(obj)] = memo_len, obj
MemoryError
I also tried sPickle, but that resulted in errors as well, for example:
#saving object with function sPickle.s_dump
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\sPickle.py", line 22, in s_dump
for elt in iterable_to_pickle:
TypeError: 'NaiveBayesClassifier' object is not iterable
#saving object with function sPickle.s_dump_elt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\sPickle.py", line 28, in s_dump_elt
pickled_elt_str = dumps(elt_to_pickle)
MemoryError: out of memory
Can anyone tell me what I need to do to save the object?
Or, failing that, how can I save the classifier's results for future use?
Answer 0 (score: 5)
First, use a 64-bit build of Python (available for every version from 2.6 to 3.4).
A 64-bit build lifts the roughly 2 GB address-space limit of a 32-bit process, which is what tends to trigger this kind of MemoryError.
Use cPickle:
import cPickle as pickle
Second, open your file:
file = open('file_name.pickle','wb') #same as what Robert said in the above post
Write the object to the file:
pickle.dump(object,file)
Your object will be dumped to a file. But you have to check how much memory your object uses: pickling itself consumes additional memory, so keep at least 25% of your RAM free for the object being pickled.
For me, my laptop has 8 GB of RAM, so there was enough memory for just one object.
(My classifier is heavy: 3000 string instances, each a sentence of about 15-30 words, with 22 sentiments/topics.)
So if your laptop locks up (or, more generally, stops responding), you may have to power it off, start again, and try with a smaller number of instances or fewer sentiments/topics.
cPickle is very useful here because it is much faster than any other pickling module, and I recommend using it.
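Putting these steps together, a minimal Python 2 sketch might look like the following (the file name is arbitrary and `data` is assumed to hold your (text, label) training pairs):

import cPickle as pickle
from textblob.classifiers import NaiveBayesClassifier

ab = NaiveBayesClassifier(data)  # train once; 'data' is assumed to be your (text, label) list

# dump the trained classifier in binary mode
with open('classifier.pickle', 'wb') as f:
    pickle.dump(ab, f, protocol=pickle.HIGHEST_PROTOCOL)

# later: load it back instead of retraining for hours
with open('classifier.pickle', 'rb') as f:
    ab = pickle.load(f)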
Answer 1 (score: 2)
You need to use "wb" for binary format:
file = open('f.obj','wb')
Answer 2 (score: 0)
For Python > 3.0, cPickle no longer exists as a separate module, but the default pickle does the job; just make sure to use a protocol suited to your Python installation. For Python > 3.4, use the following code:
import pickle
with open(r"blobClassifier.pickle",'wb') as file:
    pickle.dump(cl_Title, file, protocol=pickle.HIGHEST_PROTOCOL, fix_imports=False)
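To reuse the saved classifier later, a matching load step (assuming the same file name, with `cl_Title` being the trained classifier) would be:

import pickle

with open(r"blobClassifier.pickle", 'rb') as file:
    cl_Title = pickle.load(file)  # restores the classifier without retraining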