I have 32 GB of RAM, and I use Jupyter and pandas. My DataFrame is not that large, but when I try to write it to an Arctic database I get a MemoryError:
df_q.shape
(157293660, 10)
def memory(df):
    mem = df.memory_usage(index=True).sum() / (1024 ** 3)
    print(mem)

memory(df_q)
12.8912200034
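(Aside, not from the original post: if some of the ten columns tolerate narrower dtypes, downcasting before the write can roughly halve the frame's footprint. A minimal sketch with made-up column names (`price`, `size`) standing in for the real ones; check what precision and range your data actually needs first.)

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for df_q (the real column names are unknown).
df = pd.DataFrame({'price': np.ones(1000, dtype='float64'),
                   'size': np.arange(1000, dtype='int64')})

before = df.memory_usage(index=True).sum()

# float64 -> float32 and int64 -> int32 halve each column's footprint,
# at the cost of precision/range -- only safe if the values fit.
df['price'] = df['price'].astype('float32')
df['size'] = df['size'].astype('int32')

after = df.memory_usage(index=True).sum()
```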
I want to write it like this:
from arctic import Arctic
import arctic as arc
store = Arctic('.....')
lib = store['myLib']
lib.write('quotes', df_q)
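(Another aside not in the original question: a common workaround is to write the frame in row slices under several symbols, so no single write has to serialize all ~13 GB at once. A sketch only: `lib` is assumed to be the VersionStore from above, and `iter_chunks`/`write_in_chunks` and the `quotes_0`, `quotes_1`, ... naming scheme are made up for illustration.)

```python
import numpy as np
import pandas as pd

def iter_chunks(df, n_chunks):
    # Yield successive row slices of df, n_chunks of them (the last may be shorter).
    step = int(np.ceil(len(df) / float(n_chunks)))
    for start in range(0, len(df), step):
        yield df.iloc[start:start + step]

def write_in_chunks(lib, symbol, df, n_chunks=16):
    # Store each slice under its own symbol, e.g. quotes_0, quotes_1, ...
    for i, part in enumerate(iter_chunks(df, n_chunks)):
        lib.write('%s_%d' % (symbol, i), part)

# The slicing itself can be checked without a running MongoDB:
df = pd.DataFrame({'a': range(10), 'b': range(10)})
parts = list(iter_chunks(df, 4))
```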
MemoryError                               Traceback (most recent call last)
in ()
      1 memory(df_q)
----> 2 lib.write('quotes', df_q)

/usr/local/lib/python2.7/dist-packages/arctic/decorators.pyc in f_retry(*args, **kwargs)
     48     while True:
     49         try:
---> 50             return f(*args, **kwargs)
     51         except (DuplicateKeyError, ServerSelectionTimeoutError) as e:
     52             # Re-raise errors that won't go away.

/usr/local/lib/python2.7/dist-packages/arctic/store/version_store.pyc in write(self, symbol, data, metadata, prune_previous_version, **kwargs)
    561 
    562         handler = self._write_handler(version, symbol, data, **kwargs)
--> 563         mongo_retry(handler.write)(self._arctic_lib, version, symbol, data, previous_version, **kwargs)
    564 
    565         # Insert the new version into the version DB

/usr/local/lib/python2.7/dist-packages/arctic/decorators.pyc in f_retry(*args, **kwargs)
     48     while True:
     49         try:
---> 50             return f(*args, **kwargs)
     51         except (DuplicateKeyError, ServerSelectionTimeoutError) as e:
     52             # Re-raise errors that won't go away.

/usr/local/lib/python2.7/dist-packages/arctic/store/_pandas_ndarray_store.pyc in write(self, arctic_lib, version, symbol, item, previous_version)
    301     def write(self, arctic_lib, version, symbol, item, previous_version):
    302         item, md = self.to_records(item)
--> 303         super(PandasDataFrameStore, self).write(arctic_lib, version, symbol, item, previous_version, dtype=md)
    304 
    305     def append(self, arctic_lib, version, symbol, item, previous_version):

/usr/local/lib/python2.7/dist-packages/arctic/store/_ndarray_store.pyc in write(self, arctic_lib, version, symbol, item, previous_version, dtype)
    385         version['type'] = self.TYPE
    386         version['up_to'] = len(item)
--> 387         version['sha'] = self.checksum(item)
    388 
    389         if previous_version:

/usr/local/lib/python2.7/dist-packages/arctic/store/_ndarray_store.pyc in checksum(self, item)
    370     def checksum(self, item):
    371         sha = hashlib.sha1()
--> 372         sha.update(item.tostring())
    373         return Binary(sha.digest())
    374 

MemoryError:
WTF? And if I use df_q.to_csv(), I have to wait for ages...
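(Editor's note, not part of the original post: the frame itself fits in RAM, but the failing line `sha.update(item.tostring())` serializes the entire record array into one additional bytes object, so the write transiently needs roughly twice the frame's ~13 GB. A tiny demonstration of that copy; `tobytes()` is the modern spelling of the `tostring()` seen in the traceback.)

```python
import numpy as np

arr = np.zeros(10, dtype='float64')  # 10 * 8 = 80 bytes of data
raw = arr.tobytes()  # full byte-for-byte copy of the array's buffer

# The copy is as large as the array itself -- for a ~13 GB frame,
# that is another ~13 GB allocated just to compute the SHA-1 checksum.
```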
Answer 0 (score: 0)
Your issue actually is not a memory issue. If you read your errors, it seems that your library is having trouble accessing your data...

1st error: your server has timed out (ServerSelectionTimeoutError).

2nd error: trying to update the MongoDB version.

3rd error: retries accessing your server, fails again (ServerSelectionTimeoutError).

etc. So essentially your problem lies in the Arctic package itself (note that the last error is a checksum error). You can also deduce this from the fact that df_q.to_csv() works; however, it is very slow since it is not optimized the way Arctic is. I would suggest trying to reinstall the Arctic package.