Why can't I write a DataFrame to the DB?

Asked: 2016-07-11 10:48:07

Tags: database python-2.7 pandas

I have 32 GB of RAM and I use Jupyter and pandas. My DataFrame is not very large, but when I try to write it to an Arctic database I get a "MemoryError":

df_q.shape
(157293660, 10)
def memory(df):
    mem = df.memory_usage(index=True).sum() / (1024 ** 3)
    print(mem)
memory(df_q)
12.8912200034

I try to write it:

from arctic import Arctic
import arctic as arc
store = Arctic('.....')
lib = store['myLib']
lib.write('quotes', df_q)
  

MemoryError                               Traceback (most recent call last)
in ()
      1 memory(df_q)
----> 2 lib.write('quotes', df_q)

/usr/local/lib/python2.7/dist-packages/arctic/decorators.pyc in f_retry(*args, **kwargs)
     48         while True:
     49             try:
---> 50                 return f(*args, **kwargs)
     51             except (DuplicateKeyError, ServerSelectionTimeoutError) as e:
     52                 # Re-raise errors that won't go away.

/usr/local/lib/python2.7/dist-packages/arctic/store/version_store.pyc in write(self, symbol, data, metadata, prune_previous_version, **kwargs)
    561
    562         handler = self._write_handler(version, symbol, data, **kwargs)
--> 563         mongo_retry(handler.write)(self._arctic_lib, version, symbol, data, previous_version, **kwargs)
    564
    565         # Insert the new version into the version DB

/usr/local/lib/python2.7/dist-packages/arctic/decorators.pyc in f_retry(*args, **kwargs)
     48         while True:
     49             try:
---> 50                 return f(*args, **kwargs)
     51             except (DuplicateKeyError, ServerSelectionTimeoutError) as e:
     52                 # Re-raise errors that won't go away.

/usr/local/lib/python2.7/dist-packages/arctic/store/_pandas_ndarray_store.pyc in write(self, arctic_lib, version, symbol, item, previous_version)
    301     def write(self, arctic_lib, version, symbol, item, previous_version):
    302         item, md = self.to_records(item)
--> 303         super(PandasDataFrameStore, self).write(arctic_lib, version, symbol, item, previous_version, dtype=md)
    304
    305     def append(self, arctic_lib, version, symbol, item, previous_version):

/usr/local/lib/python2.7/dist-packages/arctic/store/_ndarray_store.pyc in write(self, arctic_lib, version, symbol, item, previous_version, dtype)
    385         version['type'] = self.TYPE
    386         version['up_to'] = len(item)
--> 387         version['sha'] = self.checksum(item)
    388
    389         if previous_version:

/usr/local/lib/python2.7/dist-packages/arctic/store/_ndarray_store.pyc in checksum(self, item)
    370     def checksum(self, item):
    371         sha = hashlib.sha1()
--> 372         sha.update(item.tostring())
    373         return Binary(sha.digest())
    374

MemoryError:

WTF? If I use df_q.to_csv() instead, I will be waiting for years...

1 Answer:

Answer 0: (score: 0)

Your issue actually is not a memory issue. If you read your errors, it seems that your library is having trouble accessing your data...

1st Error: Says your server has timed out. (ServerSelectionTimeoutError)

2nd Error: Trying to update MongoDB version.

3rd Error: Retries accessing your server, fails.(ServerSelectionTimeoutError)

etc. So essentially your problem lies in the Arctic package itself (note that the last error is a checksum error). You can also deduce this from the fact that df_q.to_csv() works; however, it is very slow since it is not optimized like Arctic. I would suggest trying to reinstall the Arctic package.
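For context on the checksum step: the last frame of the traceback calls sha.update(item.tostring()), and in NumPy tostring() (named tobytes() in current releases) returns a full copy of the array's data as one byte string. The sketch below is not Arctic's code and is not a patch for it. It only illustrates, on a tiny stand-in array, how the same SHA-1 digest can be computed one slice at a time so that only a slice-sized copy exists at any moment. The names chunked_sha1, rows_per_chunk and demo are hypothetical, and the code assumes Python 3 with a recent NumPy.

```python
import hashlib

import numpy as np


def chunked_sha1(records, rows_per_chunk=1000000):
    """Return the SHA-1 hex digest of a 1-D array's bytes, hashed slice by slice.

    Hypothetical helper for illustration only; it is not part of Arctic.
    """
    sha = hashlib.sha1()
    for start in range(0, len(records), rows_per_chunk):
        # tobytes() copies only this slice, never the whole array at once.
        sha.update(records[start:start + rows_per_chunk].tobytes())
    return sha.hexdigest()


# Tiny stand-in for the record array built from the DataFrame (assumed layout).
demo = np.zeros(10, dtype=[("bid", "f8"), ("ask", "f8")])
demo["bid"] = np.arange(10)

# The chunked digest matches the digest of the single full copy.
full_digest = hashlib.sha1(demo.tobytes()).hexdigest()
assert chunked_sha1(demo, rows_per_chunk=3) == full_digest
```

Because SHA-1 is computed incrementally, feeding the slices in order gives the same digest as hashing the whole buffer; the difference is only in peak memory use.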