I have a large DataFrame with roughly 392 million rows and 9 columns, and I want to filter it to extract a subset.
My original dataset is dh_activity_recos:
dh_activity_approved = dh_activity_recos.loc[dh_activity_recos.approved_flag == 1]
When I apply this filter, I get the following memory error:
Traceback (most recent call last):
  File "/mnt01/eh-datasci/ravinder/working/final_recos_processing.py", line 144, in <module>
    dh_activity_approved = dh_activity_recos.loc[dh_activity_recos.approved_flag == 1]
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1227, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1344, in _getitem_axis
    return self._getbool_axis(key, axis=axis)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1239, in _getbool_axis
    raise self._exception(detail)
KeyError: MemoryError()
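I can't work out the exact cause. I checked with dir(), and apart from this large dataset nothing else is holding a significant amount of memory. I'm also running this on a cloud machine with 128 GB of RAM, so I'm not sure why this error occurs.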
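For scale, here is a rough back-of-envelope estimate of the memory the filter could need. It is only a sketch: it assumes all 9 columns are 8-byte numeric dtypes (int64/float64), which may not match the real data, since object (string) columns can take far more. The helper name estimate_bytes is made up for illustration.

```python
# Back-of-envelope memory estimate for the boolean filter.
# ASSUMPTION: every column holds 8-byte numeric values (int64/float64);
# object/string columns would be considerably larger.

ROWS = 392 * 10**6   # about 392 million rows (from the question)
COLS = 9             # 9 columns (from the question)


def estimate_bytes(rows, cols, bytes_per_value=8):
    """Hypothetical helper: raw size of a rows x cols block of fixed-width values."""
    return rows * cols * bytes_per_value


# Size of the original frame under the 8-bytes-per-value assumption.
frame_bytes = estimate_bytes(ROWS, COLS)

# The comparison `approved_flag == 1` builds a boolean mask, 1 byte per row.
mask_bytes = ROWS * 1

# Worst case: every row matches, so the filtered result is a full second copy.
peak_bytes = frame_bytes + mask_bytes + frame_bytes

print("frame: %.1f GB" % (frame_bytes / 1e9))   # about 28.2 GB
print("mask:  %.1f GB" % (mask_bytes / 1e9))    # about 0.4 GB
print("peak:  %.1f GB" % (peak_bytes / 1e9))    # about 56.8 GB
```

Under that assumption the worst case stays well below 128 GB, which is part of why the error surprises me. The real footprint may be quite different if some columns are strings; it can be measured with dh_activity_recos.memory_usage(deep=True).sum(), assuming the installed pandas version supports the deep argument.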