MemoryError: masking a not-so-large DataFrame throws an error

Date: 2018-03-19 17:16:11

Tags: python pandas dataframe memory

I get a MemoryError when trying to select values with a mask from a 4M-row table with 3 columns.

When I run df.memory_usage().sum() it returns 173526080 bytes, which is only about 0.17 GB, and I have 32GB of RAM. So it doesn't seem like it should run out of RAM, since no earlier code in the script uses much memory either.
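As a sanity check on that figure, the bytes-to-gigabytes conversion can be done directly (the total below is the number reported in the question):

```python
total_bytes = 173526080  # value returned by df.memory_usage().sum()

# Convert to GiB (1 GiB = 1024**3 bytes); well under 1 GB either way.
print(total_bytes / 1024**3)  # ≈ 0.16 GiB
```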

This approach worked in a previous version of the code with the same 4M rows.

The code I'm running is:

x = df[exit_point] > 0
print(df[x].shape)

The error I get is:

  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 2133, in __getitem__
    return self._getitem_array(key)
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 2175, in _getitem_array
    return self._take(indexer, axis=0, convert=False)
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\generic.py", line 2143, in _take
    self._consolidate_inplace()
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\generic.py", line 3677, in _consolidate_inplace
    self._protect_consolidate(f)
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\generic.py", line 3666, in _protect_consolidate
    result = f()
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\generic.py", line 3675, in f
    self._data = self._data.consolidate()
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\internals.py", line 3826, in consolidate
    bm._consolidate_inplace()
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\internals.py", line 3831, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\internals.py", line 4853, in _consolidate
    _can_consolidate=_can_consolidate)
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\internals.py", line 4876, in _merge_blocks
    new_values = new_values[argsort]
MemoryError

I'm lost as to how to even start debugging this. Any clues or hints would be greatly appreciated.

1 Answer:

Answer 0 (score: 1)

Maybe this will help:

[1] Use the low_memory=False parameter when importing the file. For example:

df = pd.read_csv('filepath', low_memory=False)

[2] Use the dtype parameter when importing the file, so columns are stored in narrower types than the float64/int64 defaults.
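A minimal sketch of what that looks like; the column names and types here are hypothetical, since the question doesn't show the actual schema (the CSV is inlined via StringIO just to make the example self-contained):

```python
import io
import pandas as pd

# Stand-in for the real file; in practice this would be the file path.
csv_data = io.StringIO("price,quantity,exit_point\n"
                       "1.5,2,0.3\n"
                       "2.5,4,-0.1\n")

# Explicit narrow dtypes halve memory versus the 64-bit defaults.
df = pd.read_csv(csv_data, dtype={'price': 'float32',
                                  'quantity': 'int32',
                                  'exit_point': 'float32'})
print(df.dtypes)
```

On a 4M-row, 3-column frame, moving from 64-bit to 32-bit types cuts the data roughly in half, which also shrinks the temporary copies pandas makes during consolidation (the step where the traceback above fails).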

[3] If you're using a Jupyter Notebook: Kernel > Restart & Clear Output.

Hope this helps!