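I get a MemoryError when I try to select values with a boolean mask from a table with 4M rows and 3 columns.
When I run df.memory_usage().sum(), it returns 173526080 bytes, which is about 0.17 GB (roughly 1.39 gigabits), and I have 32 GB of RAM. So it does not seem like it should run out of RAM, since none of the earlier code uses much memory.
This approach worked in a previous version of the code with the same 4M rows.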
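As a quick sanity check on the size, the byte count returned by df.memory_usage().sum() can be converted by hand. This is only a sketch of the arithmetic; the variable names are illustrative and the number is the one quoted in the question.

```python
# Convert the byte count reported by df.memory_usage().sum() into larger units.
reported_bytes = 173526080  # value quoted in the question

decimal_gb = reported_bytes / 10**9    # gigabytes (GB, powers of 10)
binary_gib = reported_bytes / 2**30    # gibibytes (GiB, powers of 2)
gigabits = reported_bytes * 8 / 10**9  # gigabits, for comparison

print(f"{decimal_gb:.3f} GB, {binary_gib:.3f} GiB, {gigabits:.3f} Gbit")
```

Note that memory_usage() counts only the pointers for object (string) columns unless deep=True is passed, so the real footprint may be larger if any of the columns hold strings.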
The code I run is:
x = df[exit_point] > 0
print(df[x].shape)
The error I get is:
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 2133, in __getitem__
    return self._getitem_array(key)
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 2175, in _getitem_array
    return self._take(indexer, axis=0, convert=False)
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\generic.py", line 2143, in _take
    self._consolidate_inplace()
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\generic.py", line 3677, in _consolidate_inplace
    self._protect_consolidate(f)
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\generic.py", line 3666, in _protect_consolidate
    result = f()
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\generic.py", line 3675, in f
    self._data = self._data.consolidate()
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\internals.py", line 3826, in consolidate
    bm._consolidate_inplace()
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\internals.py", line 3831, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\internals.py", line 4853, in _consolidate
    _can_consolidate=_can_consolidate)
  File "C:\Users\joaoa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\internals.py", line 4876, in _merge_blocks
    new_values = new_values[argsort]
MemoryError
I am lost on how to start debugging this. Any clues and hints would be much appreciated.
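One possible starting point, offered as a guess rather than a diagnosis: the paths in the traceback contain Python36-32, which is the default install directory name for 32-bit CPython 3.6 on Windows. A 32-bit process has at most 4 GB of address space (often about 2 GB in practice on Windows) no matter how much RAM is installed, so a MemoryError can occur far below 32 GB. The sketch below uses only the standard library to report which build is running; the variable names are illustrative.

```python
import struct
import sys

# Size of a C pointer in bits: 32 on a 32-bit interpreter, 64 on a 64-bit one.
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize exceeds 2**32 only on a 64-bit build.
is_64bit = sys.maxsize > 2**32

print(f"Python {sys.version.split()[0]}: {pointer_bits}-bit build (64-bit: {is_64bit})")
```

If this reports a 32-bit build, running the same code under a 64-bit interpreter would be one thing to try.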
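Answer 0 (score: 1)
Maybe this will help:
[1] Pass low_memory=False when importing the file, so that pandas infers each column's type from the whole file rather than chunk by chunk. For example: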
df = pd.read_csv('filepath', low_memory=False)
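[2] Pass the dtype parameter when importing the file, so that each column is read with an explicit type.
[3] If you are using Jupyter Notebook: Kernel > Restart & Clear Output.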
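To illustrate suggestion [2], here is a minimal sketch of how explicit dtypes can shrink a DataFrame. The CSV content and the column names entry_point, exit_point and value are made up for the example, and the 32-bit types are assumed to be wide enough for the data; the real file may need different choices.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file; the column names are made up.
csv_text = "entry_point,exit_point,value\n1,0,0.5\n2,3,1.5\n3,0,2.5\n"

# Default parsing: integer columns become int64, float columns float64.
df_default = pd.read_csv(io.StringIO(csv_text))

# Explicit, narrower dtypes (assumed to be wide enough for the data).
dtypes = {"entry_point": "int32", "exit_point": "int32", "value": "float32"}
df_small = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)

# Compare the column storage only, ignoring the index.
default_bytes = df_default.memory_usage(index=False).sum()
small_bytes = df_small.memory_usage(index=False).sum()
print(default_bytes, small_bytes)
```

Halving the width of every numeric column roughly halves the column storage, which also reduces the size of the temporary copies pandas makes during operations such as boolean indexing.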
Hope this helps!