Pandas raises a MemoryError

Date: 2018-05-25 12:27:09

Tags: python python-3.x pandas

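I have code that processes roughly 10K+ CSV files, each with close to 16K+ rows and many columns. When I run it, I get the error below after about 5 minutes. I understand that setting low_memory=False would suppress the warning, but how do I actually solve the problem? The error seems to be caused by the following line. Can it be fixed?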

df.groupby(['A', 'B'])['C'] 

DtypeWarning: Columns (9,11,12,13,14) have mixed types. Specify dtype option on import or set low_memory=False.

  File "\Python36-32\lib\site-packages\pandas\io\parsers.py", line 705, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "\Python36-32\lib\site-packages\pandas\io\parsers.py", line 451, in _read
    data = parser.read(nrows)
  File "\Python36-32\lib\site-packages\pandas\io\parsers.py", line 1083, in read
    df = DataFrame(col_dict, columns=columns, index=index)
  File "\Python36-32\lib\site-packages\pandas\core\frame.py", line 330, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "\Python36-32\lib\site-packages\pandas\core\frame.py", line 461, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "\Python36-32\lib\site-packages\pandas\core\frame.py", line 6140, in _arrays_to_mgr
    return create_block_manager_from_arrays(arrays, arr_names, axes)
  File "\Python36-32\lib\site-packages\pandas\core\internals.py", line 4632, in create_block_manager_from_arrays
    blocks = form_blocks(arrays, names, axes)
  File "\Python36-32\lib\site-packages\pandas\core\internals.py", line 4704, in form_blocks
    int_blocks = _multi_blockify(int_items)
  File "\Python36-32\lib\site-packages\pandas\core\internals.py", line 4773, in _multi_blockify
    values, placement = _stack_arrays(list(tup_block), dtype)
  File "\Python36-32\lib\site-packages\pandas\core\internals.py", line 4816, in _stack_arrays
    stacked = np.empty(shape, dtype=dtype)
MemoryError

1 answer:

Answer 0 (score: 0)

Check whether your total data size is on the same order of magnitude as your total RAM.
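
One rough way to run that check is to add up the on-disk size of the CSV files and compare it with the machine's RAM. The sketch below uses only the standard library. `total_csv_bytes` and its `folder` argument are names made up for illustration, and the on-disk size is only a rough proxy for how much memory pandas will need once the data is parsed.

```python
import glob
import os


def total_csv_bytes(folder):
    """Return the combined on-disk size, in bytes, of the .csv files in `folder`."""
    pattern = os.path.join(folder, "*.csv")
    return sum(os.path.getsize(name) for name in glob.glob(pattern))
```

For example, `total_csv_bytes(path) / 1024 ** 3` gives the total in GiB, which you can set against the installed RAM.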

I ran into a similar error when working with a ~10 GB dataset (on a machine with 16 GB of RAM) while also running several Chrome tabs.

If that is the case, try deleting each DataFrame once you have finished processing it, before reading the next CSV file:

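import glob
import pandas as pd

# `path` (the CSV folder) and `process` (your per-file logic) are placeholders.
all_files = glob.glob(path + "/*.csv")
for file in all_files:
    df = pd.read_csv(file)
    process(df)
    del df  # drop the reference so the memory can be freed before the next read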
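
If only a few columns feed the aggregation, reading just those columns with explicit dtypes should lower the peak memory per file. It should also avoid the DtypeWarning, whose own message suggests specifying dtype on import. This is a minimal sketch, assuming the groupby from the question is followed by a sum and that columns A and B hold strings while C holds floats. None of that is stated in the question, and `aggregate_folder` is a hypothetical helper name.

```python
import glob
import os

import pandas as pd


def aggregate_folder(folder):
    """Sum column C per (A, B) pair across every CSV file in `folder`."""
    partials = []
    for csv_path in glob.glob(os.path.join(folder, "*.csv")):
        # Assumed schema: A and B are strings, C is a float.
        df = pd.read_csv(
            csv_path,
            usecols=["A", "B", "C"],
            dtype={"A": str, "B": str, "C": float},
        )
        # Keep only the small per-file result, not the full DataFrame.
        partials.append(df.groupby(["A", "B"])["C"].sum())
        del df
    if not partials:
        return pd.Series(dtype=float)
    # The same (A, B) pair can appear in several files, so sum once more.
    return pd.concat(partials).groupby(level=["A", "B"]).sum()
```

Only the per-file group sums stay in memory between iterations, so the peak is driven by the largest single file, not by the whole collection.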

Note: as a rule of thumb, you need about 10 times as much memory as the data you are working with in order to work comfortably in pandas.
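
That factor is a general guideline, so it may help to measure how much one of your own files expands once it is loaded. Below is a small sketch using `DataFrame.memory_usage(deep=True)`. `memory_to_disk_ratio` is an illustrative name, and the ratio covers only the loaded DataFrame, not the temporary copies that later operations may create.

```python
import os

import pandas as pd


def memory_to_disk_ratio(csv_path):
    """Return the parsed DataFrame's in-memory bytes divided by the file's bytes on disk."""
    df = pd.read_csv(csv_path)
    # deep=True also counts the Python string objects held in object columns.
    in_memory = df.memory_usage(deep=True).sum()
    return in_memory / os.path.getsize(csv_path)
```

Running this on a few representative files gives a rough per-file multiplier for your own data.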