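I have code that processes nearly 10K+ CSV files, each with close to 16K+ rows and many columns. About 5 minutes after I start the code, I get the error below. I understand that setting low_memory=False will suppress the warning, but how do I fix the actual problem? The error seems to originate from the following line. Can it be fixed?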
df.groupby(['A', 'B'])['C']
DtypeWarning: Columns (9,11,12,13,14) have mixed types. Specify dtype option on import or set low_memory=False.
File "\Python36-32\lib\site-packages\pandas\io\parsers.py", line 705, in parser_f
return _read(filepath_or_buffer, kwds)
File "\Python36-32\lib\site-packages\pandas\io\parsers.py", line 451, in _read
data = parser.read(nrows)
File "\Python36-32\lib\site-packages\pandas\io\parsers.py", line 1083, in read
df = DataFrame(col_dict, columns=columns, index=index)
File "\Python36-32\lib\site-packages\pandas\core\frame.py", line 330, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "\Python36-32\lib\site-packages\pandas\core\frame.py", line 461, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "\Python36-32\lib\site-packages\pandas\core\frame.py", line 6140, in _arrays_to_mgr
return create_block_manager_from_arrays(arrays, arr_names, axes)
File "\Python36-32\lib\site-packages\pandas\core\internals.py", line 4632, in create_block_manager_from_arrays
blocks = form_blocks(arrays, names, axes)
File "\Python36-32\lib\site-packages\pandas\core\internals.py", line 4704, in form_blocks
int_blocks = _multi_blockify(int_items)
File "\Python36-32\lib\site-packages\pandas\core\internals.py", line 4773, in _multi_blockify
values, placement = _stack_arrays(list(tup_block), dtype)
File "\Python36-32\lib\site-packages\pandas\core\internals.py", line 4816, in _stack_arrays
stacked = np.empty(shape, dtype=dtype)
MemoryError
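Answer 0 (score: 0)
Check whether your total data size is of the same order of magnitude as your total RAM.
I ran into a similar error when working with a ~10GB dataset (on a machine with 16GB of RAM) while also running several tabs in Chrome.
If memory is the bottleneck, try deleting each DataFrame once you have finished processing it, before reading the next CSV file: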
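One way to run that check is to compare the size of the files on disk with the footprint of a single loaded DataFrame. The sketch below is only an illustration: the helper names `total_csv_bytes` and `frame_bytes` are hypothetical, and it assumes all CSV files sit directly in one folder.

```python
import glob
import os

import pandas as pd


def total_csv_bytes(folder):
    """Sum the on-disk size, in bytes, of every .csv file directly inside `folder`."""
    return sum(os.path.getsize(p) for p in glob.glob(os.path.join(folder, "*.csv")))


def frame_bytes(df):
    """In-memory size of a DataFrame in bytes.

    deep=True makes pandas count the Python strings stored in object
    columns, which a shallow estimate would leave out.
    """
    return int(df.memory_usage(deep=True).sum())
```

Loading one representative file and comparing `frame_bytes(df)` with that file's size on disk gives a rough idea of how much larger the data becomes once it is in memory.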
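import glob
import pandas as pd

# `path` is the folder holding the CSV files; `process` is your own function.
all_files = glob.glob(path + "/*.csv")
for file in all_files:
    df = pd.read_csv(file)
    process(df)
    del df  # release this DataFrame before the next file is read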
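Note: as a rule of thumb, you need roughly 10 times as much memory as the size of the data you are working with in order to work comfortably in pandas.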
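The DtypeWarning in the question also suggests specifying dtype on import. A minimal sketch that combines that with the one-file-at-a-time loop is shown here. It rests on several assumptions that are not in the question: the function name `summarize_folder` is hypothetical, the grouping columns A and B are treated as text, column C is numeric, and the aggregation wanted after the groupby is a sum.

```python
import glob
import os

import pandas as pd


def summarize_folder(folder, group_cols=("A", "B"), value_col="C"):
    """Read the CSV files in `folder` one at a time and return per-group sums.

    Only the small aggregated result of each file is kept, so a full
    DataFrame for more than one file is never held at the same time.
    """
    group_cols = list(group_cols)
    partials = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        df = pd.read_csv(
            path,
            usecols=group_cols + [value_col],    # load only the columns that are used
            dtype={c: str for c in group_cols},  # assumption: the group keys are text
        )
        # Assumption: a sum is the aggregation wanted after the groupby.
        partials.append(df.groupby(group_cols)[value_col].sum())
    if not partials:
        return pd.Series(dtype="float64")
    # The same key can appear in several files, so add the partial sums together.
    return pd.concat(partials).groupby(level=group_cols).sum()
```

Restricting `usecols` and fixing `dtype` up front means pandas does not have to load unused columns or guess types chunk by chunk. Whether that is enough to avoid the MemoryError depends on the actual data.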