Error when loading a .csv file with pandas in Python

Time: 2017-02-08 03:48:19

Tags: python csv pandas

I have a large CSV file, about 6 GB, and it takes a long time to load into Python. I get the following error:

import pandas as pd
df = pd.read_csv('nyc311.csv', low_memory=False)


Python(1284,0x7fffa37773c0) malloc: *** mach_vm_map(size=18446744071562067968) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 401, in _read
    data = parser.read()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 851, in pandas.parser.TextReader.read (pandas/parser.c:10438)
  File "pandas/parser.pyx", line 939, in pandas.parser.TextReader._read_rows (pandas/parser.c:11607)
  File "pandas/parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas/parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: out of memory

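I don't think I understand the error. The last line seems to suggest that the file is too large to load? I also tried the low_memory=False option, but that did not work either.

I'm not sure what "can't allocate region" means. Could it be that the header contains 'region' and pandas cannot find the column below it?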

1 answer:

Answer 0 (score: 0)

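This is an out-of-memory problem caused by insufficient RAM. There is no other explanation.

Sum of the memory overhead of all objects held in RAM !< RAM (that is, the total is not smaller than the available RAM).

malloc: *** mach_vm_map(size=18446744071562067968) failed. You can see this clearly from that part of the error message.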
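To put the size in that malloc message in perspective, here is a small arithmetic sketch. It only reinterprets the number printed in the error. The reading that a negative 32-bit value was turned into an unsigned 64-bit size is an assumption, not something the traceback confirms.

```python
# Size copied from the malloc error message in the question
requested = 18446744071562067968

# Expressed in exbibytes (1 EiB = 2**60 bytes); this is just under 16 EiB,
# far beyond what any machine can allocate
requested_eib = requested / 2**60
print("requested about %.2f EiB" % requested_eib)

# Reinterpret the unsigned 64-bit value as a signed 64-bit integer
as_signed = requested - 2**64 if requested >= 2**63 else requested
print(as_signed)             # -2147483648
print(as_signed == -2**31)   # True
```

The value is exactly 2**64 - 2**31, so the failed request was never a realistic size. Whatever produced it, the practical conclusion is the same: the parser could not get the memory it asked for while reading the whole file at once.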

Try using:

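# chunksize makes read_csv return an iterator of DataFrames; keep lineterminator='\r' only if the file really uses bare CR line endings
chunks = pd.read_csv('nyc311.csv', chunksize=5000, lineterminator='\r')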
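When chunksize is passed, read_csv does not return a single DataFrame but an iterator that yields one DataFrame per chunk, so the result has to be consumed in a loop. A minimal sketch of that pattern follows. The temporary sample file and the "Borough" column are hypothetical stand-ins for nyc311.csv, and the per-chunk aggregation is only one possible way to keep memory use bounded.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for nyc311.csv: a tiny file with an assumed "Borough" column
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.csv")
with open(sample_path, "w") as f:
    f.write("Id,Borough\n")
    for i in range(10):
        f.write("%d,%s\n" % (i, "BRONX" if i % 2 else "QUEENS"))

total_rows = 0
counts = {}

# Each iteration holds only one chunk (here 4 rows) in memory
for chunk in pd.read_csv(sample_path, chunksize=4):
    total_rows += len(chunk)
    # Fold the per-chunk counts into a running total
    for borough, n in chunk["Borough"].value_counts().items():
        counts[borough] = counts.get(borough, 0) + int(n)

print(total_rows)
print(counts)
```

Only the running totals survive between iterations, so peak memory depends on the chunk size rather than on the size of the file.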

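Alternatively, if reading this CSV is only one part of your program and other DataFrames were created earlier, try clearing them when they are no longer in use.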

import gc
del old_df          # drop the reference to a DataFrame that is no longer needed
gc.collect()        # force a garbage-collection pass
del gc.garbage[:]   # empty the list of uncollectable objects the collector kept
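Before deleting anything, it can help to check how much memory a DataFrame actually occupies. The sketch below is illustrative only: old_df here is a small hypothetical frame, not one from the question.

```python
import gc

import pandas as pd

# Hypothetical DataFrame standing in for one that is no longer needed
old_df = pd.DataFrame({"id": list(range(1000)), "label": ["x"] * 1000})

# deep=True also counts the Python string objects stored in object columns
footprint_bytes = int(old_df.memory_usage(deep=True).sum())
print("old_df uses about %d bytes" % footprint_bytes)

del old_df                   # drop the only reference to the frame
unreachable = gc.collect()   # returns the number of unreachable objects found
```

Note that del only removes the name. The memory is released once no other variable, list, or closure still refers to the same DataFrame.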
