When I load a large CSV file with pandas, I get the following MemoryError:
Traceback (most recent call last):
  File "/home/k/workspace/loans/src/loans.py", line 100, in <module>
    X_test = testdata('test_v2.csv')
  File "/home/k/workspace/loans/src/loans.py", line 18, in testdata
    X = pd.read_table(filename, sep=',', warn_bad_lines=True, error_bad_lines=True)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 420, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 225, in _read
    return parser.read()
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 626, in read
    ret = self._engine.read(nrows)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1070, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 727, in pandas.parser.TextReader.read (pandas/parser.c:6866)
  File "parser.pyx", line 777, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7452)
  File "parser.pyx", line 1788, in pandas.parser._concatenate_chunks (pandas/parser.c:20462)
MemoryError
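The file is 1 GB. R opens it without much trouble (which is odd, because if I understand correctly, R is a higher-level language than Python...).
I'm running the code on an Intel(R) Core(TM) i3 CPU 550 @ 3.20GHz with 4 GB of RAM, under 32-bit Ubuntu Linux 12.04.
Are there any tricks to make this work?
Thanks!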
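
For reference, one commonly suggested workaround for this kind of MemoryError is to read the file in chunks rather than in a single call. The sketch below is only an illustration, not a confirmed fix for this case: it assumes a recent pandas on Python 3 (not the Python 2.7 install in the traceback), and `read_csv_in_chunks` is a hypothetical helper name, not a pandas function. It uses the `chunksize` parameter of `pd.read_csv` and downcasts float64 columns to float32 as each chunk arrives. Note that a 32-bit process can typically address only about 3 GB or less regardless of installed RAM, and the final `pd.concat` still needs the whole result in memory, so aggregating each chunk and discarding it may be necessary instead.

```python
import io

import pandas as pd


def read_csv_in_chunks(source, chunksize=100000, **read_kwargs):
    """Hypothetical helper: read a CSV piece by piece, shrinking each piece.

    `source` is a file path or file-like object; extra keyword arguments
    are passed through to pd.read_csv unchanged.
    """
    chunks = []
    # With chunksize set, pd.read_csv returns an iterator of DataFrames
    # instead of parsing the whole file at once.
    for chunk in pd.read_csv(source, chunksize=chunksize, **read_kwargs):
        # Downcast float64 columns to float32 to roughly halve their size.
        # This loses precision, which may or may not be acceptable.
        float_cols = chunk.select_dtypes(include="float64").columns
        chunk = chunk.astype({col: "float32" for col in float_cols})
        chunks.append(chunk)
    # The combined frame must still fit in memory at this point.
    return pd.concat(chunks, ignore_index=True)


# Tiny in-memory demo; for the real data, pass the path 'test_v2.csv' instead.
demo = io.StringIO("a,b\n1,1.5\n2,2.5\n3,3.5\n")
df = read_csv_in_chunks(demo, chunksize=2)
print(df.shape)
```

If the concatenated frame is still too large, the same loop can compute per-chunk results (sums, counts, filtered rows) and keep only those, so the full file is never held in memory at once.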