My machine has 64 GB of RAM. I am trying to extract data from a 200 GB TSV file with pandas. I use pandas.read_csv with chunksize to read it in chunks of 50,000 rows, but pandas complains that it is out of memory. The largest chunk size that works is 44,000 rows. Yet when I run it and watch memory usage, the process only uses about 10% of my RAM. Why does pandas raise an out-of-memory error when only 10% of memory is in use?
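For reference, this is roughly what my loop looks like (the file name and the per-chunk processing are placeholders; the real script is the extract_msceleb2.py shown in the traceback below):

import pandas as pd

# Placeholder path and the chunk size that currently works for me;
# 50,000 rows per chunk fails with the error below, 44,000 works.
file_name = 'msceleb.tsv'
num_imgs_per_thread = 44000

# read_csv with chunksize returns an iterator that yields one
# DataFrame per chunk instead of loading the whole file at once.
for chunk in pd.read_csv(file_name, sep='\t',
                         chunksize=num_imgs_per_thread, header=None):
    print(chunk.shape)  # per-chunk processing goes here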
Error:
Traceback (most recent call last):
File "extract_msceleb2.py", line 71, in <module>
for chunk in pd.read_csv(file_name, sep = '\t', chunksize = num_imgs_per_thread, header=None):
File "/usr/local/lib/python2.7/dist-packages/pandas/io/common.py", line 113, in <lambda>
BaseIterator.next = lambda self: self.__next__()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 916, in __next__
return self.get_chunk()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 972, in get_chunk
return self.read(nrows=size)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 939, in read
ret = self._engine.read(nrows)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1508, in read
data = self._reader.read(nrows)
File "pandas/parser.pyx", line 848, in pandas.parser.TextReader.read (pandas/parser.c:10415)
File "pandas/parser.pyx", line 882, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:10896)
File "pandas/parser.pyx", line 924, in pandas.parser.TextReader._read_rows (pandas/parser.c:11437)
File "pandas/parser.pyx", line 911, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:11308)
File "pandas/parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas/parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: out of memory
Segmentation fault (core dumped)
Does pandas impose a limit on the size of each chunk?