I am trying to run a Python script on the Amazon EC2 Free Tier. The input file is 4 GB, and when I try to read it into a DataFrame with pandas read_csv, I get the error below.
I tried using both the chunksize and low_memory options to work around this, but each still fails with a similar error:
import pandas as pd

chunks = []
# read the CSV in 1000-row chunks and collect them
for chunk in pd.read_csv('train.csv', chunksize=1000, low_memory=False):
    chunks.append(chunk)
# stitch the chunks back into a single DataFrame
train = pd.concat(chunks, axis=0)
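As far as I understand, concatenating all the chunks back together still has to materialize the full 4 GB DataFrame in memory, so chunking alone may not help in my case. For illustration, here is a minimal sketch that processes each chunk as it is read instead of accumulating them (the running row count is only a placeholder for whatever per-chunk work actually applies):

import pandas as pd

total_rows = 0
# process each chunk immediately; only one chunk is held in memory at a time
for chunk in pd.read_csv('train.csv', chunksize=1000):
    total_rows += len(chunk)
print(total_rows)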
The error:
Traceback (most recent call last):
File "imports.py", line 54, in <module>
for chunk in pd.read_csv('../data/train.csv', chunksize=1000, low_memory=False):
File "/home/ec2-user/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1007, in __next__
return self.get_chunk()
File "/home/ec2-user/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1070, in get_chunk
return self.read(nrows=size)
File "/home/ec2-user/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "/home/ec2-user/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 879, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: out of memory
Is there anything I can change in the script, or on the EC2 side, to resolve this?