I have 2 GB CSV files to load into Python and then concatenate.
After the concat a MemoryError is thrown. Can anyone help me solve this? I have to work with this DataFrame frequently.
Answer 0 (score: 0)
Sorry, I can't reply in the comment section yet, but to read a CSV with pandas:
import pandas as pd
csv_data = pd.read_csv("csv_name.csv")
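Since the MemoryError comes from loading everything at once, a minimal sketch of reading in chunks with the `chunksize` parameter of `pd.read_csv` and aggregating as you go (the in-memory CSV and the `value` column are placeholders standing in for your real file):

import io
import pandas as pd

# Stand-in for a large file on disk; in practice you would pass a
# path such as "csv_name.csv" to pd.read_csv.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))
reader = pd.read_csv(io.StringIO(csv_text), chunksize=4)

# Aggregate each chunk as it arrives, so only the partial results
# stay in memory rather than the whole file.
total = sum(chunk["value"].sum() for chunk in reader)
print(total)  # 45

This avoids ever holding the full concatenated DataFrame in memory, provided your downstream work can be expressed per chunk.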
Answer 1 (score: 0)
import csv

reader = csv.reader(open('csv_name.csv'))

def gen_chunks(reader, chunksize=100):
    """
    Chunk generator. Take a CSV reader (or any iterable)
    and yield chunksize-sized slices.
    """
    chunk = []
    for i, line in enumerate(reader):
        if i % chunksize == 0 and i > 0:
            yield chunk
            # Start a fresh list rather than clearing in place, so
            # previously yielded chunks are not mutated.
            chunk = []
        chunk.append(line)
    if chunk:
        yield chunk

for chunk in gen_chunks(reader):
    print(chunk)  # process chunk

# test gen_chunks on some dummy sequence:
for chunk in gen_chunks(range(10), chunksize=3):
    print(chunk)  # process chunk
Answer 2 (score: 0)
I ran into a similar problem and found another solution: you can use the library "dask".
For example:
# Dataframes implement the Pandas API
import dask.dataframe as dd
df = dd.read_csv('s3://.../2018-*-*.csv')
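A small sketch of why dask helps here: operations on a dask DataFrame only build a task graph, and nothing is materialized until you call `.compute()`. The tiny in-memory frame below (built with `dd.from_pandas`) is a stand-in for what `dd.read_csv` would give you on real files:

import pandas as pd
import dask.dataframe as dd

# Stand-in data; with real files you would use dd.read_csv(...),
# which reads lazily in partitions instead of all at once.
pdf = pd.DataFrame({"value": range(10)})
df = dd.from_pandas(pdf, npartitions=2)

# This builds a lazy task graph; .compute() triggers execution,
# processing one partition at a time.
result = df["value"].sum().compute()
print(result)  # 45

Because each partition fits in memory independently, this sidesteps the MemoryError you get from concatenating full pandas DataFrames.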