I know there are many topics about reading CSV files with pandas in chunks, but I am still struggling to read a huge CSV file.
import pandas as pd

chunksize = 10 ** 6
data = {}
count = 0
project_IDs = set()
count_analysed = 0
for chunk in pd.read_csv(path_source, iterator=True, chunksize=chunksize,
                         header=None, delimiter=",", usecols=[0, 1],
                         names=["project_id", "commit_id"]):
    for row in chunk.values:
        count_analysed += 1
        if str(row[0]) in project_IDs:
            data[str(row[1])] = 0
            count += 1
The idea is to store the data temporarily in a set to improve performance.
My computer has 16 GB of RAM and uses an SSD.
I always get a MemoryError. Any idea what I can do to read this CSV file?