I want to perform a simple operation on a large dataset (20 GB) using chunksize:
df = df.loc[df['ordinal'] == 1] # to choose all first points
df =

  id_easy  ordinal  latitude  longitude           epoch  day_of_week
0     aaa      1.0   22.0701     2.6685  01-01-11 07:45       Friday
1     aaa      2.0   22.0716     2.6695  01-01-11 07:45       Friday
2     aaa      3.0   22.0722     2.6696  01-01-11 07:46       Friday
3     bbb      1.0   22.1166     2.6898  01-01-11 07:58       Friday
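For reference, here is a minimal sketch of what I expect the filter to do, rebuilding the small sample frame above in memory (the `epoch` column is kept as a plain string here for simplicity): the `ordinal == 1` mask should keep only rows 0 and 3.

```python
import pandas as pd

# Rebuild the sample frame from the question (epoch kept as a string)
df = pd.DataFrame({
    "id_easy": ["aaa", "aaa", "aaa", "bbb"],
    "ordinal": [1.0, 2.0, 3.0, 1.0],
    "latitude": [22.0701, 22.0716, 22.0722, 22.1166],
    "longitude": [2.6685, 2.6695, 2.6696, 2.6898],
    "epoch": ["01-01-11 07:45", "01-01-11 07:45",
              "01-01-11 07:46", "01-01-11 07:58"],
    "day_of_week": ["Friday"] * 4,
})

# Keep only the first point of each trajectory
first_points = df.loc[df["ordinal"] == 1]
print(first_points["id_easy"].tolist())  # ['aaa', 'bbb']
```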
I tried:
import pandas as pd

n_rows = 1000000
reader = pd.read_csv("D:/...path.../file.csv",
                     names=["id_easy", "ordinal", "epoch", "latitude", "longitude"],
                     chunksize=n_rows)
for df in reader:
    if not df.empty:
        df = df.loc[df['ordinal'] == 1]
        df.to_csv('test.csv', index=False, header=False, mode='a')
But every time I change n_rows, the saved file changes as well. Where am I going wrong?