Pandas chunksize gives different results for different chunk sizes

Date: 2019-12-04 01:45:49

Tags: python pandas numpy chunks

I want to run a simple operation on a large file (20 GB), reading it with chunksize:

df = df.loc[df['ordinal'] == 1] # to choose all first points

A sample of df:

id_easy ordinal latitude    longitude            epoch  day_of_week
0   aaa     1.0  22.0701       2.6685   01-01-11 07:45       Friday
1   aaa     2.0  22.0716       2.6695   01-01-11 07:45       Friday
2   aaa     3.0  22.0722       2.6696   01-01-11 07:46       Friday
3   bbb     1.0  22.1166       2.6898   01-01-11 07:58       Friday
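For reference, here is the filter from the question applied to the sample rows above. The DataFrame is rebuilt by hand from the printed table, so the dtypes are an assumption (ordinal as float, epoch kept as a string). On these rows only index 0 and 3 have ordinal equal to 1.

```python
import pandas as pd

# Rebuild the sample rows shown above (values copied from the printed table).
df = pd.DataFrame({
    "id_easy": ["aaa", "aaa", "aaa", "bbb"],
    "ordinal": [1.0, 2.0, 3.0, 1.0],
    "latitude": [22.0701, 22.0716, 22.0722, 22.1166],
    "longitude": [2.6685, 2.6695, 2.6696, 2.6898],
    "epoch": ["01-01-11 07:45", "01-01-11 07:45", "01-01-11 07:46", "01-01-11 07:58"],
    "day_of_week": ["Friday"] * 4,
})

# Keep only the first point of each track: rows whose ordinal equals 1.
first_points = df.loc[df["ordinal"] == 1]
print(first_points)
```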

I tried:

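import pandas as pd

n_rows = 1000000  # rows per chunk
reader = pd.read_csv("D:/...path.../file.csv",
                     names=["id_easy", "ordinal", "epoch", "latitude", "longitude"],
                     chunksize=n_rows)

for df in reader:  # each df is one chunk of at most n_rows rows
    if not df.empty:
        df = df.loc[df['ordinal'] == 1]  # keep only the first points
        df.to_csv('test.csv', index=False, header=False, mode='a')  # append to test.csv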

But every time I change n_rows, the saved file changes as well. What am I doing wrong?
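A minimal, self-contained sketch for checking whether a chunked filter depends on the chunk size. It is not the asker's code: it writes the sample rows to a temporary file, assumes that file has a header row matching the sample (so it passes no names=; when names= is given, read_csv treats the file's first line as data unless header=0 is also passed), and uses a hypothetical helper filter_first_points. Since mode='a' appends to whatever test.csv already holds, output from an earlier run would stay in the file; the helper therefore truncates the output on the first chunk (mode='w') and writes the header only once, then the result is compared across several chunk sizes.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the 20 GB file: the sample rows, with a header row.
SAMPLE_CSV = (
    "id_easy,ordinal,latitude,longitude,epoch,day_of_week\n"
    "aaa,1.0,22.0701,2.6685,01-01-11 07:45,Friday\n"
    "aaa,2.0,22.0716,2.6695,01-01-11 07:45,Friday\n"
    "aaa,3.0,22.0722,2.6696,01-01-11 07:46,Friday\n"
    "bbb,1.0,22.1166,2.6898,01-01-11 07:58,Friday\n"
)


def filter_first_points(src_path, out_path, chunksize):
    """Hypothetical helper: copy rows with ordinal == 1 from src_path to out_path."""
    for i, chunk in enumerate(pd.read_csv(src_path, chunksize=chunksize)):
        selected = chunk.loc[chunk["ordinal"] == 1]
        # The first chunk truncates the output file and writes the header;
        # later chunks are appended without a header.
        selected.to_csv(
            out_path,
            index=False,
            header=(i == 0),
            mode="w" if i == 0 else "a",
        )


def run(src_path, out_path, chunksize):
    filter_first_points(src_path, out_path, chunksize)
    with open(out_path) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.csv")
    out = os.path.join(tmp, "test.csv")
    with open(src, "w") as f:
        f.write(SAMPLE_CSV)

    # Same input, several chunk sizes, same output path reused on purpose.
    outputs = {n: run(src, out, n) for n in (1, 2, 3, 1_000_000)}
    result = pd.read_csv(out)

print(result)
```

Deciding mode and header from the chunk's position means re-running with another chunk size overwrites test.csv instead of growing it, so the outputs for all chunk sizes can be compared directly.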

0 answers

No answers yet.