Split a CSV by a unique column

Time: 2019-09-14 08:59:47

Tags: python pandas numpy csv pandas-groupby

I'm having trouble splitting a CSV into the smallest possible number of CSV files so that each file contains only unique IDs.

By running

count = df['id'].value_counts().max()

I already know how many CSV files should be created (file1, file2, file3, file4).
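A minimal sketch of why that number is the right one: the id that occurs most often must be spread across that many files, so no split can use fewer. The frame below is hypothetical, reconstructed from the rows shown in the answer's printed output, with only the person_name, id and No columns kept.

```python
import pandas as pd

# Hypothetical input, reconstructed from the answer's printed output
# (only the person_name, id and No columns are kept for brevity).
df = pd.DataFrame({
    'person_name': ['Deniss'] * 4 + ['RINALDS'] * 3 + ['OLGA'] * 2,
    'id': [55227] * 4 + [56002] * 3 + [54689] * 2,
    'No': ['15/06/2018', '20180615', '20180613', '20180612',
           '15/06/2018', '20180615', '20180614',
           '15/05/2018', '20180515'],
})

# The most frequent id decides how many files are needed:
# each of its rows has to land in a different file.
count = df['id'].value_counts().max()
print(count)  # 4
```

Here 55227 appears four times, so four files are the minimum.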

My expected result is:

file1

person_name     id    Total  Paid        Date          No
     Deniss  55227  1191,75  0,00  21/08/2019  15/06/2018
    RINALDS  56002   169,00  0,00  21/08/2019  15/06/2018
       OLGA  54689   812,90  0,00  21/08/2019  15/05/2018

file2

person_name     id    Total  Paid        Date          No
     Deniss  55227  1191,75  0,00  21/08/2019    20180615
    RINALDS  56002   169,00  0,00  21/08/2019    20180615
       OLGA  54689   812,90  0,00  21/08/2019    20180515

file3

person_name     id    Total  Paid        Date          No
     Deniss  55227  1191,75  0,00  21/08/2019    20180613
    RINALDS  56002   169,00  0,00  21/08/2019    20180614

file4

person_name     id    Total  Paid        Date          No
     Deniss  55227  1191,75  0,00  21/08/2019    20180612


1 answer:

Answer 0 (score: 1)

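Use GroupBy.cumcount to build a counter Series that numbers the rows within each id as 1, 2, 3, ..., so rows sharing the same counter value always have different ids. Then loop over the groups of that counter and write each one to a file: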

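# Number each row within its id group: 1, 2, 3, ...
g = df.groupby('id').cumcount() + 1

# Rows with the same counter value all have distinct ids;
# use a separate loop variable so df is not overwritten
for i, group in df.groupby(g):
    group.to_csv(f'file{i}.csv', index=False)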

Testing with the sample data:

for i, group in df.groupby(g):
    print(group)

      person_name     id    Total  Paid        Date          No
    0      Deniss  55227  1191,75  0,00  21/08/2019  15/06/2018
    4     RINALDS  56002   169,00  0,00  21/08/2019  15/06/2018
    7        OLGA  54689   812,90  0,00  21/08/2019  15/05/2018
      person_name     id    Total  Paid        Date        No
    1      Deniss  55227  1191,75  0,00  21/08/2019  20180615
    5     RINALDS  56002   169,00  0,00  21/08/2019  20180615
    8        OLGA  54689   812,90  0,00  21/08/2019  20180515
      person_name     id    Total  Paid        Date        No
    2      Deniss  55227  1191,75  0,00  21/08/2019  20180613
    6     RINALDS  56002   169,00  0,00  21/08/2019  20180614
      person_name     id    Total  Paid        Date        No
    3      Deniss  55227  1191,75  0,00  21/08/2019  20180612
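A self-contained sketch of the whole approach, under stated assumptions: the input frame is hypothetical (reconstructed from the printed output above, keeping only three columns), and the files are written to a temporary directory rather than the working directory so the example leaves nothing behind. Each file is read back to confirm that its ids are unique.

```python
import os
import tempfile

import pandas as pd

# Hypothetical input, reconstructed from the answer's printed output
df = pd.DataFrame({
    'person_name': ['Deniss'] * 4 + ['RINALDS'] * 3 + ['OLGA'] * 2,
    'id': [55227] * 4 + [56002] * 3 + [54689] * 2,
    'No': ['15/06/2018', '20180615', '20180613', '20180612',
           '15/06/2018', '20180615', '20180614',
           '15/05/2018', '20180515'],
})

# 1, 2, 3, ... for successive rows that share an id
g = df.groupby('id').cumcount() + 1

sizes = []
# Temporary directory is an assumption for the example; it is deleted on exit
with tempfile.TemporaryDirectory() as out_dir:
    for i, group in df.groupby(g):
        path = os.path.join(out_dir, f'file{i}.csv')
        group.to_csv(path, index=False)
        # Read the file back and check that no id repeats inside it
        part = pd.read_csv(path)
        assert part['id'].is_unique, path
        sizes.append(len(part))

print(sizes)  # [3, 3, 2, 1]
```

The number of files produced equals df['id'].value_counts().max(), matching the count from the question.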