Question

I'm having trouble splitting a CSV into the minimum number of CSV files such that each file contains only unique IDs.

By running

count = df['id'].value_counts().max()

I already know how many CSV files should be created (file1, file2, file3, file4).
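The reason this count is the minimum: the most frequent id must have each of its rows in a different file, so no smaller number of files can work. A minimal sketch, assuming a frame that holds only the id column with the frequencies from the sample data (the variable per_id is introduced here for illustration):

```python
import pandas as pd

# ids as they occur in the sample data: 55227 four times, 56002 three times, 54689 twice
df = pd.DataFrame({"id": [55227] * 4 + [56002] * 3 + [54689] * 2})

per_id = df["id"].value_counts()  # number of rows for each id
count = per_id.max()              # rows of the most frequent id = minimum number of files

print(per_id.to_dict())
print(count)
```

With this data the most frequent id (55227) has four rows, which matches the four files listed in the question.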

My expected result would be:

file1

person_name     id    Total  Paid        Date          No
Deniss       55227  1191,75  0,00  21/08/2019  15/06/2018
RINALDS      56002   169,00  0,00  21/08/2019  15/06/2018
OLGA         54689   812,90  0,00  21/08/2019  15/05/2018

file2

person_name     id    Total  Paid        Date          No
Deniss       55227  1191,75  0,00  21/08/2019    20180615
RINALDS      56002   169,00  0,00  21/08/2019    20180615
OLGA         54689   812,90  0,00  21/08/2019    20180515

file3

person_name     id    Total  Paid        Date          No
Deniss       55227  1191,75  0,00  21/08/2019    20180613
RINALDS      56002   169,00  0,00  21/08/2019    20180614

file4

person_name     id    Total  Paid        Date          No
Deniss       55227  1191,75  0,00  21/08/2019    20180612

Answer 1

Use GroupBy.cumcount to build a counter Series, then loop over the groups and write each one to a file:

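# Number each row within its id group: 1 for the first occurrence, 2 for the second, ...
g = df.groupby('id').cumcount() + 1

# Rows that share a counter value go into the same file.
# The loop variable is named `group` so that it does not overwrite df.
for i, group in df.groupby(g):
    group.to_csv(f'file{i}.csv', index=False)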
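To see why grouping by this counter works, it can help to look at the counter itself. The sketch below rebuilds only the person_name and id columns of the sample data (the remaining columns are left out for brevity, which is an assumption of this example) and prints the value assigned to each row:

```python
import pandas as pd

# Trimmed copy of the sample data: only the columns needed to show the counter
df = pd.DataFrame({
    "person_name": ["Deniss"] * 4 + ["RINALDS"] * 3 + ["OLGA"] * 2,
    "id": [55227] * 4 + [56002] * 3 + [54689] * 2,
})

# cumcount() numbers the rows of each id group starting at 0, in order of appearance;
# adding 1 makes the values line up with file1, file2, ...
g = df.groupby("id").cumcount() + 1

print(g.tolist())  # [1, 2, 3, 4, 1, 2, 3, 1, 2]
```

Each id receives every counter value at most once, so all rows with counter 1 have distinct ids, and likewise for 2, 3 and 4.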

Tested with the sample data:

for i, group in df.groupby(g):
    print(group)

      person_name     id    Total  Paid        Date          No
    0      Deniss  55227  1191,75  0,00  21/08/2019  15/06/2018
    4     RINALDS  56002   169,00  0,00  21/08/2019  15/06/2018
    7        OLGA  54689   812,90  0,00  21/08/2019  15/05/2018
      person_name     id    Total  Paid        Date        No
    1      Deniss  55227  1191,75  0,00  21/08/2019  20180615
    5     RINALDS  56002   169,00  0,00  21/08/2019  20180615
    8        OLGA  54689   812,90  0,00  21/08/2019  20180515
      person_name     id    Total  Paid        Date        No
    2      Deniss  55227  1191,75  0,00  21/08/2019  20180613
    6     RINALDS  56002   169,00  0,00  21/08/2019  20180614
      person_name     id    Total  Paid        Date        No
    3      Deniss  55227  1191,75  0,00  21/08/2019  20180612
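The printed groups can also be checked programmatically. The following is a rough sketch rather than part of the original answer: it writes the files to a temporary directory (out_dir, files and all_unique are names chosen for this example), reads them back, and checks that every file holds each id at most once. Only the person_name and id columns of the sample data are reproduced.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Trimmed copy of the sample data
df = pd.DataFrame({
    "person_name": ["Deniss"] * 4 + ["RINALDS"] * 3 + ["OLGA"] * 2,
    "id": [55227] * 4 + [56002] * 3 + [54689] * 2,
})

g = df.groupby("id").cumcount() + 1

# Write one file per counter value into a throwaway directory
out_dir = Path(tempfile.mkdtemp())
for i, group in df.groupby(g):
    group.to_csv(out_dir / f"file{i}.csv", index=False)

# Read the files back and verify the two properties the question asks for
files = sorted(out_dir.glob("file*.csv"))
all_unique = all(pd.read_csv(path)["id"].is_unique for path in files)
total_rows = sum(len(pd.read_csv(path)) for path in files)

print(len(files), all_unique, total_rows == len(df))
```

If the split is correct, the number of files equals df['id'].value_counts().max(), every file has unique ids, and no rows are lost.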

Split a CSV by a unique column
