Split a DataFrame based on a condition

Posted: 2018-04-30 19:50:06

Tags: python python-3.x pandas csv

I have the following DataFrame:

               0       1       2       3              4
0          1.JPG     NaN     NaN     NaN            NaN
1           2883  2957.0  3412.0  3340.0  miscellaneous
2           3517  3007.0  4062.0  3371.0  miscellaneous
3           5678  3158.0  6299.0  3423.0  miscellaneous
4           1627  3287.0  2149.0  3694.0  miscellaneous
5           2894  3272.0  3421.0  3664.0  miscellaneous
6           3525  3271.0  4064.0  3672.0  miscellaneous
7           4759  3337.0  5321.0  3640.0  miscellaneous
8           6141  3289.0  6664.0  3654.0  miscellaneous
9           1017  3598.0  1539.0  3979.0  miscellaneous
10          1624  3586.0  2155.0  3993.0  miscellaneous
11          2252  3612.0  2777.0  3967.0  miscellaneous
12          3211  3548.0  3735.0  3944.0  miscellaneous
13          6052  3616.0  6572.0  3983.0  miscellaneous
14           691  3911.0  1204.0  4223.0  miscellaneous
15         2.JPG     NaN     NaN     NaN            NaN
16         3.JPG     NaN     NaN     NaN            NaN
17          5384  2841.0  5963.0  3095.0  miscellaneous
18          5985  2797.0  6611.0  3080.0  miscellaneous
19          3512  3012.0  4025.0  3366.0  miscellaneous
20          5085  2974.0  5587.0  3367.0  miscellaneous
21          2593  3224.0  3148.0  3469.0  miscellaneous
22          1044  3630.0  1511.0  3928.0  miscellaneous
23          4764  3619.0  5283.0  3971.0  miscellaneous
24          5103  3613.0  5635.0  3928.0  miscellaneous

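I want to split this DataFrame into multiple CSV files: the first file should be named 1.csv and contain all the rows below 1.JPG, and so on. For example, the exported CSVs should be: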

1.csv

2883    2957    3412    3340    miscellaneous
3517    3007    4062    3371    miscellaneous
5678    3158    6299    3423    miscellaneous
1627    3287    2149    3694    miscellaneous
2894    3272    3421    3664    miscellaneous
3525    3271    4064    3672    miscellaneous
4759    3337    5321    3640    miscellaneous
6141    3289    6664    3654    miscellaneous
1017    3598    1539    3979    miscellaneous
1624    3586    2155    3993    miscellaneous
2252    3612    2777    3967    miscellaneous
3211    3548    3735    3944    miscellaneous
6052    3616    6572    3983    miscellaneous
691     3911    1204    4223    miscellaneous

2.csv (this file should be empty)

3.csv

5384    2841    5963    3095    miscellaneous
5985    2797    6611    3080    miscellaneous
3512    3012    4025    3366    miscellaneous
5085    2974    5587    3367    miscellaneous
2593    3224    3148    3469    miscellaneous
1044    3630    1511    3928    miscellaneous
4764    3619    5283    3971    miscellaneous
5103    3613    5635    3928    miscellaneous

How can I do this with Python and pandas?

2 answers:

Answer 0 (score: 2)

You can use:

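# Assumes string column labels '0'..'4' (use df[0], df[1] if the labels are integers)
grouper = (df['0'].where(df['1'].isnull())  # file name on marker rows, NaN elsewhere
                  .ffill()                   # carry it down to the data rows below
                  .astype('category'))       # categorical, so empty groups survive
for n, g in df.assign(grouper=grouper).dropna().groupby('grouper', observed=False):
    g.drop('grouper', axis=1).to_csv(n + '.csv', header=False, index=False)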

Note: astype('category') keeps groups that have no rows (such as 2.JPG), so an empty file is still written for them.
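To see what the grouper looks like, here is a minimal sketch on a cut-down, made-up version of the question's frame (integer column labels, only columns 0 and 1). It counts rows per group instead of writing files. Passing observed=False explicitly keeps the empty 2.JPG category; newer pandas releases deprecate relying on the implicit default.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: marker rows hold a file
# name in column 0 and NaN in the other columns.
df = pd.DataFrame({
    0: ['1.JPG', 2883, 3517, '2.JPG', '3.JPG', 5384],
    1: [np.nan, 2957.0, 3007.0, np.nan, np.nan, 2841.0],
})

# Keep column 0 only on marker rows, then forward-fill it down the data rows.
grouper = df[0].where(df[1].isnull()).ffill().astype('category')

# dropna() removes the marker rows; observed=False still reports the
# 2.JPG category even though no data rows remain in it.
sizes = (df.assign(grouper=grouper)
           .dropna()
           .groupby('grouper', observed=False)
           .size())
print(sizes)
```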

Output of !dir *.JPG.csv:

04/30/2018  03:43 PM               657 1.JPG.csv
04/30/2018  03:43 PM                 0 2.JPG.csv
04/30/2018  03:43 PM               376 3.JPG.csv
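The question asks for 1.csv rather than 1.JPG.csv. One way, sketched here with a hypothetical helper named csv_name, is to swap the image extension for .csv with pathlib and call it in place of n + '.csv' in the loop:

```python
from pathlib import Path


def csv_name(marker):
    """Hypothetical helper: '1.JPG' -> '1.csv' (replace the extension)."""
    return Path(str(marker)).with_suffix('.csv').name


print(csv_name('1.JPG'))
```

Inside the loop this would read g.drop('grouper', axis=1).to_csv(csv_name(n), header=False, index=False).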

Contents of 1.JPG.csv:

2883,2957.0,3412.0,3340.0,miscellaneous
3517,3007.0,4062.0,3371.0,miscellaneous
5678,3158.0,6299.0,3423.0,miscellaneous
1627,3287.0,2149.0,3694.0,miscellaneous
2894,3272.0,3421.0,3664.0,miscellaneous
3525,3271.0,4064.0,3672.0,miscellaneous
4759,3337.0,5321.0,3640.0,miscellaneous
6141,3289.0,6664.0,3654.0,miscellaneous
1017,3598.0,1539.0,3979.0,miscellaneous
1624,3586.0,2155.0,3993.0,miscellaneous
2252,3612.0,2777.0,3967.0,miscellaneous
3211,3548.0,3735.0,3944.0,miscellaneous
6052,3616.0,6572.0,3983.0,miscellaneous
691,3911.0,1204.0,4223.0,miscellaneous

Answer 1 (score: 0)

# Split one big CSV file into one CSV per image
import pandas as pd

df = pd.read_csv('results.csv', header=None)  # header=None -> integer column labels 0..4
print(df)

grouper = df[0].where(df[1].isnull()).ffill().astype('category')
for n, g in df.assign(grouper=grouper).dropna().groupby('grouper', observed=False):
    # float_format="%.0f" writes 2957.0 as 2957
    g.drop('grouper', axis=1).to_csv(n + '.csv', float_format="%.0f", header=False, index=False)

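This produces the desired result, with the float columns written as whole numbers instead of with a trailing .0.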
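An alternative that avoids categoricals altogether is to flag the marker rows, number the blocks with a cumulative sum, and write one file per marker. This is a sketch, not either answerer's method; split_by_markers is a hypothetical name, and it assumes integer column labels (as with header=None) and that marker rows are NaN in every column except column 0.

```python
from pathlib import Path

import pandas as pd


def split_by_markers(df, out_dir='.'):
    """Hypothetical helper: write one CSV per marker row ('1.JPG' -> 1.csv).

    Assumes marker rows are NaN in every column except column 0 and that
    the frame uses integer column labels (e.g. read with header=None).
    """
    is_marker = df.iloc[:, 1:].isna().all(axis=1)
    # cumsum turns each marker into a running block id: 1, 1, 1, 2, 3, 3, ...
    block_id = is_marker.cumsum()
    written = []
    for block, marker in enumerate(df.loc[is_marker, 0], start=1):
        rows = df[(block_id == block) & ~is_marker]
        path = Path(out_dir) / Path(str(marker)).with_suffix('.csv').name
        # An empty block still produces a file, matching the blank 2.csv requirement.
        rows.to_csv(path, header=False, index=False, float_format='%.0f')
        written.append(path.name)
    return written
```

For example, split_by_markers(pd.read_csv('results.csv', header=None)) would write 1.csv, 2.csv (empty) and 3.csv to the current directory. Because the files are driven by the marker rows themselves rather than by groupby, empty blocks do not depend on the observed setting.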