Splitting a pandas DataFrame based on a condition and saving the pieces to CSV

Time: 2014-08-25 23:22:14

Tags: python csv pandas

I have a data frame that comes from a (very long) csv. I would like to split it based on a condition on one of its columns (each time column D returns to zero, create a new data frame named data#) and save each piece to a new csv file. I know that in Python I can do mp.mlab.find(mp.logical_and(D == 0.0)) and it will give me the indices where D = 0, but I don't know how to identify where each sequence starts and ends, or how to split it into new data frames that keep all the columns.

data
     A          B          C     D  E  F   H
0    0  12.000000  -8.000000   0.0  1  1   2
1    1  12.000000  -8.000000   0.0  1  1   1
2    2  12.100000  -8.100000   0.0  1  1   0
3    3  12.020000  -8.000000   0.0  1  1   1
4    4  12.010000  -8.000000   1.2  1  1   0
5    5  12.000000  -8.000000   1.3  1  1   2
6    6   1.500000  -8.200000   1.9  1  1   1
7    7  12.600000  -8.000100   2.0  1  1   1
8    8  12.400000  -8.000000   3.5  1  1   2
9    9  12.200000  -8.050036   6.0  1  1  -1
10  10  12.020000  -8.053374   7.8  1  1   2
11  11  12.000000  -8.056713   9.0  1  1   1
12  12  12.000000  -8.060051  12.5  1  1   1
13  13   1.500000  -8.063389  18.0  1  1   1
14  14  12.600000  -8.066728  19.0  1  1  -1
15  15  12.000000  -8.070066  15.0  1  1   2
16  16  12.400000  -8.073404  16.0  1  1   1
17  17  12.300000  -8.076743  10.0  1  1   0
18  18  12.000000  -8.080081   5.0  1  1   2
19  19  12.300000  -8.083419   4.5  1  1   0
20  20  12.600000  -8.086758   1.2  1  1   1
21  21  12.000000  -8.090096   0.0  1  1   1
22  22  12.000000  -8.093434   0.0  1  1   0
23  23  12.000000  -8.096773   0.0  1  1   1
24  24  12.200000  -8.100111   1.5  1  1   2
25  25  12.200000  -8.103449   3.0  1  1   2
26  26  12.300000  -8.106788   7.0  1  1   0
27  27  12.500000  -8.110126   5.0  1  1   2
28  28  12.000000  -8.113464   2.0  1  1  -1
29  29  12.300000  -8.116803   0.0  1  1   1
30  30  12.400000  -8.120141   0.0  1  1   1
31  31  12.600000  -8.123479   0.0  1  1  -1
32  32  12.500000  -8.126818   0.0  1  1  -1
33  33  12.000000  -8.130156   0.8  1  1   1
34  34  12.360000  -8.133494   1.6  1  1  -1
35  35  12.370909  -8.136833   2.0  1  1   2
36  36  12.381818  -8.140171   5.0  1  1   1
37  37  12.392727  -8.143509   4.0  1  1   0
38  38  12.403636  -8.146848   3.0  1  1   0
39  39  12.414545  -8.150186   2.6  1  1   1
40  40  12.425455  -8.153524   1.2  1  1   2
41  41  12.436364  -8.156863   0.0  1  1   1
42  42  12.447273  -8.160201   0.0  1  1   1
43  43  12.458182  -8.163539   0.0  1  1  -1
44  44  12.469091  -8.166878   0.0  1  1   0
45  45  12.480000  -8.170216   0.0  1  1   1
46  46  12.490909  -8.173554   2.5  1  1   2
47  47  12.501818  -8.176893   3.0  1  1  -1
48  48  12.512727  -8.180231   7.0  1  1  -1
49  49  12.523636  -8.183569   9.0  1  1  -1
50  50  12.534545  -8.186908  15.0  1  1   0
51  51  12.545455  -8.190246  26.0  1  1  -1
52  52  12.556364  -8.193584   9.0  1  1   0
53  53  12.567273  -8.196923   7.0  1  1  -1
54  54  12.578182  -8.200261   6.0  1  1   0
55  55  12.589091  -8.203599   4.3  1  1   1
56  56  12.600000  -8.206938   3.3  1  1   2
57  57  12.610909  -8.210276   2.3  1  1   0
58  58  12.621818  -8.213614   2.1  1  1  -1
59  59  12.632727  -8.216953   0.9  1  1  -1

I would like to get something like this:

data1
     A          B          C     D  E  F   H
0    0  12.000000  -8.000000   0.0  1  1   2
1    1  12.000000  -8.000000   0.0  1  1   1
2    2  12.100000  -8.100000   0.0  1  1   0
3    3  12.020000  -8.000000   0.0  1  1   1
4    4  12.010000  -8.000000   1.2  1  1   0
5    5  12.000000  -8.000000   1.3  1  1   2
6    6   1.500000  -8.200000   1.9  1  1   1
7    7  12.600000  -8.000100   2.0  1  1   1
8    8  12.400000  -8.000000   3.5  1  1   2
9    9  12.200000  -8.050036   6.0  1  1  -1
10  10  12.020000  -8.053374   7.8  1  1   2
11  11  12.000000  -8.056713   9.0  1  1   1
12  12  12.000000  -8.060051  12.5  1  1   1
13  13   1.500000  -8.063389  18.0  1  1   1
14  14  12.600000  -8.066728  19.0  1  1  -1
15  15  12.000000  -8.070066  15.0  1  1   2
16  16  12.400000  -8.073404  16.0  1  1   1
17  17  12.300000  -8.076743  10.0  1  1   0
18  18  12.000000  -8.080081   5.0  1  1   2
19  19  12.300000  -8.083419   4.5  1  1   0
20  20  12.600000  -8.086758   1.2  1  1   1
21  21  12.000000  -8.090096   0.0  1  1   1

data2
22  22  12.000000  -8.093434   0.0  1  1   0
23  23  12.000000  -8.096773   0.0  1  1   1
24  24  12.200000  -8.100111   1.5  1  1   2
25  25  12.200000  -8.103449   3.0  1  1   2
26  26  12.300000  -8.106788   7.0  1  1   0
27  27  12.500000  -8.110126   5.0  1  1   2
28  28  12.000000  -8.113464   2.0  1  1  -1
29  29  12.300000  -8.116803   0.0  1  1   1
30  30  12.400000  -8.120141   0.0  1  1   1
31  31  12.600000  -8.123479   0.0  1  1  -1
32  32  12.500000  -8.126818   0.0  1  1  -1
33  33  12.000000  -8.130156   0.8  1  1   1
34  34  12.360000  -8.133494   1.6  1  1  -1
35  35  12.370909  -8.136833   2.0  1  1   2
36  36  12.381818  -8.140171   5.0  1  1   1
37  37  12.392727  -8.143509   4.0  1  1   0
38  38  12.403636  -8.146848   3.0  1  1   0
39  39  12.414545  -8.150186   2.6  1  1   1
40  40  12.425455  -8.153524   1.2  1  1   2
41  41  12.436364  -8.156863   0.0  1  1   1

data3 ... ... ...

Any help would be greatly appreciated. Thanks!

1 Answer:

Answer 0 (score: 1)

The rows where D has returned to 0 can be found like this:

df['back_to_0'] = (df['D'] == 0) & (df['D'].diff() < 0)

That is, the value is 0 and it has dropped since the previous row.
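To see what this flag does, here is a minimal sketch on a made-up column (the eight values below are hypothetical, not the question's data). Only the first zero after a descent is flagged. Later zeros in the same run have a diff of 0, and the first row has a diff of NaN, so neither is marked.

```python
import pandas as pd

# Hypothetical toy column standing in for D; not the question's data.
df = pd.DataFrame({"D": [0.0, 0.0, 1.2, 2.0, 0.0, 0.0, 3.0, 0.0]})

# diff() is NaN on the first row and NaN < 0 is False,
# so a leading run of zeros is never flagged.
df["back_to_0"] = (df["D"] == 0) & (df["D"].diff() < 0)

# Index labels of the rows where D has just dropped to 0.
flagged_rows = df.index[df["back_to_0"]].tolist()
print(flagged_rows)  # [4, 7]
```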

Then we create a grouping variable:

df['time_group'] = df['back_to_0'].cumsum() + 1
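As a rough illustration of the labelling, the sketch below applies both steps to a made-up column (hypothetical values again). The cumulative sum goes up by one at every flagged row, so each flagged zero opens a new group rather than closing the previous one. If the first zero should stay with the preceding piece, as in the question's data1 sample, the flag would need to be shifted by one row, which is not shown here.

```python
import pandas as pd

# Hypothetical toy column standing in for D; not the question's data.
df = pd.DataFrame({"D": [0.0, 0.0, 1.2, 2.0, 0.0, 0.0, 3.0, 0.0]})
df["back_to_0"] = (df["D"] == 0) & (df["D"].diff() < 0)

# True counts as 1 in cumsum(), so the label steps up at each flagged row.
df["time_group"] = df["back_to_0"].cumsum() + 1

labels = df["time_group"].tolist()
print(labels)  # [1, 1, 1, 1, 2, 2, 2, 3]
```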

We can split by time_group to get the individual data frames:

grouped = df.groupby('time_group')

for group_number, group_data in grouped:
    print(group_data.head())
    # Could also do group_data.to_csv() here if you want
    # to save the individual pieces as separate files
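Putting the pieces together, saving each group to its own file might look like the sketch below. The toy data, the temporary output directory, and the data1.csv, data2.csv, ... file names (modelled on the data# naming in the question) are all assumptions for illustration, as is dropping the two helper columns and the index before writing.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy frame; in practice df would come from pd.read_csv(...).
df = pd.DataFrame({
    "A": range(8),
    "D": [0.0, 0.0, 1.2, 2.0, 0.0, 0.0, 3.0, 0.0],
})
df["back_to_0"] = (df["D"] == 0) & (df["D"].diff() < 0)
df["time_group"] = df["back_to_0"].cumsum() + 1

# Assumed output location: a fresh temporary directory.
out_dir = tempfile.mkdtemp()
written = []

for group_number, group_data in df.groupby("time_group"):
    # Assumed naming scheme: data1.csv, data2.csv, ...
    path = os.path.join(out_dir, f"data{group_number}.csv")
    # Drop the helper columns so each file keeps only the original columns.
    piece = group_data.drop(columns=["back_to_0", "time_group"])
    piece.to_csv(path, index=False)
    written.append(os.path.basename(path))

print(written)  # ['data1.csv', 'data2.csv', 'data3.csv']
```

Passing index=False leaves the row labels out of the files. Omit it if the original row numbers should be kept.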