熊猫数据框按列值的下一次出现进行分组

时间:2018-09-12 17:41:56

标签: python python-3.x pandas dataframe pandas-groupby

下面是我的数据框

     info        date       time      file            msg
0   INFO:  2018-09-12  16:10:10:  view.py:          phone   
1   INFO:  2018-09-12  16:10:10:  view.py:         asdasd   
2   INFO:  2018-09-12  16:10:43:  view.py:  contact start   
3   INFO:  2018-09-12  16:10:43:  view.py:    contact end   
4   INFO:  2018-09-12  16:11:36:  view.py:      app start   
5   INFO:  2018-09-12  16:11:36:  view.py:     busy start   
6   INFO:  2018-09-12  16:12:08:  view.py:       busy end   
7   INFO:  2018-09-12  16:12:08:  view.py:    contact end   
8   INFO:  2018-09-12  16:12:08:  view.py:        app end
9   INFO:  2018-09-12  16:12:08:  view.py:          phone
7   INFO:  2018-09-12  16:12:08:  view.py:    contact end

我想根据msg列中的值将此数据帧拆分为多个数据帧。 如果要按“电话”作为值分割,我的数据框应该看起来像这样:

df1:

     info        date       time      file            msg
0   INFO:  2018-09-12  16:10:10:  view.py:          phone   
1   INFO:  2018-09-12  16:10:10:  view.py:         asdasd   
2   INFO:  2018-09-12  16:10:43:  view.py:  contact start   
3   INFO:  2018-09-12  16:10:43:  view.py:    contact end   
4   INFO:  2018-09-12  16:11:36:  view.py:      app start   
5   INFO:  2018-09-12  16:11:36:  view.py:     busy start   
6   INFO:  2018-09-12  16:12:08:  view.py:       busy end   
7   INFO:  2018-09-12  16:12:08:  view.py:    contact end   
8   INFO:  2018-09-12  16:12:08:  view.py:        app end

df2:

 info        date       time      file            msg
9   INFO:  2018-09-12  16:12:08:  view.py:          phone
7   INFO:  2018-09-12  16:12:08:  view.py:    contact end

1 个答案:

答案 0 :(得分:0)

为可变数量的相关变量使用字典。在这里,您可以结合使用if(currentChar >= 48 && currentChar <= 57 ) { + GroupBy

cumsum

然后通过d = dict(tuple(df.groupby(df['msg'].eq('phone').cumsum()))) d[1],...,d[2]访问数据框。

结果:

d[n]