Group date ranges separated by rows where a column value is zero

Asked: 2015-10-10 21:42:54

Tags: python pandas

I have the following DataFrame:

            count
2015-09-28      2
2015-09-29      2
2015-09-30      0
2015-10-01      2
2015-10-02      3
2015-10-05      2
2015-10-06      1
2015-10-07      0
2015-10-08      1

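I want to group the rows into date ranges, where the ranges are separated by the rows with count == 0. I would like to get something like this: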

  first      last       totalcount
1 2015-09-28 2015-09-29 4
2 2015-10-01 2015-10-06 8
3 2015-10-08 2015-10-08 1

1 answer:

Answer 0 (score: 3)

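Use cumsum on the boolean mask df['count'] == 0 to associate each row with a group number. The running total increases by one at every zero row, so all rows between two zeros share the same number: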

In [134]: df['groupno'] = (df['count'] == 0).cumsum()

In [135]: df
Out[135]: 
            count  groupno
2015-09-28      2        0
2015-09-29      2        0
2015-09-30      0        1
2015-10-01      2        1
2015-10-02      3        1
2015-10-05      2        1
2015-10-06      1        1
2015-10-07      0        2
2015-10-08      1        2
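
To see why the cumsum step above produces group labels, here is a minimal sketch on a bare Series holding the same counts. The names counts, is_zero and groupno are illustrative only and are not part of the answer's code.

```python
import pandas as pd

# Same count values as in the question, without the date index.
counts = pd.Series([2, 2, 0, 2, 3, 2, 1, 0, 1])

# True only on the separator rows (count == 0).
is_zero = counts == 0

# True is summed as 1 and False as 0, so the running total
# steps up by one at each zero row and stays flat in between.
groupno = is_zero.cumsum()

print(groupno.tolist())  # [0, 0, 1, 1, 1, 1, 1, 2, 2]
```

Note that each zero row is labelled with the group that follows it, which is why those rows have to be dropped before aggregating.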

You can then drop the zero rows and use groupby/agg to get the desired result:

import pandas as pd
df = pd.DataFrame({'count': [2, 2, 0, 2, 3, 2, 1, 0, 1]},
                  index=[u'2015-09-28', u'2015-09-29', u'2015-09-30', u'2015-10-01',
                         u'2015-10-02', u'2015-10-05', u'2015-10-06', u'2015-10-07',
                         u'2015-10-08'])


mask = (df['count'] == 0)
df['groupno'] = mask.cumsum()
# Remove the rows where the count is 0
df = df.loc[~mask]
# Make the index a column so we can use 'index':['first', 'last'] to find the
# first and last date in each group.
df = df.reset_index()
result = df.groupby('groupno').agg({'index':['first', 'last'], 'count':'sum'})
result.columns = result.columns.droplevel(0)
result = result.rename(columns={'sum':'totalcount'})

which yields

         totalcount       first        last
groupno                                    
0                 4  2015-09-28  2015-09-29
1                 8  2015-10-01  2015-10-06
2                 1  2015-10-08  2015-10-08
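
As a variation on the code above, the sketch below uses named aggregation, which I believe requires pandas 0.25 or later, so that the output columns come out directly as first, last and totalcount in the order the question asks for. It is one possible way to write it, not the answer's original method. It assumes the index is converted to real dates with pd.to_datetime, and it adds 1 to the group number to match the 1-based numbering in the question; the names is_zero, labelled and summary are illustrative.

```python
import pandas as pd

dates = pd.to_datetime(['2015-09-28', '2015-09-29', '2015-09-30',
                        '2015-10-01', '2015-10-02', '2015-10-05',
                        '2015-10-06', '2015-10-07', '2015-10-08'])
df = pd.DataFrame({'count': [2, 2, 0, 2, 3, 2, 1, 0, 1]}, index=dates)

is_zero = df['count'] == 0

# Label every row with its group, copy the index into a 'date' column,
# then drop the separator rows where count == 0.
labelled = df.assign(groupno=is_zero.cumsum() + 1, date=df.index)[~is_zero]

# Named aggregation: output_column=(input_column, aggregation).
summary = labelled.groupby('groupno').agg(
    first=('date', 'first'),
    last=('date', 'last'),
    totalcount=('count', 'sum'),
)
print(summary)
```

If the data began with a zero row, the first label would be 2 rather than 1, so the labels should be treated as identifiers rather than as a guaranteed consecutive sequence starting at 1.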