I have the following DataFrame:
            count
2015-09-28      2
2015-09-29      2
2015-09-30      0
2015-10-01      2
2015-10-02      3
2015-10-05      2
2015-10-06      1
2015-10-07      0
2015-10-08      1
I want to group the data into date ranges separated by the rows where count == 0. I'd like to get something like this:
        first        last  totalcount
1  2015-09-28  2015-09-29           4
2  2015-10-01  2015-10-06           8
3  2015-10-08  2015-10-08           1
Answer (score: 3)
Use cumsum to associate each row with a group number:
In [134]: df['groupno'] = (df['count'] == 0).cumsum()
In [135]: df
Out[135]:
            count  groupno
2015-09-28      2        0
2015-09-29      2        0
2015-09-30      0        1
2015-10-01      2        1
2015-10-02      3        1
2015-10-05      2        1
2015-10-06      1        1
2015-10-07      0        2
2015-10-08      1        2
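The trick works because summing booleans counts how many zero rows have been seen so far, so the group number goes up by one at every separator. A minimal stand-alone sketch on just the count values (the names counts, is_separator and group_ids are illustrative, not from the answer):

```python
import pandas as pd

# The 'count' column from the question, in row order.
counts = pd.Series([2, 2, 0, 2, 3, 2, 1, 0, 1])

# True on the separator rows (count == 0).
is_separator = counts == 0

# Running total of True values: increases by one at each zero row.
group_ids = is_separator.cumsum()

print(group_ids.tolist())  # [0, 0, 1, 1, 1, 1, 1, 2, 2]
```

Note that each zero row receives the number of the group that follows it, which is why those rows are dropped before aggregating.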
Then you can use groupby/agg to get the desired result:
import pandas as pd

df = pd.DataFrame({'count': [2, 2, 0, 2, 3, 2, 1, 0, 1]},
                  index=['2015-09-28', '2015-09-29', '2015-09-30', '2015-10-01',
                         '2015-10-02', '2015-10-05', '2015-10-06', '2015-10-07',
                         '2015-10-08'])

mask = (df['count'] == 0)
# Each zero row increments the running total, so it starts a new group
df['groupno'] = mask.cumsum()
# Remove the rows where the count is 0
df = df.loc[~mask]
# Make the index a column (named 'index') so we can use 'index': ['first', 'last']
# to find the first and last date in each group.
df = df.reset_index()
result = df.groupby('groupno').agg({'index': ['first', 'last'], 'count': 'sum'})
# Flatten the MultiIndex columns ('index', 'first'), ... down to first, last, sum
result.columns = result.columns.droplevel(0)
result = result.rename(columns={'sum': 'totalcount'})
# Fix the column order explicitly; older pandas versions did not keep dict order
result = result[['first', 'last', 'totalcount']]
print(result)

yields

              first        last  totalcount
groupno
0        2015-09-28  2015-09-29           4
1        2015-10-01  2015-10-06           8
2        2015-10-08  2015-10-08           1
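On pandas 0.25 or later, named aggregation can produce the flat first/last/totalcount columns directly, avoiding the droplevel/rename steps. A sketch under that version assumption; the intermediate name grouped is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'count': [2, 2, 0, 2, 3, 2, 1, 0, 1]},
                  index=['2015-09-28', '2015-09-29', '2015-09-30', '2015-10-01',
                         '2015-10-02', '2015-10-05', '2015-10-06', '2015-10-07',
                         '2015-10-08'])

mask = df['count'] == 0
# Add group ids from the cumulative count of zero rows, drop the zero rows,
# and turn the (unnamed) date index into a column called 'index'.
grouped = df.assign(groupno=mask.cumsum()).loc[~mask].reset_index()

# Named aggregation: output_name=(input_column, aggregation_function)
result = grouped.groupby('groupno').agg(
    first=('index', 'first'),
    last=('index', 'last'),
    totalcount=('count', 'sum'),
)
print(result)
```

If you want the groups numbered from 1 as in the question, you could reassign the index afterwards, e.g. result.index = range(1, len(result) + 1).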