Aggregating a pandas DataFrame by value ranges of a specific column

Time: 2017-04-28 17:23:23

Tags: python dataframe aggregation

The data looks like this:

time  value
1995  1
1995  2
1997  3
1998  5
2005  4
2004  7

How can I replace each year with the range it falls into, like this:

time       value
1995-2000   1
1995-2000   2
1995-2000   3
1995-2000   5
2000-2005   4
2000-2005   7

Thanks.

1 answer:

Answer 0 (score: 1)

Source DF:

In [222]: x
Out[222]:
   time  value
0  1995      1
1  1995      2
2  1997      3
3  1998      5
4  2005      4
5  2004      7

Binning the time column:

In [223]: x.time = pd.cut(x.time, bins=np.arange(1995, 2005+5, step=5), include_lowest=True)

In [224]: x
Out[224]:
           time  value
0  [1995, 2000]      1
1  [1995, 2000]      2
2  [1995, 2000]      3
3  [1995, 2000]      5
4  (2000, 2005]      4
5  (2000, 2005]      7
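
If you want the plain "1995-2000" strings from the question rather than interval notation, pd.cut also accepts a labels argument. Below is a minimal sketch of that, assuming the same sample data; the time_range column name and the edges/labels variables are my own choices for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
x = pd.DataFrame({"time": [1995, 1995, 1997, 1998, 2005, 2004],
                  "value": [1, 2, 3, 5, 4, 7]})

# Bin edges: 1995, 2000, 2005.
edges = np.arange(1995, 2005 + 5, step=5)

# One label per bin, built from consecutive pairs of edges.
labels = [f"{int(lo)}-{int(hi)}" for lo, hi in zip(edges[:-1], edges[1:])]

# include_lowest=True keeps 1995 itself inside the first bin.
# The result goes into a new column, so the numeric "time" column is preserved.
x["time_range"] = pd.cut(x["time"], bins=edges, labels=labels,
                         include_lowest=True)
print(x)
```

Writing to a separate column keeps the original years available for any later binning or grouping.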

Aggregation (run on the original numeric time column, i.e. before the reassignment above):

In [221]: x.groupby(pd.cut(x.time, bins=np.arange(1995, 2005+5, step=5),
                           include_lowest=True))['value'] \
           .sum()
Out[221]:
time
[1995, 2000]    11
(2000, 2005]    11
Name: value, dtype: int64
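
For reference, here is a self-contained sketch of the same groupby that can be run as a script, extended to compute several statistics at once. The df, binned and stats names are hypothetical; observed=True is my addition so that only bins that actually occur are reported, and it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"time": [1995, 1995, 1997, 1998, 2005, 2004],
                   "value": [1, 2, 3, 5, 4, 7]})

# Bin the years into 5-year ranges without modifying df itself.
edges = np.arange(1995, 2005 + 5, step=5)
binned = pd.cut(df["time"], bins=edges, include_lowest=True)

# Group by the binned years and aggregate the "value" column.
stats = df.groupby(binned, observed=True)["value"].agg(["sum", "mean", "count"])
print(stats)
```

Passing the binned Series straight to groupby leaves the time column numeric, so the binning can be repeated with different edges.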