I have this pandas DataFrame with a datetime index:
datetime VAL
2000-01-01 -283.0000
2000-01-02 -283.0000
2000-01-03 -10.6710
2000-01-04 -12.2700
2000-01-05 -10.7855
2001-01-06 -9.1480
2001-01-07 -9.5300
2001-01-08 -10.4675
2001-01-09 -10.9205
2001-01-10 -11.5715
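I want to compute the cumulative sum within each year and replace the VAL column with it. For example, the result should look like this: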
datetime VAL
2000-01-01 -283.0000
2000-01-02 -283.0000 + -283.0000
2000-01-03 -10.6710 + -283.0000 + -283.0000
2000-01-04 -12.2700 + -10.6710 + -283.0000 + -283.0000
2000-01-05 -10.7855 + -12.2700 + -10.6710 + -283.0000 + -283.0000
2001-01-06 -9.1480
2001-01-07 -9.5300 + -9.1480
2001-01-08 -10.4675 + -9.5300 + -9.1480
2001-01-09 -10.9205 + -10.4675 + -9.5300 + -9.1480
2001-01-10 -11.5715 + -10.9205 + -10.4675 + -9.5300 + -9.1480
I have not done the actual arithmetic, which is why you see -283.0000 + -283.0000 instead of -566.0000.
I'm not sure how to proceed. I can do a groupby, but then what?
Answer 0 (score: 3)
You can access the year through .year on the DatetimeIndex and pass it to groupby:
>>> df["cumulative_VAL"] = df.groupby(df.index.year)["VAL"].cumsum()
>>> df
VAL cumulative_VAL
datetime
2000-01-01 -283.0000 -283.0000
2000-01-02 -283.0000 -566.0000
2000-01-03 -10.6710 -576.6710
2000-01-04 -12.2700 -588.9410
2000-01-05 -10.7855 -599.7265
2001-01-06 -9.1480 -9.1480
2001-01-07 -9.5300 -18.6780
2001-01-08 -10.4675 -29.1455
2001-01-09 -10.9205 -40.0660
2001-01-10 -11.5715 -51.6375
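For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and overwrites VAL in place, which is what the question asked for. The construction of `df` is an assumption about how the data was loaded; only the groupby line comes from the answer above.

```python
import pandas as pd

# Sample data from the question, indexed by date (construction is illustrative).
dates = pd.DatetimeIndex(
    ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05",
     "2001-01-06", "2001-01-07", "2001-01-08", "2001-01-09", "2001-01-10"],
    name="datetime",
)
values = [-283.0, -283.0, -10.671, -12.27, -10.7855,
          -9.148, -9.53, -10.4675, -10.9205, -11.5715]
df = pd.DataFrame({"VAL": values}, index=dates)

# Group rows by calendar year; the running sum restarts at each new year.
# Assigning back to "VAL" replaces the column instead of adding a new one.
df["VAL"] = df.groupby(df.index.year)["VAL"].cumsum()
print(df)
```

The result has the same index as the input, so the assignment lines up row by row without any extra merging.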
Answer 1 (score: 1)
Use numpy.cumsum():
>>> import numpy as np
>>> a = np.array([[1,2,3], [4,5,6]])
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> np.cumsum(a)
array([ 1,  3,  6, 10, 15, 21])
>>> np.cumsum(a, dtype=float)     # specifies type of output value(s)
array([  1.,   3.,   6.,  10.,  15.,  21.])
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.cumsum.html
To group by year, you can use:
data.groupby(data['datetime'].map(lambda x: x.year))
How to group pandas DataFrame entries by date in a non-unique column
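The groupby in this answer stops before the cumulative sum is applied. A minimal sketch of finishing it, assuming `datetime` is an ordinary column of datetime dtype (not the index, as it is in the question) and using the `.dt.year` accessor in place of the lambda; the small `data` frame here is illustrative:

```python
import pandas as pd

# Illustrative frame where the dates live in a regular column.
data = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2000-01-01", "2000-01-02", "2001-01-06", "2001-01-07"]
    ),
    "VAL": [-283.0, -283.0, -9.148, -9.53],
})

# .dt.year extracts the year from each timestamp in the column.
years = data["datetime"].dt.year

# Running sum of VAL within each year, written back over the column.
data["VAL"] = data.groupby(years)["VAL"].cumsum()
print(data)
```

A plain `np.cumsum` over the whole column would not restart at the year boundary, which is why the grouping step is needed.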