I need to do some calculations based on the sum of a measure over consecutive days. For example:
import pandas as pd
from numpy.random import randn
from pandas import Series
rng = pd.date_range('1/3/2000', periods=8)
rng = rng[:4].append(rng[5:])  # drop 2000-01-07 to leave a gap
ts = Series(randn(7).astype('int'), index=rng)
ts
Out[1]:
2000-01-03 0
2000-01-04 0
2000-01-05 0
2000-01-06 -1
2000-01-08 0
2000-01-09 -2
2000-01-10 -1
dtype: int64
How can I sum the values over each run of consecutive days, so that I get something like this?
Out[2]:
2000-01-03 -1
2000-01-04 -1
2000-01-05 -1
2000-01-06 -1
2000-01-08 -3
2000-01-09 -3
2000-01-10 -3
dtype: int64
Answer 0 (score: 1)
Now that I have found the answer, the problem seems simpler:
from numpy import timedelta64
from pandas import Series

def ranks(series):
    """
    In an ORDERED series, this function identifies runs of consecutive days,
    giving each group a unique number identifier. Argument must be
    a pandas Series with a datetime index.
    """
    td = series.index.to_series().diff()
    # The first diff is NaT; treat it as a one-day step so it stays in group 0.
    td.iloc[0] = timedelta64(1, 'D')
    res = []
    counter = 0
    for i in range(td.size):
        # A step longer than one day starts a new group.
        if td.iloc[i] > timedelta64(1, 'D'):
            counter += 1
        res.append(counter)
    return Series(res, index=series.index)
From here, pandas groupby takes care of the rest. ;-)
from pandas import DataFrame
df = DataFrame({'val': ts, 'gr': ranks(ts)})
gr = df.groupby('gr')
# Merge each group's sum back onto its rows; the summed column comes out as val_y.
df.merge(gr.sum(), left_on='gr', right_index=True, how='outer')
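For comparison, here is a minimal vectorized sketch of the same idea. It is not the code from the answer above: it swaps the explicit loop for `diff()` plus `cumsum()`, and the merge for `groupby(...).transform("sum")`. The sample values are hard-coded stand-ins for the random data in the question, and the names `gaps`, `run_id` and `totals` are made up for illustration.

```python
import pandas as pd

# Deterministic stand-in for the random data in the question (illustrative values).
rng = pd.date_range("2000-01-03", periods=8)
rng = rng[:4].append(rng[5:])  # drop 2000-01-07 to create a gap
ts = pd.Series([0, 0, 0, -1, 0, -2, -1], index=rng)

# A step larger than one day marks the start of a new run of consecutive days.
# The first diff is NaT, which compares as False, so the first row stays in run 0.
gaps = ts.index.to_series().diff() > pd.Timedelta(days=1)

# The cumulative sum of the boolean flags gives each run its own integer id.
run_id = gaps.cumsum()

# transform("sum") broadcasts each run's total back onto every row of that run.
totals = ts.groupby(run_id).transform("sum")
print(totals)
```

With these sample values, the first four days share the total -1 and the last three share -3, which matches the shape of the output asked for in the question. As with `ranks`, this assumes the index is already sorted by date.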