I have the following Pandas DataFrame df:
                                   Value
time                Position
1493791210867023000 0.0          21156.0
                    1.0        1230225.0
                    2.0        1628088.0
                    3.0        2582359.0
                    4.0        3388164.0
1493791210880251000 0.0          21156.0
                    1.0        1230225.0
                    2.0        1628088.0
                    3.0        2582359.0
                    4.0        3388164.0
1493791210888418000 0.0          21156.0
                    1.0        1230225.0
...                 ...              ...
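To experiment with the question, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the names `times`, `positions` and `values` are hypothetical, and only the two fully visible timestamp groups are reproduced (the rows elided with "..." are not).

```python
import pandas as pd

# Hypothetical reconstruction using only the fully visible rows of the printout.
times = [1493791210867023000, 1493791210880251000]
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [21156.0, 1230225.0, 1628088.0, 2582359.0, 3388164.0]

# Two-level row index: "time" is the outer level, "Position" the inner level.
index = pd.MultiIndex.from_product([times, positions], names=["time", "Position"])
df = pd.DataFrame({"Value": values * len(times)}, index=index)
print(df)
```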
How can I efficiently sum over the index level "Position" within each "time"? The exact sums I am trying to compute are:
                                   Value  Result
time                Position
1493791210867023000 0.0          21156.0  Sum from 0.0 to 0.0
                    1.0        1230225.0  Sum from 0.0 to 1.0
                    2.0        1628088.0  Sum from 0.0 to 2.0
                    3.0        2582359.0  Sum from 0.0 to 3.0
                    4.0        3388164.0  Sum from 0.0 to 4.0
1493791210880251000 0.0          21156.0  Sum from 0.0 to 0.0
                    1.0        1230225.0  Sum from 0.0 to 1.0
                    2.0        1628088.0  Sum from 0.0 to 2.0
                    3.0        2582359.0  Sum from 0.0 to 3.0
...                 ...              ...  ...
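Written as a formula, the table above appears to describe a running (cumulative) total of Value inside each time group, where t is a "time" label and q ranges over the "Position" labels of that group. The worked value uses the first two rows of the sample data:

```latex
\mathrm{Result}_{t,p} = \sum_{q \le p} \mathrm{Value}_{t,q},
\qquad \text{e.g. } \mathrm{Result}_{t,1.0} = 21156.0 + 1230225.0 = 1251381.0
```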
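My current solution takes far too long (IndexSlice is very slow), and I am also not sure how to efficiently put each sum into the matching row of the (newly created) "Result" column: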
import pandas as pd
import numpy as np
idx = pd.IndexSlice
res = {}
for i in range(5):
    # Pass ", :" for the columns so the row slice is not misread as a column key
    res[i] = df.loc[idx[:, :i], :].groupby(level="time").sum()
df["Result"] = 0 #fill Result now with res[i] depending on position
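For comparison, one possible way to finish the loop-based approach is to look up each row's partial sum by its time label. This is an illustrative sketch, not the asker's code: it rebuilds a small hypothetical frame (`times`, `positions`, `values`), and it still slices the frame once per Position, so it stays slower than a single vectorized pass.

```python
import pandas as pd

idx = pd.IndexSlice

# Hypothetical sample data taken from the visible rows of the question.
times = [1493791210867023000, 1493791210880251000]
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [21156.0, 1230225.0, 1628088.0, 2582359.0, 3388164.0]
index = pd.MultiIndex.from_product([times, positions], names=["time", "Position"])
df = pd.DataFrame({"Value": values * len(times)}, index=index)

# Label slicing on a MultiIndex level expects a sorted index.
df = df.sort_index()

pos = df.index.get_level_values("Position")
tim = df.index.get_level_values("time")
df["Result"] = 0.0

for p in positions:
    # Per-time sum of Value over Position labels up to and including p.
    partial = df.loc[idx[:, :p], "Value"].groupby(level="time").sum()
    mask = pos == p
    # Map each selected row's time label to its partial sum.
    df.loc[mask, "Result"] = tim[mask].map(partial).to_numpy()

print(df)
```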
Answer (score: 4)
Try cumsum after groupby:
df.assign(Result=df.groupby(level='time').Value.cumsum())
# suggested by @ScottBoston for pandas 0.20.1+
# df.assign(Result=df.groupby('time').Value.cumsum())
                                   Value      Result
time                Position
1493791210867023000 0.0          21156.0     21156.0
                    1.0        1230225.0   1251381.0
                    2.0        1628088.0   2879469.0
                    3.0        2582359.0   5461828.0
                    4.0        3388164.0   8849992.0
1493791210880251000 0.0          21156.0     21156.0
                    1.0        1230225.0   1251381.0
                    2.0        1628088.0   2879469.0
                    3.0        2582359.0   5461828.0
                    4.0        3388164.0   8849992.0
1493791210888418000 0.0          21156.0     21156.0
                    1.0        1230225.0   1251381.0
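One caveat worth checking: cumsum accumulates in the current row order, so it only matches "sum from 0.0 to p" when Position is already ascending within each time group, as it is in the question. The sketch below assumes the data might arrive unordered (simulated here with a shuffle on a hypothetical sample frame) and sorts the index before taking the grouped running total.

```python
import pandas as pd

# Hypothetical sample data taken from the visible rows of the question.
times = [1493791210867023000, 1493791210880251000]
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [21156.0, 1230225.0, 1628088.0, 2582359.0, 3388164.0]
index = pd.MultiIndex.from_product([times, positions], names=["time", "Position"])
df = pd.DataFrame({"Value": values * len(times)}, index=index)

# Simulate rows that are not ordered by Position.
shuffled = df.sample(frac=1, random_state=0)

# Sort by (time, Position) first, then take the running total per time group.
result = shuffled.sort_index().assign(
    Result=lambda d: d.groupby(level="time")["Value"].cumsum()
)
print(result)
```

Sorting once up front keeps the grouped cumsum a single vectorized pass instead of one slice per Position.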