I'm new to pandas and a bit lost about how to approach this. I have a DataFrame imported from a CSV that (greatly simplified) looks like this:
import pandas as pd

date = ['2013-08-10','2013-08-10','2013-08-10','2013-08-10','2013-08-10',
        '2013-08-10','2013-08-10','2013-08-10','2013-08-10','2013-08-10']
event = ['213','213','213','213','214','214','214','215','215','215']
side = ['A','B','B','B','A','B','A','B','A','B',]
value = [0.193,0.193,0.092,0.027,0.027,0.058,0.027,0.079,0.193,0.159]
# list(...) materializes the zip iterator, which matters on Python 3
df = pd.DataFrame(list(zip(event,date,side,value)),
                  columns=['event','date','side','value'])
  event        date side  value
0   213  2013-08-10    A  0.193
1   213  2013-08-10    B  0.193
2   213  2013-08-10    B  0.092
3   213  2013-08-10    B  0.027
4   214  2013-08-10    A  0.027
5   214  2013-08-10    B  0.058
6   214  2013-08-10    A  0.027
7   215  2013-08-10    B  0.079
8   215  2013-08-10    A  0.193
9   215  2013-08-10    B  0.159
What I want is to sum the values for each side within each event. I did that with groupby:
groupby = df.groupby(['event','side'])[['value']].sum()
            value
event side
213   A     0.193
      B     0.312
214   A     0.054
      B     0.058
215   A     0.193
      B     0.238
But I would also like to add a new column holding the expanding mean for each side, like this:
            value  roll_mean
event side
213   A     0.193  0
      B     0.312  0
214   A     0.054  0.193
      B     0.058  0.312
215   A     0.193  0.124
      B     0.238  0.185
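Note that every event has two sides, but they are not always A and B. What I want is something like a mean-if function in Excel (AVERAGEIF): for each row, the expanding mean of all values on the current row's side, taken over all previous rows. Any help with this would be greatly appreciated.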
Answer 0 (score: 2)
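I think you are actually looking for an expanding mean rather than a rolling mean. An expanding mean takes every previous value into account. I'll start from where you left off: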
In [63]: res = df.groupby(['event','side']).sum()
In [64]: res
Out[64]:
            value
event side
213   A     0.193
      B     0.312
214   A     0.054
      B     0.058
215   A     0.193
      B     0.238
Now we want to group by side and take the expanding mean:
In [65]: res['expanding_mean'] = res.groupby(level='side').apply(pd.expanding_mean).shift(2)
In [66]: res
Out[66]:
            value  expanding_mean
event side
213   A     0.193             NaN
      B     0.312             NaN
214   A     0.054          0.1930
      B     0.058          0.3120
215   A     0.193          0.1235
      B     0.238          0.1850
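Your result needs the shift by 2 because you want the mean to include all previous values but not the current one (make sure this is really what you want; it looks a bit odd). To make this more general when you have more than two sides, you can replace the 2 in shift(2) with len(res.index.levels[1]).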
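Since pd.expanding_mean was removed in later pandas releases, here is a hedged sketch of the same idea using the .expanding() method. It shifts by one row inside each side's group instead of shifting the whole frame by the number of sides, so it does not assume that every event contains every side. The use of transform with a lambda is my own choice for illustration, not something taken from the answer above.

```python
import pandas as pd

# Same data as in the question (the date column is omitted for brevity).
event = ['213', '213', '213', '213', '214', '214', '214', '215', '215', '215']
side = ['A', 'B', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
value = [0.193, 0.193, 0.092, 0.027, 0.027, 0.058, 0.027, 0.079, 0.193, 0.159]
df = pd.DataFrame({'event': event, 'side': side, 'value': value})

# Sum the values per (event, side).
res = df.groupby(['event', 'side'])[['value']].sum()

# Within each side, take the expanding mean and shift it by one row,
# so every row only sees the events that came before it.
res['expanding_mean'] = (
    res.groupby(level='side')['value']
       .transform(lambda s: s.expanding().mean().shift(1))
)
print(res)
```

The first event of each side has no earlier values, so its expanding_mean is NaN, as in the output shown above.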
Answer 1 (score: 0)
I added more "sides" to your DataFrame so that this also works when the sides are not just "A" or "B". Is this what you want?
import pandas as pd
import numpy as np
date = ['2013-08-10','2013-08-10','2013-08-10','2013-08-10','2013-08-10',
'2013-08-10','2013-08-10','2013-08-10','2013-08-10','2013-08-10']
event = ['213','213','213','213','214','214','214','215','215','215']
side = ['A','B','A','B','C','A','C','A','C','A',]
value = [0.193,0.193,0.092,0.027,0.027,0.058,0.027,0.079,0.193,0.159]
df = pd.DataFrame(list(zip(event,date,side,value)),
columns=['event','date','side','value'])
print(df)
  event        date side  value
0   213  2013-08-10    A  0.193
1   213  2013-08-10    B  0.193
2   213  2013-08-10    A  0.092
3   213  2013-08-10    B  0.027
4   214  2013-08-10    C  0.027
5   214  2013-08-10    A  0.058
6   214  2013-08-10    C  0.027
7   215  2013-08-10    A  0.079
8   215  2013-08-10    C  0.193
9   215  2013-08-10    A  0.159
ds = df.groupby(['event','side'])[['value']].sum()
print(ds)
            value
event side
213   A     0.285
      B     0.220
214   A     0.058
      C     0.054
215   A     0.238
      C     0.193
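ds.reset_index(inplace=True)
ds['exp_mean'] = np.nan
# For each side, take the expanding mean over that side's rows and shift it
# by one, so each row only sees earlier events.
for s in ds.side.unique():
    ndx = ds[ds.side == s].index
    # .ix and pd.expanding_mean were removed from pandas; use .loc and .expanding()
    ds.loc[ndx, 'exp_mean'] = ds.loc[ndx, 'value'].expanding().mean().shift(1)
ds.set_index(['event', 'side'], inplace=True, drop=True)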
print(ds)
            value  exp_mean
event side
213   A     0.285       NaN
      B     0.220       NaN
214   A     0.058    0.2850
      C     0.054       NaN
215   A     0.238    0.1715
      C     0.193    0.0540
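The loop over sides can probably be avoided. As a rough sketch, and as my own variation rather than part of the answer, the mean of all earlier rows for a side equals the running sum so far minus the current value, divided by the number of earlier rows. Both quantities are available from groupby via cumsum and cumcount. The names prev_sum and prev_count are made up for this example.

```python
import pandas as pd

# Same data as above, with sides A, B and C (date column omitted).
event = ['213', '213', '213', '213', '214', '214', '214', '215', '215', '215']
side = ['A', 'B', 'A', 'B', 'C', 'A', 'C', 'A', 'C', 'A']
value = [0.193, 0.193, 0.092, 0.027, 0.027, 0.058, 0.027, 0.079, 0.193, 0.159]
df = pd.DataFrame({'event': event, 'side': side, 'value': value})

# One row per (event, side), kept as ordinary columns.
ds = df.groupby(['event', 'side'], as_index=False)['value'].sum()

grouped = ds.groupby('side')['value']
prev_sum = grouped.cumsum() - ds['value']  # sum of the earlier rows for this side
prev_count = grouped.cumcount()            # number of earlier rows for this side

# Mean of the earlier rows; rows with no earlier value stay NaN.
ds['exp_mean'] = (prev_sum / prev_count).where(prev_count > 0)

ds = ds.set_index(['event', 'side'])
print(ds)
```

This should give the same exp_mean column as the loop, up to floating-point rounding.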
Answer 2 (score: 0)
See this pandas commit (lines 60-78): https://github.com/pandas-dev/pandas/commit/699424027fb657192541bcd0c3d9f9b7d26f2300
`You can now use ``.rolling(..)`` and ``.expanding(..)`` as methods on groupbys.
These return another deferred object (similar to what ``.rolling()`` and
``.expanding()`` do on ungrouped pandas objects). You can then operate
on these ``RollingGroupby`` objects in a similar manner.
Previously you would have to do this to get a rolling window mean per-group:
.. ipython:: python
df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
'B': np.arange(40)})
df
.. ipython:: python
df.groupby('A').apply(lambda x: x.rolling(4).B.mean())
Now you can do:
.. ipython:: python
df.groupby('A').rolling(4).B.mean()`
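Applied to the data from the question, the .expanding() method on a groupby might be used roughly like this. This is a sketch under my own assumptions: the per-side shift and the droplevel call are there to exclude the current row and to line the result back up with the original rows, and the name agg is made up for this example.

```python
import pandas as pd

# Data from the question (date column omitted).
event = ['213', '213', '213', '213', '214', '214', '214', '215', '215', '215']
side = ['A', 'B', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
value = [0.193, 0.193, 0.092, 0.027, 0.027, 0.058, 0.027, 0.079, 0.193, 0.159]
df = pd.DataFrame({'event': event, 'side': side, 'value': value})

# One row per (event, side) with a plain integer index.
agg = df.groupby(['event', 'side'], as_index=False)['value'].sum()

# Expanding mean per side; the result is indexed by (side, original row label).
per_side = agg.groupby('side')['value'].expanding().mean()

# Shift within each side so the current row is excluded, then drop the
# side level so the values align with agg's own index on assignment.
agg['expanding_mean'] = per_side.groupby(level=0).shift(1).droplevel(0)

agg = agg.set_index(['event', 'side'])
print(agg)
```

Without the shift, the expanding mean would include the current event's value as well as the earlier ones.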