I have the following pandas DataFrame, which I resample to daily frequency by taking the mean and then smooth with a 30-day rolling mean:
import pandas as pd
df = pd.DataFrame()
df.index = ['2009-01-04', '2009-01-05', '2009-01-05', '2009-01-06', '2009-01-06', '2009-01-07', '2009-01-07', '2009-01-07']
df['score1'] = [84, 28, 38, 48, 23, 38, 22, 37]
df['score2'] = [83, 43, 12, 93, 64, 28, 29, 12]
df['score3'] = [92, 33, 11, 48, 23, 22, 12, 38]
df['score4'] = [43, 23, 41, 75, 93, 93, 23, 21]
df['condition1'] = [0, 0, 1, 0, 1, 0, 1, 0]
df['condition2'] = [1, 0, 1, 0, 0, 0, 0, 1]
df['condition3'] = [0, 0, 0, 1, 1, 0, 0, 1]
df = df.resample('D', how='mean')
df = df.rolling(30, min_periods=1).mean()
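In this case I want to compute a "conditional mean": whenever at least one of the three conditions equals 1, only the rows containing a 1 should contribute to the mean. For example, in the third row, where condition1 and condition2 are met, the means for 2009-01-05 should use only [38, 12, 11, 41] and ignore [28, 43, 33, 23].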
Answer 0: (score: 0)
# convert your index to datetime:
df.index = pd.to_datetime(df.index)
# select the rows that meet condition:
df = df[df.loc[:,['condition1','condition2','condition3']].sum(axis=1)>0]
# resample
df = df.resample('D').mean() # updated syntax
>>> print(df)
            score1  score2  score3  score4  condition1  condition2  condition3
2009-01-04    84.0    83.0    92.0    43.0         0.0         1.0         0.0
2009-01-05    38.0    12.0    11.0    41.0         1.0         1.0         0.0
2009-01-06    35.5    78.5    35.5    84.0         0.5         0.0         1.0
2009-01-07    29.5    20.5    25.0    22.0         0.5         0.5         0.5
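
The answer above stops at the daily resample, while the question also asks for a 30-day rolling mean. A minimal sketch of the full pipeline follows, assuming the rolling window should be applied to the filtered daily means and that only the score columns are of interest. The names `score_cols`, `cond_cols`, `mask`, `daily`, and `rolling` are illustrative and do not appear in the original post.

```python
import pandas as pd

# Same sample data as in the question, with a datetime index from the start.
df = pd.DataFrame(
    {
        "score1": [84, 28, 38, 48, 23, 38, 22, 37],
        "score2": [83, 43, 12, 93, 64, 28, 29, 12],
        "score3": [92, 33, 11, 48, 23, 22, 12, 38],
        "score4": [43, 23, 41, 75, 93, 93, 23, 21],
        "condition1": [0, 0, 1, 0, 1, 0, 1, 0],
        "condition2": [1, 0, 1, 0, 0, 0, 0, 1],
        "condition3": [0, 0, 0, 1, 1, 0, 0, 1],
    },
    index=pd.to_datetime(
        ["2009-01-04", "2009-01-05", "2009-01-05", "2009-01-06",
         "2009-01-06", "2009-01-07", "2009-01-07", "2009-01-07"]
    ),
)

# Illustrative column groupings (assumed names, not from the original post).
score_cols = ["score1", "score2", "score3", "score4"]
cond_cols = ["condition1", "condition2", "condition3"]

# True for rows where at least one condition flag equals 1.
mask = df[cond_cols].eq(1).any(axis=1)

# Daily mean of the score columns, computed over the qualifying rows only.
daily = df.loc[mask, score_cols].resample("D").mean()

# Rolling mean over 30 daily rows; min_periods=1 allows partial windows.
rolling = daily.rolling(30, min_periods=1).mean()

print(daily)
print(rolling)
```

For the sample data, `daily` matches the score columns in the output shown above. A day inside the date range with no qualifying rows would appear as NaN in `daily`, and the rolling mean skips such values as long as at least one valid observation falls inside the window.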