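Conditional mean in pandas

Time: 2020-01-26 19:47:34

Tags: python pandas dataframe

I have the following pandas DataFrame, which I resample to daily frequency by taking the mean and then smooth with a 30-day rolling mean: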

import pandas as pd

df = pd.DataFrame()
df.index = ['2009-01-04', '2009-01-05', '2009-01-05', '2009-01-06', '2009-01-06', '2009-01-07', '2009-01-07', '2009-01-07']
df['score1'] = [84, 28, 38, 48, 23, 38, 22, 37]
df['score2'] = [83, 43, 12, 93, 64, 28, 29, 12]
df['score3'] = [92, 33, 11, 48, 23, 22, 12, 38]
df['score4'] = [43, 23, 41, 75, 93, 93, 23, 21]
df['condition1'] = [0, 0, 1, 0, 1, 0, 1, 0]
df['condition2'] = [1, 0, 1, 0, 0, 0, 0, 1]
df['condition3'] = [0, 0, 0, 1, 1, 0, 0, 1]

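df.index = pd.to_datetime(df.index)  # resample requires a DatetimeIndex
df = df.resample('D').mean()  # the how= keyword is no longer accepted by current pandas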
df = df.rolling(30, min_periods=1).mean()

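In this case I want to compute a "conditional mean": whenever at least one of the three conditions equals 1 for a row, only the rows that have a 1 should contribute to the mean. For example, for the third row, where condition1 and condition2 are met, the mean for 2009-01-05 should use only [38, 12, 11, 41] and ignore [28, 43, 33, 23].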

1 answer:

Answer 0 (score: 0)

# convert your index to datetime:
df.index = pd.to_datetime(df.index)

# select the rows that meet condition:
df = df[df.loc[:,['condition1','condition2','condition3']].sum(axis=1)>0]

# resample
df = df.resample('D').mean()  # updated syntax

>>> print (df)
            score1  score2  score3  score4  condition1  condition2  condition3
2009-01-04    84.0    83.0    92.0    43.0         0.0         1.0         0.0
2009-01-05    38.0    12.0    11.0    41.0         1.0         1.0         0.0
2009-01-06    35.5    78.5    35.5    84.0         0.5         0.0         1.0
2009-01-07    29.5    20.5    25.0    22.0         0.5         0.5         0.5
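
The answer above stops at the daily mean, while the question also asks for a 30-day rolling mean. The following is a minimal sketch of the whole pipeline under a few assumptions that are not in the original: the names score_cols, cond_cols, mask, daily and rolling are illustrative, only the score columns are averaged, and a row qualifies when any of its condition flags equals 1.

```python
import pandas as pd

# Same sample data as in the question, built with a DatetimeIndex up front.
df = pd.DataFrame(
    {
        'score1': [84, 28, 38, 48, 23, 38, 22, 37],
        'score2': [83, 43, 12, 93, 64, 28, 29, 12],
        'score3': [92, 33, 11, 48, 23, 22, 12, 38],
        'score4': [43, 23, 41, 75, 93, 93, 23, 21],
        'condition1': [0, 0, 1, 0, 1, 0, 1, 0],
        'condition2': [1, 0, 1, 0, 0, 0, 0, 1],
        'condition3': [0, 0, 0, 1, 1, 0, 0, 1],
    },
    index=pd.to_datetime([
        '2009-01-04', '2009-01-05', '2009-01-05', '2009-01-06',
        '2009-01-06', '2009-01-07', '2009-01-07', '2009-01-07',
    ]),
)

# Hypothetical helper names, chosen for this example only.
score_cols = ['score1', 'score2', 'score3', 'score4']
cond_cols = ['condition1', 'condition2', 'condition3']

# True for rows where at least one condition flag equals 1.
mask = df[cond_cols].eq(1).any(axis=1)

# Daily mean of the score columns, computed over the qualifying rows only.
daily = df.loc[mask, score_cols].resample('D').mean()

# Rolling mean over a window of 30 daily rows; min_periods=1 yields a value
# as soon as one observation is available.
rolling = daily.rolling(30, min_periods=1).mean()

print(daily)
print(rolling)
```

With only four days of data the 30-row window never fills, so the rolling result here is effectively a running mean of the daily values. If a day in the middle of the range had no qualifying rows, resample would produce NaN for that day, and the rolling mean would skip it as long as at least one non-NaN value falls inside the window.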