Hi, I have the following DataFrame:
import pandas as pd
df = pd.DataFrame()
df.index = ['2009-01-04', '2009-01-05', '2009-01-05', '2009-01-06', '2009-01-06', '2009-01-07', '2009-01-07', '2009-01-07']
df['score1'] = [84, 28, 38, 48, 23, 38, 22, 37]
df['score2'] = [83, 43, 12, 93, 64, 28, 29, 12]
df['score3'] = [92, 33, 11, 48, 23, 22, 12, 38]
df['score4'] = [43, 23, 41, 75, 93, 93, 23, 21]
df['condition1'] = [0, 0, 1, 0, 1, 0, 1, 0]
df['condition2'] = [1, 0, 1, 0, 0, 0, 0, 1]
df['condition3'] = [0, 0, 0, 1, 1, 0, 0, 1]
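df.index = pd.to_datetime(df.index)  # resample needs a DatetimeIndex, not date strings
df = df.resample('D').mean()  # the how= argument is no longer accepted in recent pandas versions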
df = df.rolling(30, min_periods=1).mean()
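I want to compute a 30-day rolling mean, but with extra weight on rows where one of the conditions is met (i.e. condition == 1). In other words, rows that meet a condition should influence the 30-day window much more strongly than the others.
Is there a way to do this?
Answer 0 (score: 1)
I'm not sure I understand, but could you take the rolling mean of a score that is weighted according to the condition?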
extra_weight=2 # when condition is met, score is multiplied by extra_weight+1
df['weighted_score1']=df['score1']*(df['condition1']*extra_weight+1) # we add 1 so that score is counted even when condition == 0
#repeat for score2 and 3
df = df.rolling(30, min_periods=1).mean()
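Note that the rolling mean of a weighted score is not a normalized weighted average: the boosted rows are inflated, but the result is still divided by the plain row count. If a true weighted average is what you are after, one option is to divide the rolling sum of weighted scores by the rolling sum of weights. A minimal sketch, assuming daily data and a single condition column (the toy values and the names weight, window and weighted_mean_score1 are made up for illustration):

```python
import pandas as pd

# Toy daily data; the values are illustrative only, not the asker's data.
idx = pd.date_range("2009-01-04", periods=4, freq="D")
df = pd.DataFrame(
    {"score1": [84.0, 28.0, 48.0, 38.0], "condition1": [0, 1, 0, 1]},
    index=idx,
)

extra_weight = 2  # rows with condition1 == 1 count (extra_weight + 1) times

# Per-row weight: 1 when the condition is not met, extra_weight + 1 when it is.
weight = df["condition1"] * extra_weight + 1

# Normalized weighted rolling mean:
# rolling sum of (score * weight) divided by rolling sum of weight.
window = 30
weighted_sum = (df["score1"] * weight).rolling(window, min_periods=1).sum()
weight_sum = weight.rolling(window, min_periods=1).sum()
df["weighted_mean_score1"] = weighted_sum / weight_sum
```

With this form the result always stays between the smallest and largest score in the window, which is not guaranteed when you simply average the multiplied scores.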
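Update in response to a comment: applying the weight based on multiple conditions.
The condition columns contain only 1s and 0s. To get an AND of two columns, take the element-wise minimum: it is 1 if both columns are 1, and 0 if either or both are 0. Similarly, to get an OR, take the element-wise maximum.
For example, to add extra weight when (condition1 AND condition2) OR condition3 holds: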
import numpy as np
df['final_cond']= np.maximum(np.minimum(df['condition1'],df['condition2']),df['condition3'])
df['weighted_score1']=df['score1']*(df['final_cond']*extra_weight+1)
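As a quick sanity check, the min/max trick can be compared with ordinary boolean operators. This is only a sketch and it assumes the condition columns are still strictly 0/1, i.e. it runs on the raw rows before any daily averaging (a daily mean of a 0 and a 1 gives 0.5, at which point the boolean reading no longer applies). The names via_minmax and via_bool are just for illustration:

```python
import numpy as np
import pandas as pd

# Condition columns copied from the question, before resampling.
df = pd.DataFrame({
    "condition1": [0, 0, 1, 0, 1, 0, 1, 0],
    "condition2": [1, 0, 1, 0, 0, 0, 0, 1],
    "condition3": [0, 0, 0, 1, 1, 0, 0, 1],
})

# min/max formulation: min acts as AND, max acts as OR on 0/1 values.
via_minmax = np.maximum(
    np.minimum(df["condition1"], df["condition2"]),
    df["condition3"],
)

# Equivalent formulation with boolean operators.
c1 = df["condition1"].astype(bool)
c2 = df["condition2"].astype(bool)
c3 = df["condition3"].astype(bool)
via_bool = ((c1 & c2) | c3).astype(int)

print(via_minmax.tolist())  # [0, 0, 1, 1, 1, 0, 0, 1]
print((via_minmax == via_bool).all())
```

Both give the same column here, so which one to use is mostly a matter of taste; the boolean version may be easier to read once the expression gets longer.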