Normalize a pandas dataframe column using a formula and daily averages

Time: 2017-10-16 09:14:34

Tags: python pandas dataframe

I have a pandas dataframe, for example:

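import numpy as np
import pandas as pd

df = pd.DataFrame({
            # 3-hour steps; '3H' also works but is deprecated in recent pandas
            'time' : pd.date_range('2017-07-18 00:00:00', '2017-07-21 00:00:00', freq='3h'),
            'val1' : np.random.random(25)*300,
            'val2' : np.random.random(25)*30})

df.set_index('time', inplace=True)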

And a dataframe of real values:

real_values = pd.DataFrame({
    'day' : [18, 19, 20],
    'values' : [500, 600, 700]})

I want to normalize the values in column val1 using the following formula:

new_value = old_value*real_value_that_day/daily_average

That is, each value is multiplied by the ratio of that day's real value to that day's average.
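As a quick check of the formula, here is a minimal sketch on a tiny hand-made frame (the numbers are invented for illustration, not taken from the question). With two readings per day, each daily average is easy to verify by hand, and only val1 is normalized, into a new column.

```python
import pandas as pd

# Hypothetical toy data: two readings on each of two days.
df = pd.DataFrame(
    {"val1": [100.0, 300.0, 50.0, 150.0]},
    index=pd.to_datetime([
        "2017-07-18 00:00", "2017-07-18 12:00",
        "2017-07-19 00:00", "2017-07-19 12:00",
    ]),
)
real_value = {18: 500, 19: 600}  # real value per day of month

day = df.index.day
daily_average = df["val1"].groupby(day).transform("mean")  # 200, 200, 100, 100
real = pd.Series(day.map(real_value), index=df.index)       # 500, 500, 600, 600

# new_value = old_value * real_value_that_day / daily_average
df["val1_norm"] = df["val1"] * real / daily_average
print(df["val1_norm"].tolist())  # [250.0, 750.0, 300.0, 900.0]
```

On day 18 the average is 200, so 100 becomes 100 * 500 / 200 = 250; on day 19 the average is 100, so 150 becomes 150 * 600 / 100 = 900.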

I tried using .map, but I couldn't work the index.day condition into it. I also tried groupby(df.index.day), but I don't know how to get from there to the final result.

Thanks a lot.

2 answers:

Answer 0 (score: 2)

I think you need:

import numpy as np
import pandas as pd

np.random.seed(45)
df = pd.DataFrame({
            'time' : pd.date_range('2017-07-18 00:00:00', '2017-07-21 00:00:00', freq='3h'),
            'val1' : np.random.random(25)*300,
            'val2' : np.random.random(25)*30})

df.set_index('time', inplace=True)

real_values = pd.DataFrame({
    'day' : [18, 19, 20],
    'values' : [500, 600, 700]})
# map real_values by day of month onto a Series with the same index (and length) as df
a = pd.Series(df.index.day, index=df.index).map(real_values.set_index('day')['values'])
print (a.head())
time
2017-07-18 00:00:00    500.0
2017-07-18 03:00:00    500.0
2017-07-18 06:00:00    500.0
2017-07-18 09:00:00    500.0
2017-07-18 12:00:00    500.0
Name: time, dtype: float64
# multiply the original values by Series a row-wise, then divide by the daily average computed with transform
df1 = df.mul(a, axis=0).div(df.groupby(df.index.day).transform('mean'))
print (df1)
                            val1         val2
time                                         
2017-07-18 00:00:00  1307.171491   403.372865
2017-07-18 03:00:00   726.330473   851.356196
2017-07-18 06:00:00   371.987469    77.497641
2017-07-18 09:00:00   102.153227   959.768694
2017-07-18 12:00:00   587.453074   233.817177
2017-07-18 15:00:00   624.907891   734.391568
2017-07-18 18:00:00    64.131282   114.951326
2017-07-18 21:00:00   215.865093   624.844533
2017-07-19 00:00:00   120.686108   542.744066
2017-07-19 03:00:00   653.014193  1116.500860
2017-07-19 06:00:00   891.148297   333.591495
2017-07-19 09:00:00   676.652432   610.715673
2017-07-19 12:00:00  1031.182496   743.728715
2017-07-19 15:00:00   489.559748   336.152862
2017-07-19 18:00:00   643.545466   147.084368
2017-07-19 21:00:00   294.211260   969.481959
2017-07-20 00:00:00  1474.421809   404.910284
2017-07-20 03:00:00  1016.785621  1078.311435
2017-07-20 06:00:00   665.498098   589.809072
2017-07-20 09:00:00   437.622829   122.931391
2017-07-20 12:00:00   769.989526  1158.555013
2017-07-20 15:00:00   169.891633   968.620184
2017-07-20 18:00:00   342.854461   159.225353
2017-07-20 21:00:00   722.936022  1117.637269
2017-07-21 00:00:00          NaN          NaN
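One way to sanity-check df1: each value is scaled by real_value / daily_average, so the mean of the normalized values over a day should come back to that day's real value (the mean of x * r / m over a day whose mean is m equals r * m / m = r). A minimal sketch that reuses the answer's code and groups df1 by day again; day 21 stays NaN because real_values has no entry for it.

```python
import numpy as np
import pandas as pd

np.random.seed(45)
df = pd.DataFrame({
    'time': pd.date_range('2017-07-18 00:00:00', '2017-07-21 00:00:00', freq='3h'),
    'val1': np.random.random(25) * 300,
    'val2': np.random.random(25) * 30}).set_index('time')
real_values = pd.DataFrame({'day': [18, 19, 20], 'values': [500, 600, 700]})

a = pd.Series(df.index.day, index=df.index).map(real_values.set_index('day')['values'])
df1 = df.mul(a, axis=0).div(df.groupby(df.index.day).transform('mean'))

# Per-day mean of the normalized values: about 500, 600, 700 for both columns, NaN for day 21
check = df1.groupby(df1.index.day).mean()
print(check)
```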

Details:

print (df.groupby(df.index.day).transform('mean'))
                           val1       val2
time                                      
2017-07-18 00:00:00  113.490638  14.427688
2017-07-18 03:00:00  113.490638  14.427688
2017-07-18 06:00:00  113.490638  14.427688
2017-07-18 09:00:00  113.490638  14.427688
2017-07-18 12:00:00  113.490638  14.427688
2017-07-18 15:00:00  113.490638  14.427688
2017-07-18 18:00:00  113.490638  14.427688
2017-07-18 21:00:00  113.490638  14.427688
2017-07-19 00:00:00  172.937287  13.491194
2017-07-19 03:00:00  172.937287  13.491194
2017-07-19 06:00:00  172.937287  13.491194
2017-07-19 09:00:00  172.937287  13.491194
2017-07-19 12:00:00  172.937287  13.491194
2017-07-19 15:00:00  172.937287  13.491194
2017-07-19 18:00:00  172.937287  13.491194
2017-07-19 21:00:00  172.937287  13.491194
2017-07-20 00:00:00  139.010896  16.081470
2017-07-20 03:00:00  139.010896  16.081470
2017-07-20 06:00:00  139.010896  16.081470
2017-07-20 09:00:00  139.010896  16.081470
2017-07-20 12:00:00  139.010896  16.081470
2017-07-20 15:00:00  139.010896  16.081470
2017-07-20 18:00:00  139.010896  16.081470
2017-07-20 21:00:00  139.010896  16.081470
2017-07-21 00:00:00   72.827447   2.008148
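The reason for transform('mean') rather than a plain mean(): the aggregated groupby mean returns one row per day, while transform broadcasts each day's mean back onto every row of that day, so the result has the same index as df and can be used directly in the division. A small sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical toy data: three readings on day 18, one on day 19.
idx = pd.to_datetime(["2017-07-18 00:00", "2017-07-18 03:00",
                      "2017-07-18 06:00", "2017-07-19 00:00"])
s = pd.Series([1.0, 2.0, 6.0, 10.0], index=idx, name="val1")

agg = s.groupby(s.index.day).mean()                # one row per day: 18 -> 3.0, 19 -> 10.0
bcast = s.groupby(s.index.day).transform("mean")   # same shape as s: 3, 3, 3, 10

print(len(agg), len(bcast))  # 2 4
print(bcast.tolist())        # [3.0, 3.0, 3.0, 10.0]
```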

Answer 1 (score: 1)

This should do it:


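You just missed the mapping for the 21st: real_values has no entry for day 21, so the 2017-07-21 00:00:00 row comes out as NaN.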
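Keying real_values by day of month only works while the data stays within a single month; July 18 and August 18 would share one real value. A hedged alternative, assuming real_values can be given full calendar dates (the 'date' column below is hypothetical, not from the question), groups and maps on df.index.normalize() instead. A date missing from real_values still produces NaN, as with the 21st, so either add that date or drop those rows.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(
    {"val1": np.random.random(25) * 300},
    index=pd.date_range("2017-07-18", "2017-07-21", freq="3h", name="time"),
)
# Hypothetical: real values keyed by full calendar date instead of day of month.
real_values = pd.DataFrame({
    "date": pd.to_datetime(["2017-07-18", "2017-07-19", "2017-07-20"]),
    "values": [500, 600, 700],
})

day = df.index.normalize()  # midnight timestamp of each row's calendar day
real = pd.Series(day, index=df.index).map(real_values.set_index("date")["values"])
daily_average = df["val1"].groupby(day).transform("mean")

df["val1_norm"] = df["val1"] * real / daily_average
# 2017-07-21 has no real value, so its row is NaN; drop it if it is not needed.
result = df.dropna(subset=["val1_norm"])
print(result.groupby(result.index.normalize())["val1_norm"].mean())
```

Grouping by the normalized timestamp keeps different months apart, at the cost of having to supply real values as dates rather than bare day numbers.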