I have a pandas DataFrame, for example:
import numpy as np
import pandas as pd

df = pd.DataFrame({
'time' : pd.date_range('2017-07-18 00:00:00', '2017-07-21 00:00:00', freq='3H'),
'val1' : np.random.random(25)*300,
'val2' : np.random.random(25)*30})
df.set_index('time', inplace=True)
and a DataFrame of real values:
real_values = pd.DataFrame({
'day' : [18, 19, 20],
'values' : [500, 600, 700]})
I want to normalize the values in column val1 using the following formula:
new_value = old_value*real_value_that_day/daily_average
That is, each value is multiplied by the ratio of that day's real value to that day's average.
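To make the formula concrete, here is a minimal sketch of the arithmetic for a single day. The three readings and the real value are toy numbers chosen for illustration, not taken from the random data above.

```python
# Toy numbers for one day (hypothetical, for illustration only).
old_values = [100.0, 200.0, 300.0]   # that day's val1 readings
real_value_that_day = 500.0

# Daily average of the original readings.
daily_average = sum(old_values) / len(old_values)

# new_value = old_value * real_value_that_day / daily_average
new_values = [v * real_value_that_day / daily_average for v in old_values]

print(new_values)                         # [250.0, 500.0, 750.0]
print(sum(new_values) / len(new_values))  # 500.0
```

After scaling, the day's average equals the real value for that day, while the proportions between the readings are unchanged.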
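I tried using .map, but I could not work the index.day condition into the DataFrame. I also tried groupby(df.index.day), but I don't know how to get from there to the final result.
Many thanks.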
Answer 0 (score: 2)
I think you need:
import numpy as np
import pandas as pd

np.random.seed(45)
df = pd.DataFrame({
'time' : pd.date_range('2017-07-18 00:00:00', '2017-07-21 00:00:00', freq='3H'),
'val1' : np.random.random(25)*300,
'val2' : np.random.random(25)*30})
df.set_index('time', inplace=True)
real_values = pd.DataFrame({
'day' : [18, 19, 20],
'values' : [500, 600, 700]})
# map real_values onto a Series the same length as df, keyed by day of month
a = pd.Series(df.index.day, index=df.index).map(real_values.set_index('day')['values'])
print (a.head())
time
2017-07-18 00:00:00 500.0
2017-07-18 03:00:00 500.0
2017-07-18 06:00:00 500.0
2017-07-18 09:00:00 500.0
2017-07-18 12:00:00 500.0
Name: time, dtype: float64
# multiply the original by Series a (row-wise), then divide by the daily average from transform
df1 = df.mul(a, 0).div(df.groupby(df.index.day).transform('mean'))
print (df1)
val1 val2
time
2017-07-18 00:00:00 1307.171491 403.372865
2017-07-18 03:00:00 726.330473 851.356196
2017-07-18 06:00:00 371.987469 77.497641
2017-07-18 09:00:00 102.153227 959.768694
2017-07-18 12:00:00 587.453074 233.817177
2017-07-18 15:00:00 624.907891 734.391568
2017-07-18 18:00:00 64.131282 114.951326
2017-07-18 21:00:00 215.865093 624.844533
2017-07-19 00:00:00 120.686108 542.744066
2017-07-19 03:00:00 653.014193 1116.500860
2017-07-19 06:00:00 891.148297 333.591495
2017-07-19 09:00:00 676.652432 610.715673
2017-07-19 12:00:00 1031.182496 743.728715
2017-07-19 15:00:00 489.559748 336.152862
2017-07-19 18:00:00 643.545466 147.084368
2017-07-19 21:00:00 294.211260 969.481959
2017-07-20 00:00:00 1474.421809 404.910284
2017-07-20 03:00:00 1016.785621 1078.311435
2017-07-20 06:00:00 665.498098 589.809072
2017-07-20 09:00:00 437.622829 122.931391
2017-07-20 12:00:00 769.989526 1158.555013
2017-07-20 15:00:00 169.891633 968.620184
2017-07-20 18:00:00 342.854461 159.225353
2017-07-20 21:00:00 722.936022 1117.637269
2017-07-21 00:00:00 NaN NaN
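The result above scales both columns, and the last row is NaN because day 21 has no entry in real_values. Since the question asks only about val1, a variant restricted to that column might look like the sketch below. The column name val1_norm is a hypothetical choice, and the index is built with pd.Timedelta(hours=3), which is assumed to give the same 3-hour spacing as freq='3H'.

```python
import numpy as np
import pandas as pd

np.random.seed(45)
idx = pd.date_range('2017-07-18', periods=25, freq=pd.Timedelta(hours=3))
df = pd.DataFrame({'val1': np.random.random(25) * 300,
                   'val2': np.random.random(25) * 30}, index=idx)
real_values = pd.DataFrame({'day': [18, 19, 20], 'values': [500, 600, 700]})

# Real value for each row, looked up by day of month (NaN where no entry exists).
day = pd.Series(df.index.day, index=df.index)
real_per_row = day.map(real_values.set_index('day')['values'])

# Daily average of val1, repeated on every row of that day.
daily_avg = df['val1'].groupby(df.index.day).transform('mean')

# Keep the original columns and add the normalized one (hypothetical name).
out = df.copy()
out['val1_norm'] = df['val1'] * real_per_row / daily_avg
print(out.tail())
```

Rows whose day is missing from real_values stay NaN in val1_norm; val2 is left untouched.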
Details:
print (df.groupby(df.index.day).transform('mean'))
val1 val2
time
2017-07-18 00:00:00 113.490638 14.427688
2017-07-18 03:00:00 113.490638 14.427688
2017-07-18 06:00:00 113.490638 14.427688
2017-07-18 09:00:00 113.490638 14.427688
2017-07-18 12:00:00 113.490638 14.427688
2017-07-18 15:00:00 113.490638 14.427688
2017-07-18 18:00:00 113.490638 14.427688
2017-07-18 21:00:00 113.490638 14.427688
2017-07-19 00:00:00 172.937287 13.491194
2017-07-19 03:00:00 172.937287 13.491194
2017-07-19 06:00:00 172.937287 13.491194
2017-07-19 09:00:00 172.937287 13.491194
2017-07-19 12:00:00 172.937287 13.491194
2017-07-19 15:00:00 172.937287 13.491194
2017-07-19 18:00:00 172.937287 13.491194
2017-07-19 21:00:00 172.937287 13.491194
2017-07-20 00:00:00 139.010896 16.081470
2017-07-20 03:00:00 139.010896 16.081470
2017-07-20 06:00:00 139.010896 16.081470
2017-07-20 09:00:00 139.010896 16.081470
2017-07-20 12:00:00 139.010896 16.081470
2017-07-20 15:00:00 139.010896 16.081470
2017-07-20 18:00:00 139.010896 16.081470
2017-07-20 21:00:00 139.010896 16.081470
2017-07-21 00:00:00 72.827447 2.008148
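Dividing by these repeated daily means makes each day's average equal to 1, so multiplying by the mapped real value makes each day's average equal to that real value. A quick sanity check, as a sketch: the real values are held in a Series named real here (an assumption, to keep the example short), and the index uses pd.Timedelta(hours=3) in place of the '3H' alias.

```python
import numpy as np
import pandas as pd

np.random.seed(45)
idx = pd.date_range('2017-07-18', periods=25, freq=pd.Timedelta(hours=3))
df = pd.DataFrame({'val1': np.random.random(25) * 300,
                   'val2': np.random.random(25) * 30}, index=idx)

# Real values keyed by day of month (same numbers as real_values above).
real = pd.Series([500, 600, 700], index=[18, 19, 20])

# Same two steps as in the answer: map by day, then scale by the daily mean.
a = pd.Series(df.index.day, index=df.index).map(real)
df1 = df.mul(a, axis=0).div(df.groupby(df.index.day).transform('mean'))

# Daily means of the result, one row per day.
check = df1.groupby(df1.index.day).mean()
print(check)
```

For days 18 to 20 the daily means of both columns should come back as 500, 600 and 700, up to floating-point rounding; day 21 stays NaN because it has no real value.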
Answer 1 (score: 1)
This should do it:
You were only missing the mapping for day 21.
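One way to act on that remark, as a sketch: find the days that appear in the data but not in real_values, then add a row for them. The value 800 for day 21 is made up purely for illustration, and the names days_in_data, extra and real_values_full are hypothetical.

```python
import pandas as pd

real_values = pd.DataFrame({'day': [18, 19, 20], 'values': [500, 600, 700]})

# Days present in the time index of the example data.
days_in_data = pd.Series([18, 19, 20, 21])

# Which days have no real value yet?
missing = sorted(set(days_in_data) - set(real_values['day']))
print(missing)   # [21]

# Hypothetical extra row: 800 is a placeholder, not a value from the question.
extra = pd.DataFrame({'day': [21], 'values': [800]})
real_values_full = pd.concat([real_values, extra], ignore_index=True)

# With the extra row, every day maps to a real value.
mapped = days_in_data.map(real_values_full.set_index('day')['values'])
print(mapped.isna().any())   # False
```

If no real value exists for day 21, dropping the NaN rows from the result with dropna() is the other obvious option.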