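I have a DataFrame containing a set of values measured at different times. I want to normalize the values taken on each day so that they sum to one. How can I do this?
Specifically, I have data of the following form: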
value
datetime
2017-03-08 14:36:06.616166 1002.49
2017-03-08 15:06:07.661818 992.68
2017-03-08 15:36:08.597443 984.34
2017-03-08 16:06:09.265451 989.32
2017-03-08 16:36:10.581452 1004.00
2017-03-08 17:06:11.269434 1003.97
2017-03-08 17:36:12.117443 994.80
2017-03-08 18:06:12.809445 994.17
2017-03-08 18:36:14.029444 997.93
2017-03-08 19:06:14.654631 989.65
2017-03-08 19:36:15.413438 991.14
2017-03-08 20:06:16.145432 984.65
2017-03-08 20:36:17.265443 993.30
2017-03-08 21:06:18.117434 981.18
2017-03-08 21:36:19.165447 987.64
2017-03-08 22:06:19.909443 985.26
2017-03-08 22:36:20.569442 980.40
2017-03-08 23:06:21.197446 988.59
2017-03-08 23:36:21.989448 984.59
2017-03-09 00:06:22.665448 983.91
2017-03-09 00:36:23.281681 993.65
2017-03-09 01:06:23.857440 986.69
2017-03-09 01:36:24.441713 984.04
2017-03-09 02:06:25.117453 989.92
2017-03-09 02:36:25.953449 978.82
2017-03-09 03:06:26.521704 987.42
2017-03-09 03:36:27.157448 996.66
2017-03-09 04:06:27.725445 996.66
2017-03-09 04:36:29.201442 996.66
2017-03-09 05:06:29.765443 989.82
... ...
2017-03-22 20:16:24.007637 833.74
2017-03-22 20:46:24.583127 834.69
2017-03-22 21:16:25.217536 829.66
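I want to normalize the values for 2017-03-08, 2017-03-09, and so on, separately for each day, and add the normalized values as a new column.
A simple normalization function for a list of values looks like this: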
def normalize(x, summation=None):
    if summation is None:
        summation = sum(x)  # normalize to unity
    return [element / summation for element in x]
So, for 2017-03-08, the normalized values would be as follows:
value value_day_normalized
datetime
2017-03-08 14:36:06.616166 1002.49 0.0532386976171
2017-03-08 15:06:07.661818 992.68 0.0527177232197
2017-03-08 15:36:08.597443 984.34 0.0522748153223
2017-03-08 16:06:09.265451 989.32 0.0525392855057
2017-03-08 16:36:10.581452 1004.00 0.0533188883755
2017-03-08 17:06:11.269434 1003.97 0.0533172951817
2017-03-08 17:36:12.117443 994.80 0.0528303089203
2017-03-08 18:06:12.809445 994.17 0.0527968518489
2017-03-08 18:36:14.029444 997.93 0.052996532148
2017-03-08 19:06:14.654631 989.65 0.0525568106383
2017-03-08 19:36:15.413438 991.14 0.0526359392674
2017-03-08 20:06:16.145432 984.65 0.0522912783257
2017-03-08 20:36:17.265443 993.30 0.0527506492265
2017-03-08 21:06:18.117434 981.18 0.0521069989007
2017-03-08 21:36:19.165447 987.64 0.0524500666486
2017-03-08 22:06:19.909443 985.26 0.0523236732678
2017-03-08 22:36:20.569442 980.40 0.0520655758599
2017-03-08 23:06:21.197446 988.59 0.052500517788
2017-03-08 23:36:21.989448 984.59 0.0522880919379
How could something like this be done? I think it might involve the DataFrame method groupby, but I am not sure how to approach it.
Answer 0 (score: 2)
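You can divide the value column (a Series) with div by the per-day sum, which groupby with GroupBy.transform (or resample with Resampler.transform) broadcasts back to every row of df.
Another solution, using pd.Grouper: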
import pandas as pd
# Divide each value by the sum of all values recorded on the same calendar day.
df['value_day_normalized'] = df['value'].div(df.groupby(pd.Grouper(freq='D'))['value']
                                                .transform('sum'))
print(df)
value value_day_normalized
datetime
2017-03-08 14:36:06.616166 1002.49 0.053239
2017-03-08 15:06:07.661818 992.68 0.052718
2017-03-08 15:36:08.597443 984.34 0.052275
2017-03-08 16:06:09.265451 989.32 0.052539
2017-03-08 16:36:10.581452 1004.00 0.053319
2017-03-08 17:06:11.269434 1003.97 0.053317
2017-03-08 17:36:12.117443 994.80 0.052830
2017-03-08 18:06:12.809445 994.17 0.052797
2017-03-08 18:36:14.029444 997.93 0.052997
...
...
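For anyone who wants to try the idea end to end, here is a minimal self-contained sketch. It rebuilds a four-row frame from values in the question (two rows per day) and groups on the index truncated to midnight via DatetimeIndex.normalize, which is a swapped-in alternative to pd.Grouper rather than the exact code above. The names day, daily_sum, and check are only illustrative.

```python
import pandas as pd

# Small sample taken from the question: two rows on each of two days.
idx = pd.to_datetime([
    "2017-03-08 23:06:21.197446",
    "2017-03-08 23:36:21.989448",
    "2017-03-09 00:06:22.665448",
    "2017-03-09 00:36:23.281681",
])
df = pd.DataFrame({"value": [988.59, 984.59, 983.91, 993.65]}, index=idx)
df.index.name = "datetime"

# Truncate each timestamp to midnight so rows from the same day share a key.
day = df.index.normalize()

# transform('sum') returns the daily total repeated on every row of that day,
# so the division lines up row by row with the original index.
daily_sum = df.groupby(day)["value"].transform("sum")
df["value_day_normalized"] = df["value"].div(daily_sum)

# Sanity check: the normalized values of each day should add up to one.
check = df.groupby(day)["value_day_normalized"].sum()
print(df)
print(check)
```

The key point is that transform keeps the original shape, unlike a plain aggregation such as sum(), which would return one row per day and would not align with the half-hourly index.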