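How do I normalize the values for each day in a Pandas DataFrame?

Time: 2017-03-23 14:12:31

Tags: pandas datetime dataframe

I have a DataFrame containing a set of values measured at different times. I want to normalize the values taken on each day so that they sum to one. How can I do this?

Specifically, I have data of the following form: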

                              value
datetime                           
2017-03-08 14:36:06.616166  1002.49
2017-03-08 15:06:07.661818   992.68
2017-03-08 15:36:08.597443   984.34
2017-03-08 16:06:09.265451   989.32
2017-03-08 16:36:10.581452  1004.00
2017-03-08 17:06:11.269434  1003.97
2017-03-08 17:36:12.117443   994.80
2017-03-08 18:06:12.809445   994.17
2017-03-08 18:36:14.029444   997.93
2017-03-08 19:06:14.654631   989.65
2017-03-08 19:36:15.413438   991.14
2017-03-08 20:06:16.145432   984.65
2017-03-08 20:36:17.265443   993.30
2017-03-08 21:06:18.117434   981.18
2017-03-08 21:36:19.165447   987.64
2017-03-08 22:06:19.909443   985.26
2017-03-08 22:36:20.569442   980.40
2017-03-08 23:06:21.197446   988.59
2017-03-08 23:36:21.989448   984.59
2017-03-09 00:06:22.665448   983.91
2017-03-09 00:36:23.281681   993.65
2017-03-09 01:06:23.857440   986.69
2017-03-09 01:36:24.441713   984.04
2017-03-09 02:06:25.117453   989.92
2017-03-09 02:36:25.953449   978.82
2017-03-09 03:06:26.521704   987.42
2017-03-09 03:36:27.157448   996.66
2017-03-09 04:06:27.725445   996.66
2017-03-09 04:36:29.201442   996.66
2017-03-09 05:06:29.765443   989.82
...                             ...
2017-03-22 20:16:24.007637   833.74
2017-03-22 20:46:24.583127   834.69
2017-03-22 21:16:25.217536   829.66

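I want to normalize the values separately for each day (2017-03-08, 2017-03-09, and so on) and add these normalized values as a new column.

A simple normalization function for a list of values looks like this: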

def normalize(x, summation=None):
    # Default: divide by the total so that the results sum to one
    if summation is None:
        summation = sum(x)
    return [element / summation for element in x]

So, for 2017-03-08, the normalized values would be:

                              value  value_day_normalized
datetime                                                 
2017-03-08 14:36:06.616166  1002.49  0.0532386976171
2017-03-08 15:06:07.661818   992.68  0.0527177232197
2017-03-08 15:36:08.597443   984.34  0.0522748153223
2017-03-08 16:06:09.265451   989.32  0.0525392855057
2017-03-08 16:36:10.581452  1004.00  0.0533188883755
2017-03-08 17:06:11.269434  1003.97  0.0533172951817
2017-03-08 17:36:12.117443   994.80  0.0528303089203
2017-03-08 18:06:12.809445   994.17  0.0527968518489
2017-03-08 18:36:14.029444   997.93  0.052996532148
2017-03-08 19:06:14.654631   989.65  0.0525568106383
2017-03-08 19:36:15.413438   991.14  0.0526359392674
2017-03-08 20:06:16.145432   984.65  0.0522912783257
2017-03-08 20:36:17.265443   993.30  0.0527506492265
2017-03-08 21:06:18.117434   981.18  0.0521069989007
2017-03-08 21:36:19.165447   987.64  0.0524500666486
2017-03-08 22:06:19.909443   985.26  0.0523236732678
2017-03-08 22:36:20.569442   980.40  0.0520655758599
2017-03-08 23:06:21.197446   988.59  0.052500517788
2017-03-08 23:36:21.989448   984.59  0.0522880919379

How can this be done? I suspect it involves the DataFrame method groupby, but I am not sure how to approach it.

1 answer:

Answer 0: (score: 2)

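Use div together with groupby and GroupBy.transform, or with resample and Resampler.transform. The transform call returns a Series with the same size as the original df, so the division lines up row by row.

One solution, grouping by calendar day with pd.Grouper: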

import pandas as pd

# Divide each value by the sum of all values recorded on the same calendar day
df['value_day_normalized'] = df['value'].div(df.groupby(pd.Grouper(freq='D'))['value']
                                               .transform('sum'))
print(df)
                              value  value_day_normalized
datetime                                                 
2017-03-08 14:36:06.616166  1002.49              0.053239
2017-03-08 15:06:07.661818   992.68              0.052718
2017-03-08 15:36:08.597443   984.34              0.052275
2017-03-08 16:06:09.265451   989.32              0.052539
2017-03-08 16:36:10.581452  1004.00              0.053319
2017-03-08 17:06:11.269434  1003.97              0.053317
2017-03-08 17:36:12.117443   994.80              0.052830
2017-03-08 18:06:12.809445   994.17              0.052797
2017-03-08 18:36:14.029444   997.93              0.052997
...
...
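
For anyone who wants to try this without the full dataset, here is a minimal, self-contained sketch. The five-row sample frame is only an assumption for illustration (the values are taken from the question, with timestamps truncated to whole seconds), and the column name value_day_normalized_rs is made up here. It shows both the groupby variant above and the resample variant mentioned in the answer, then checks that each day's normalized values sum to one.

```python
import pandas as pd

# Small sample spanning two calendar days (illustrative subset of the question's data)
idx = pd.to_datetime([
    "2017-03-08 14:36:06",
    "2017-03-08 15:06:07",
    "2017-03-09 00:06:22",
    "2017-03-09 00:36:23",
    "2017-03-09 01:06:23",
])
df = pd.DataFrame({"value": [1002.49, 992.68, 983.91, 993.65, 986.69]}, index=idx)
df.index.name = "datetime"

# Variant 1: groupby with a daily Grouper; transform('sum') returns one
# daily total per original row, aligned on the original index
daily_sum = df.groupby(pd.Grouper(freq="D"))["value"].transform("sum")
df["value_day_normalized"] = df["value"].div(daily_sum)

# Variant 2: the same daily totals via resample and Resampler.transform
daily_sum_rs = df["value"].resample("D").transform("sum")
df["value_day_normalized_rs"] = df["value"] / daily_sum_rs

# Sanity check: the normalized values of each day should add up to one
check = df.groupby(df.index.normalize())["value_day_normalized"].sum()
print(df)
print(check)
```

Both variants need a DatetimeIndex on the frame. If the timestamps live in an ordinary column instead, pd.Grouper(key=..., freq='D') would be the thing to try for the groupby variant.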