Normalizing data per month with Python

Time: 2019-02-12 06:18:47

Tags: python pandas-groupby

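I have a pandas DataFrame of values that I want to normalize. Specifically, I want to normalize the data within each month. I believe I need to use groupby with a lambda function, but when I try this approach I get NaN in the output.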

import numpy as np
import pandas as pd

arr = pd.DataFrame(np.arange(1,21), columns=['Output'])
arr2 = pd.DataFrame(np.arange(10, 210, 10), columns=['Output2'])
index2 = pd.date_range('20180928 10:00am', periods=20, freq="W")
index3 = pd.DataFrame(index2, columns=['Date'])
df2 = pd.concat([index3, arr, arr2], axis=1)

print(df2)

cols = df2.columns[1:]
df2_grouped = df2.groupby(['Date'])

for c in cols:
    df2[c] = df2_grouped[c].apply(lambda x: (x-x.mean()) / x.std())
print(df2)

                  Date  Output  Output2
0  2018-09-30 10:00:00       1       10
1  2018-10-07 10:00:00       2       20
2  2018-10-14 10:00:00       3       30
3  2018-10-21 10:00:00       4       40
4  2018-10-28 10:00:00       5       50
5  2018-11-04 10:00:00       6       60
6  2018-11-11 10:00:00       7       70
7  2018-11-18 10:00:00       8       80
8  2018-11-25 10:00:00       9       90
9  2018-12-02 10:00:00      10      100
10 2018-12-09 10:00:00      11      110
11 2018-12-16 10:00:00      12      120
12 2018-12-23 10:00:00      13      130
13 2018-12-30 10:00:00      14      140
14 2019-01-06 10:00:00      15      150
15 2019-01-13 10:00:00      16      160
16 2019-01-20 10:00:00      17      170
17 2019-01-27 10:00:00      18      180
18 2019-02-03 10:00:00      19      190
19 2019-02-10 10:00:00      20      200
                  Date  Output  Output2
0  2018-09-30 10:00:00     NaN      NaN
1  2018-10-07 10:00:00     NaN      NaN
2  2018-10-14 10:00:00     NaN      NaN
3  2018-10-21 10:00:00     NaN      NaN
4  2018-10-28 10:00:00     NaN      NaN
5  2018-11-04 10:00:00     NaN      NaN
6  2018-11-11 10:00:00     NaN      NaN
7  2018-11-18 10:00:00     NaN      NaN
8  2018-11-25 10:00:00     NaN      NaN
9  2018-12-02 10:00:00     NaN      NaN
10 2018-12-09 10:00:00     NaN      NaN
11 2018-12-16 10:00:00     NaN      NaN
12 2018-12-23 10:00:00     NaN      NaN
13 2018-12-30 10:00:00     NaN      NaN
14 2019-01-06 10:00:00     NaN      NaN
15 2019-01-13 10:00:00     NaN      NaN
16 2019-01-20 10:00:00     NaN      NaN
17 2019-01-27 10:00:00     NaN      NaN
18 2019-02-03 10:00:00     NaN      NaN
19 2019-02-10 10:00:00     NaN      NaN

1 answer:

Answer 0 (score: 1)

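Try pd.Grouper(). Grouping by the raw 'Date' column puts every row in its own group, because each timestamp is unique; the standard deviation of a single value is NaN, so every normalized value comes out as NaN. Grouping by month avoids this.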

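# Move 'Date' into the index so the rows can be grouped by time frequency
df2.set_index('Date', inplace=True)
# 'M' means month end; pandas 2.2 and later spell this alias 'ME'
df2_grouped = df2.groupby(pd.Grouper(freq='M'))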
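The grouping above still has to be combined with the normalization step. A minimal sketch of one way to do that, assuming the question's sample data and using `transform` so the result keeps the original row order. The name `normalized` is hypothetical, and `freq="MS"` (month start) is swapped in for `'M'` here only because the month-end alias was renamed in newer pandas versions; both put rows of the same calendar month in one group.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample data: 20 weekly rows indexed by date.
dates = pd.date_range("2018-09-28 10:00", periods=20, freq="W")
df2 = pd.DataFrame(
    {"Output": np.arange(1, 21), "Output2": np.arange(10, 210, 10)},
    index=pd.DatetimeIndex(dates, name="Date"),
)

# Group by calendar month. "MS" is an assumption made for this sketch;
# the answer's 'M' groups the same rows together.
monthly = df2.groupby(pd.Grouper(freq="MS"))

# transform applies the function per column within each month and returns
# a frame aligned with the original index.
normalized = monthly.transform(lambda x: (x - x.mean()) / x.std())

print(normalized)
```

September 2018 contains a single row in this sample, so its normalized value is still NaN: the sample standard deviation of one observation is undefined. Months with two or more rows get finite values.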

See all the available frequency strings: link
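The grouping key determines how many rows land in each group, which is also what went wrong in the question. A small check, assuming the question's sample data; the names `sizes_by_timestamp` and `sizes_by_month` are hypothetical, and `dt.to_period("M")` is used here as an alternative way to label rows by calendar month.

```python
import numpy as np
import pandas as pd

# Same weekly dates as in the question, kept as an ordinary column.
dates = pd.date_range("2018-09-28 10:00", periods=20, freq="W")
df2 = pd.DataFrame({"Date": dates, "Output": np.arange(1, 21)})

# Grouping by the raw timestamps: each one is unique, so each group has one row.
sizes_by_timestamp = df2.groupby("Date").size()
print(sizes_by_timestamp.max())   # 1

# The sample standard deviation (ddof=1) of a single value is NaN.
print(pd.Series([5.0]).std())     # nan

# Grouping by calendar month gives groups large enough to normalize,
# apart from the single row that falls in September.
sizes_by_month = df2.groupby(df2["Date"].dt.to_period("M")).size()
print(sizes_by_month.tolist())    # [1, 4, 4, 5, 4, 2]
```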