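I have a pandas DataFrame of values and I want to standardize the data. Specifically, I want to standardize it month by month. I believe I need to use groupby with a lambda function, but when I try this approach I get NaN in the output.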
import numpy as np
import pandas as pd
arr = pd.DataFrame(np.arange(1,21), columns=['Output'])
arr2 = pd.DataFrame(np.arange(10, 210, 10), columns=['Output2'])
index2 = pd.date_range('20180928 10:00am', periods=20, freq="W")
index3 = pd.DataFrame(index2, columns=['Date'])
df2 = pd.concat([index3, arr, arr2], axis=1)
print(df2)
cols = df2.columns[1:]
df2_grouped = df2.groupby(['Date'])
for c in cols:
    df2[c] = df2_grouped[c].apply(lambda x: (x-x.mean()) / x.std())
print(df2)
                   Date  Output  Output2
0   2018-09-30 10:00:00       1       10
1   2018-10-07 10:00:00       2       20
2   2018-10-14 10:00:00       3       30
3   2018-10-21 10:00:00       4       40
4   2018-10-28 10:00:00       5       50
5   2018-11-04 10:00:00       6       60
6   2018-11-11 10:00:00       7       70
7   2018-11-18 10:00:00       8       80
8   2018-11-25 10:00:00       9       90
9   2018-12-02 10:00:00      10      100
10  2018-12-09 10:00:00      11      110
11  2018-12-16 10:00:00      12      120
12  2018-12-23 10:00:00      13      130
13  2018-12-30 10:00:00      14      140
14  2019-01-06 10:00:00      15      150
15  2019-01-13 10:00:00      16      160
16  2019-01-20 10:00:00      17      170
17  2019-01-27 10:00:00      18      180
18  2019-02-03 10:00:00      19      190
19  2019-02-10 10:00:00      20      200
                   Date  Output  Output2
0   2018-09-30 10:00:00     NaN      NaN
1   2018-10-07 10:00:00     NaN      NaN
2   2018-10-14 10:00:00     NaN      NaN
3   2018-10-21 10:00:00     NaN      NaN
4   2018-10-28 10:00:00     NaN      NaN
5   2018-11-04 10:00:00     NaN      NaN
6   2018-11-11 10:00:00     NaN      NaN
7   2018-11-18 10:00:00     NaN      NaN
8   2018-11-25 10:00:00     NaN      NaN
9   2018-12-02 10:00:00     NaN      NaN
10  2018-12-09 10:00:00     NaN      NaN
11  2018-12-16 10:00:00     NaN      NaN
12  2018-12-23 10:00:00     NaN      NaN
13  2018-12-30 10:00:00     NaN      NaN
14  2019-01-06 10:00:00     NaN      NaN
15  2019-01-13 10:00:00     NaN      NaN
16  2019-01-20 10:00:00     NaN      NaN
17  2019-01-27 10:00:00     NaN      NaN
18  2019-02-03 10:00:00     NaN      NaN
19  2019-02-10 10:00:00     NaN      NaN
Answer 0 (score: 1)
Try pd.Grouper().
df2.set_index('Date', inplace=True)
df2_grouped = df2.groupby(pd.Grouper(freq='M'))
Check all the available frequency strings here: link
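For context on the NaN values: every timestamp in Date is unique, so grouping on Date directly puts each row in its own group, and the sample standard deviation of a single value is NaN. Grouping by calendar month avoids that. Below is a minimal sketch of the full per-month standardization, not a definitive implementation. It rests on a few assumptions that are not in the original post: it groups on `Date.dt.to_period('M')` rather than `pd.Grouper(freq='M')` (recent pandas releases warn about the `'M'` offset alias in favor of `'ME'`, while the period alias is unaffected), it uses `transform` so the result keeps the original row order, and `zscore` and `standardized` are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
dates = pd.date_range('20180928 10:00am', periods=20, freq='W')
df = pd.DataFrame({
    'Date': dates,
    'Output': np.arange(1, 21),
    'Output2': np.arange(10, 210, 10),
})

cols = ['Output', 'Output2']

# One label per row identifying its calendar month (e.g. 2018-10).
month = df['Date'].dt.to_period('M')


def zscore(x):
    # Hypothetical helper: standardize one group using the sample std (ddof=1).
    return (x - x.mean()) / x.std()


grouped = df.groupby(month)

standardized = df.copy()
for c in cols:
    # transform returns a Series aligned with the original index,
    # so it can be assigned straight back to the column.
    standardized[c] = grouped[c].transform(zscore)

print(standardized)
```

With this data, the 2018-09 group contains only one row (2018-09-30), so that row is still NaN after standardization. Whether to leave it, fill it with 0, or drop it depends on what you need downstream.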