我使用pandas来分析我的数据,然后执行:
df = pd.DataFrame(datas, columns=['userid', 'recency', 'frequency', 'monetary'])
print df
userid recency frequency monetary
0 47918 9 53 788778
1 48302 85 10 232323
2 8873 3 79 2323
3 63158 23 23 2323232
4 364 14 43 232323
5 45191 1 75 224455
6 21061 9 64 23367
7 41356 22 55 2346777
8 42455 14 30 23478
9 65460 3 16 2345
我需要将值recency
frequency
和monetary
转换为范围1-5中的值。所以输出是
userid recency frequency monetary
0 47918 1 2 3
1 48302 2 1 2
2 8873 3 4 5
3 63158 2 2 2
4 364 5 4 2
5 45191 1 5 4
6 21061 4 4 3
7 41356 3 5 4
8 42455 5 3 5
9 65460 3 1 2
怎么能在python中做到这一点?
THX
答案 0 :(得分:2)
您需要qcut
和codes
的IIUC,最后需要添加1
,因为最小值为1
且最大5
:
df['recency1'] = pd.qcut(df['recency'].values, 5)
df['frequency1'] = pd.qcut(df['frequency'].values, 5)
df['monetary1'] = pd.qcut(df['monetary'].values, 5)
print df
userid recency frequency monetary recency1 frequency1 \
0 47918 9 53 788778 (3, 9] (37.8, 53.8]
1 48302 85 10 232323 (22.2, 85] [10, 21.6]
2 8873 3 79 2323 [1, 3] (66.2, 79]
3 63158 23 23 2323232 (22.2, 85] (21.6, 37.8]
4 364 14 43 232323 (9, 14] (37.8, 53.8]
5 45191 1 75 224455 [1, 3] (66.2, 79]
6 21061 9 64 23367 (3, 9] (53.8, 66.2]
7 41356 22 55 2346777 (14, 22.2] (53.8, 66.2]
8 42455 14 30 23478 (9, 14] (21.6, 37.8]
9 65460 3 16 2345 [1, 3] [10, 21.6]
monetary1
0 (232323, 1095668.8]
1 (144064.2, 232323]
2 [2323, 19162.6]
3 (1095668.8, 2346777]
4 (144064.2, 232323]
5 (144064.2, 232323]
6 (19162.6, 144064.2]
7 (1095668.8, 2346777]
8 (19162.6, 144064.2]
9 [2323, 19162.6]
df['recency'] = pd.qcut(df['recency'].values, 5).codes + 1
df['frequency'] = pd.qcut(df['frequency'].values, 5).codes + 1
df['monetary'] = pd.qcut(df['monetary'].values, 5).codes + 1
print df
userid recency frequency monetary
0 47918 2 3 4
1 48302 5 1 3
2 8873 1 5 1
3 63158 5 2 5
4 364 3 3 3
5 45191 1 5 3
6 21061 2 4 2
7 41356 4 4 5
8 42455 3 2 2
9 65460 1 1 1