如何将变换值列转换为pandas python的分位数?

时间:2016-04-26 07:47:27

标签: python pandas quantile

我使用pandas来分析我的数据,然后执行:

df = pd.DataFrame(datas, columns=['userid', 'recency', 'frequency', 'monetary'])


print df
      userid  recency  frequency  monetary
0     47918        9         53  788778
1     48302       85         10  232323
2      8873        3         79  2323
3     63158       23         23  2323232
4       364       14         43  232323
5     45191        1         75  224455
6     21061        9         64  23367
7     41356       22         55  2346777
8     42455       14         30  23478
9     65460        3         16  2345

我需要将值recency frequencymonetary转换为范围1-5中的值。所以输出是

      userid  recency  frequency  monetary
0     47918        1         2    3
1     48302        2         1    2
2      8873        3         4    5
3     63158        2         2    2
4       364        5         4    2
5     45191        1         5    4
6     21061        4         4    3
7     41356        3         5    4
8     42455        5         3    5
9     65460        3         1    2

怎么能在python中做到这一点?

THX

1 个答案:

答案 0 :(得分:2)

您需要qcutcodes的IIUC,最后需要添加1,因为最小值为1且最大5

df['recency1'] = pd.qcut(df['recency'].values, 5)
df['frequency1'] = pd.qcut(df['frequency'].values, 5)
df['monetary1'] = pd.qcut(df['monetary'].values, 5)
print df
   userid  recency  frequency  monetary    recency1    frequency1  \
0   47918        9         53    788778      (3, 9]  (37.8, 53.8]   
1   48302       85         10    232323  (22.2, 85]    [10, 21.6]   
2    8873        3         79      2323      [1, 3]    (66.2, 79]   
3   63158       23         23   2323232  (22.2, 85]  (21.6, 37.8]   
4     364       14         43    232323     (9, 14]  (37.8, 53.8]   
5   45191        1         75    224455      [1, 3]    (66.2, 79]   
6   21061        9         64     23367      (3, 9]  (53.8, 66.2]   
7   41356       22         55   2346777  (14, 22.2]  (53.8, 66.2]   
8   42455       14         30     23478     (9, 14]  (21.6, 37.8]   
9   65460        3         16      2345      [1, 3]    [10, 21.6]   

              monetary1  
0   (232323, 1095668.8]  
1    (144064.2, 232323]  
2       [2323, 19162.6]  
3  (1095668.8, 2346777]  
4    (144064.2, 232323]  
5    (144064.2, 232323]  
6   (19162.6, 144064.2]  
7  (1095668.8, 2346777]  
8   (19162.6, 144064.2]  
9       [2323, 19162.6]  
df['recency'] = pd.qcut(df['recency'].values, 5).codes + 1
df['frequency'] = pd.qcut(df['frequency'].values, 5).codes + 1
df['monetary'] = pd.qcut(df['monetary'].values, 5).codes + 1
print df
   userid  recency  frequency  monetary
0   47918        2          3         4
1   48302        5          1         3
2    8873        1          5         1
3   63158        5          2         5
4     364        3          3         3
5   45191        1          5         3
6   21061        2          4         2
7   41356        4          4         5
8   42455        3          2         2
9   65460        1          1         1