In a pandas DataFrame

Time: 2017-03-09 20:55:11

Tags: python dataframe

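I want to create a new column in a pandas DataFrame and fill it with strings such as "less than 1", "between 1 and 2", "between 2 and 3", and so on up to 20, in steps of 1 unit. The string for each row should be determined by looking up that row's value in the df.Data column, so every row gets exactly one string in the new column.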

Thanks

1 answer:

Answer 0: (score: 0)

IIUC, you can use pd.cut(), which assigns each value to a bin and returns that bin's label:

In [208]: import numpy as np; import pandas as pd

In [209]: df = pd.DataFrame({'Data': np.random.rand(15) * 20})

In [210]: df
Out[210]:
         Data
0   18.890987
1    7.177557
2   18.603053
3    3.423876
4   16.434591
5    8.696325
6   19.083220
7   10.402671
8    5.798423
9   13.271339
10   0.955819
11   8.997453
12   3.617207
13   2.110642
14  13.547091

In [211]: bins = np.arange(0, 21)

In [212]: labels = ['less than 1'] + ['between {} and {}'.format(i, i+1) for i in np.arange(1, 20)]

In [213]: df['s'] = pd.cut(df.Data, bins=bins, labels=labels, right=False)  # left-closed bins [0, 1), [1, 2), ... so exactly 1.0 is not labelled 'less than 1'

In [214]: df
Out[214]:
         Data                  s
0   18.890987  between 18 and 19
1    7.177557    between 7 and 8
2   18.603053  between 18 and 19
3    3.423876    between 3 and 4
4   16.434591  between 16 and 17
5    8.696325    between 8 and 9
6   19.083220  between 19 and 20
7   10.402671  between 10 and 11
8    5.798423    between 5 and 6
9   13.271339  between 13 and 14
10   0.955819        less than 1
11   8.997453    between 8 and 9
12   3.617207    between 3 and 4
13   2.110642    between 2 and 3
14  13.547091  between 13 and 14
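
One detail worth checking is what happens to values that sit exactly on a bin edge, since pd.cut()'s `right` parameter decides which side of each interval is closed. The following is only a minimal sketch, not part of the original answer: the helper name `add_range_labels` and its `upper` and `new_column` parameters are made up for illustration. It assumes that a value of exactly 1 should count as "between 1 and 2", which is what `right=False` (intervals of the form [lo, lo + 1)) gives.

```python
import numpy as np
import pandas as pd


def add_range_labels(df, column="Data", upper=20, new_column="s"):
    """Return a copy of df with a label column for unit-wide bins (hypothetical helper)."""
    bins = np.arange(0, upper + 1)  # edges 0, 1, ..., upper
    labels = ["less than 1"] + [
        "between {} and {}".format(lo, lo + 1) for lo in range(1, upper)
    ]
    out = df.copy()
    # right=False makes every bin left-closed: [0, 1), [1, 2), ..., [upper - 1, upper)
    out[new_column] = pd.cut(out[column], bins=bins, labels=labels, right=False)
    return out


# Values chosen to sit on or near the bin edges
edge_cases = pd.DataFrame({"Data": [0.0, 0.5, 1.0, 19.99, 20.0]})
result = add_range_labels(edge_cases)
print(result)
```

With these settings 0.0 and 0.5 are labelled "less than 1", 1.0 is labelled "between 1 and 2", and 20.0 falls outside the last bin [19, 20) and comes back as NaN. If the upper edge has to be included, one option is to add an extra bin and label for it. The new column has categorical dtype, so ordering and grouping by bin keep working.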