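I want to create a new column in a pandas DataFrame and fill it with strings such as "less than 1", "between 1 and 2", "between 2 and 3", and so on up to 20 in steps of 1. The strings would be assigned by looking up the value in the df.Data column, so each row gets exactly one string in the new column.
Thanks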
Answer 0: (score: 0)
IIUC, you can use the pd.cut() method:
In [208]: import numpy as np; import pandas as pd
In [209]: df = pd.DataFrame({'Data':np.random.rand(15)*20})
In [210]: df
Out[210]:
         Data
0   18.890987
1    7.177557
2   18.603053
3    3.423876
4   16.434591
5    8.696325
6   19.083220
7   10.402671
8    5.798423
9   13.271339
10   0.955819
11   8.997453
12   3.617207
13   2.110642
14  13.547091
In [211]: bins = np.arange(0, 21)
In [212]: labels = ['less than 1'] + ['between {} and {}'.format(i, i+1) for i in np.arange(1, 20)]
In [213]: df['s'] = pd.cut(df.Data, bins=bins, labels=labels, right=False)  # [0, 1), [1, 2), ... so "less than 1" excludes 1.0
In [214]: df
Out[214]:
         Data                  s
0   18.890987  between 18 and 19
1    7.177557    between 7 and 8
2   18.603053  between 18 and 19
3    3.423876    between 3 and 4
4   16.434591  between 16 and 17
5    8.696325    between 8 and 9
6   19.083220  between 19 and 20
7   10.402671  between 10 and 11
8    5.798423    between 5 and 6
9   13.271339  between 13 and 14
10   0.955819        less than 1
11   8.997453    between 8 and 9
12   3.617207    between 3 and 4
13   2.110642    between 2 and 3
14  13.547091  between 13 and 14
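One detail worth checking is where values that fall exactly on a bin edge end up, since that depends on the `right` argument of `pd.cut()`. The sketch below reuses the same `bins` and `labels` on a few hand-picked values (these are illustrative and not from the original data; the column names `right_closed` and `left_closed` are made up for the comparison):

```python
import numpy as np
import pandas as pd

# Hand-picked values (an assumption for illustration) on or near bin edges.
df = pd.DataFrame({"Data": [0.0, 0.5, 1.0, 1.5, 2.0, 19.999]})

bins = np.arange(0, 21)  # edges 0, 1, ..., 20 -> 20 bins
labels = ["less than 1"] + [f"between {i} and {i + 1}" for i in range(1, 20)]

# right=True: intervals are (0, 1], (1, 2], ...
# 0.0 falls outside every bin (NaN) and 1.0 is labelled "less than 1".
df["right_closed"] = pd.cut(df["Data"], bins=bins, labels=labels, right=True)

# right=False: intervals are [0, 1), [1, 2), ...
# 0.0 is labelled "less than 1" and 1.0 moves to "between 1 and 2".
df["left_closed"] = pd.cut(df["Data"], bins=bins, labels=labels, right=False)

print(df)
```

With `right=False` a value of exactly 20 would fall outside the last bin and become NaN, so which setting fits best depends on whether the data can hit 0 or 20 exactly.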