I am studying how values are distributed across the ranges 0-1, 2-3, 4-6 and >= 7. I have the following DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame()
df['T1'] = [0, 2, 0, 3, 4, 5, 1]
df['T2'] = [1, 2, 3, 0, 2, 3, 3]
df['TT'] = df.T1 + df.T2
I want to create a new column that identifies which range each total falls into, so I wrote:
U0_1 = df['TT'] <= 1
U2_3 = df['TT'] > 1 & df['TT'] <= 3
U4_6 = df['TT'] > 3 & df['TT'] <= 6
df['TG'] = np.select([U0_1, U2_3, U4_6], ['TG_0-1', 'TG_2-3', 'TG_4-6'], default='TG_7>=')
But it raises the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Could anyone offer some advice?
Thanks,
Zep.
Answer 0: (score: 2)
You can use pd.cut. For this kind of problem, I think it is the better approach:
pd.cut(df.TT,[0,1,3,6,np.inf],labels=['TG_0-1','TG_2-3','TG_4-6','TG_7>='])
0 TG_0-1
1 TG_4-6
2 TG_2-3
3 TG_2-3
4 TG_4-6
5 TG_7>=
6 TG_4-6
Name: TT, dtype: category
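To get the column the question asks for, the pd.cut result can be assigned straight to df['TG']. The sketch below makes one change that is my assumption and not part of the answer above: the first bin edge is -np.inf instead of 0. pd.cut bins are right-closed by default, so with an edge of 0 the first bin is (0, 1] and a total of exactly 0 would come out as NaN. The sample data contains no 0, so the result here is the same either way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'T1': [0, 2, 0, 3, 4, 5, 1],
                   'T2': [1, 2, 3, 0, 2, 3, 3]})
df['TT'] = df['T1'] + df['T2']

labels = ['TG_0-1', 'TG_2-3', 'TG_4-6', 'TG_7>=']

# Bins are right-closed by default: (-inf, 1], (1, 3], (3, 6], (6, inf].
# Starting at -inf (an assumption, the answer uses 0) keeps TT == 0 in the first bin.
df['TG'] = pd.cut(df['TT'], [-np.inf, 1, 3, 6, np.inf], labels=labels)

# Number of rows in each range, which is the distribution the question is after.
counts = df['TG'].value_counts(sort=False)
print(counts)
```

Because the result is categorical, ranges with no rows still appear in the counts with a value of 0.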
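To fix your own code, add parentheses. The & operator binds more tightly than the comparison operators, so without parentheses Python evaluates 1 & df['TT'] first and then needs the truth value of a whole Series, which raises the error: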
U0_1 = df['TT'] <= 1
U2_3 = (df['TT'] > 1) & (df['TT'] <= 3)
U4_6 = (df['TT'] > 3) & (df['TT'] <= 6)
np.select([U0_1, U2_3, U4_6], ['TG_0-1', 'TG_2-3', 'TG_4-6'], default='TG_7>=')
array(['TG_0-1', 'TG_4-6', 'TG_2-3', 'TG_2-3', 'TG_4-6', 'TG_7>=',
'TG_4-6'], dtype='<U6')
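As a self-contained check, the sketch below first reproduces the ValueError from the unparenthesized expression and then builds the TG column. Using Series.between is my own suggestion, not part of the answer above. It is inclusive on both ends by default, so it only matches the parenthesized conditions because the totals here are integers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'T1': [0, 2, 0, 3, 4, 5, 1],
                   'T2': [1, 2, 3, 0, 2, 3, 3]})
df['TT'] = df['T1'] + df['T2']

# Without parentheses this parses as df['TT'] > (1 & df['TT']) <= 3,
# a chained comparison that asks for the truth value of a whole Series.
try:
    df['TT'] > 1 & df['TT'] <= 3
    raised = False
except ValueError:
    raised = True

# between(2, 3) selects the same rows as (TT > 1) & (TT <= 3) for integer totals.
conditions = [df['TT'] <= 1, df['TT'].between(2, 3), df['TT'].between(4, 6)]
choices = ['TG_0-1', 'TG_2-3', 'TG_4-6']

# Rows that match none of the conditions get the default label.
df['TG'] = np.select(conditions, choices, default='TG_7>=')
print(df)
```

If the totals could be non-integer, the explicit parenthesized comparisons are the safer choice, since between(2, 3) would leave a value such as 1.5 unmatched.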