I have a DataFrame that looks like this:
    variable                  value
0   TrafficIntensity_end      217.0
1   TrafficIntensity_end+105  213.0
2   TrafficIntensity_end+120  204.0
3   TrafficIntensity_end+15   489.0
4   TrafficIntensity_end+30   479.0
5   TrafficIntensity_end+45   453.0
6   TrafficIntensity_end+60   387.0
7   TrafficIntensity_end+75   303.0
8   TrafficIntensity_end+90   221.0
9   pred_rf_end+15            545.0
10  pred_rf_end               244.0
11  pred_rf_end+30            448.0
12  pred_rf_end+45            408.0
13  pred_rf_end+60            363.0
14  pred_rf_end+75            305.0
15  pred_rf_end+90            199.0
16  pred_rf_end+105           181.0
17  pred_rf_end+120           163.0
I want to create a new column based on the strings in the ['variable'] column. I have the following code:
def classify(row):
    if row['variable'].str.contains('TrafficIntensity'):
        return 'Real Traffic Intensity'
    elif row['variable'].str.contains('pred_rf_end'):
        return 'Predicted Value'

a['category'] = a.apply(classify, axis=1)
But this gives me the following error:
AttributeError: ("'str' object has no attribute 'str'", 'occurred at index 0')
Why does this happen, and how can I fix it? Thanks!
Answer 0 (score: 2)
Use numpy.select:
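import numpy as np

# Boolean masks computed on the whole column (a Series), where .str is available
m1 = a['variable'].str.contains('TrafficIntensity')
m2 = a['variable'].str.contains('pred_rf_end')

# Rows matching neither mask keep their original 'variable' value
a['category'] = np.select([m1, m2],
                          ['Real Traffic Intensity', 'Predicted Value'],
                          default=a['variable'])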
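To try the numpy.select approach in isolation, here is a minimal self-contained sketch. The frame name a follows the question. The rows are only a small sample of the data shown there, and the last row ('other_metric') is a hypothetical addition that matches neither pattern, included to show the default being used.

```python
import numpy as np
import pandas as pd

# Small sample of the question's data; 'other_metric' is a made-up row
# that matches neither pattern, so it falls through to the default.
a = pd.DataFrame({
    'variable': ['TrafficIntensity_end', 'TrafficIntensity_end+15',
                 'pred_rf_end', 'pred_rf_end+15', 'other_metric'],
    'value': [217.0, 489.0, 244.0, 545.0, 1.0],
})

# Vectorized substring tests on the column
m1 = a['variable'].str.contains('TrafficIntensity')
m2 = a['variable'].str.contains('pred_rf_end')

# The first matching condition wins; unmatched rows take the default
a['category'] = np.select([m1, m2],
                          ['Real Traffic Intensity', 'Predicted Value'],
                          default=a['variable'])
print(a)
```

Because the masks are computed once for the whole column, this avoids calling a Python function per row, which tends to matter on larger frames.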
Your solution, changed to test each scalar with the in operator:
def classify(x):
    # x is a single string here, so use the plain 'in' operator
    if 'TrafficIntensity' in x:
        return 'Real Traffic Intensity'
    elif 'pred_rf_end' in x:
        return 'Predicted Value'
    else:
        return x

a['category'] = a['variable'].apply(classify)
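As for why the original code fails: with axis=1, apply passes one row at a time, so row['variable'] is a single Python str rather than a Series. The .str accessor exists only on pandas objects such as a Series, which is what the AttributeError is reporting. A short sketch of the difference, using two sample rows from the question (the helper variable names are just for illustration):

```python
import pandas as pd

a = pd.DataFrame({
    'variable': ['TrafficIntensity_end', 'pred_rf_end+15'],
    'value': [217.0, 545.0],
})

# On the column (a Series), the .str accessor is available
col_mask = a['variable'].str.contains('pred_rf_end')

# A single cell is a plain Python str, which has no .str attribute
cell = a.loc[0, 'variable']
has_accessor = hasattr(cell, 'str')
print(type(cell), has_accessor)

def classify(x):
    # Plain substring test on a scalar string
    if 'TrafficIntensity' in x:
        return 'Real Traffic Intensity'
    elif 'pred_rf_end' in x:
        return 'Predicted Value'
    return x

# Applying to the column passes each cell (a str) to classify
a['category'] = a['variable'].apply(classify)
print(a)
```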