I am using scikit-learn's RandomForestClassifier for a prediction task:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=300, n_jobs=-1)
model.fit(x_train,y_train)
model.predict_proba(x_test)
There are 171 classes to predict.
I only want to predict a class when its predict_proba(class) is at least 90% (0.9). Any row where no class reaches that threshold should be set to 0.
For example, given the following probabilities (rows are samples, columns are classes 1 to 7):
     1    2    3    4    5    6    7
0  0.0  0.0  0.1  0.9  0.0  0.0  0.0
1  0.2  0.1  0.1  0.3  0.1  0.0  0.2
2  0.1  0.1  0.1  0.1  0.1  0.4  0.1
3  1.0  0.0  0.0  0.0  0.0  0.0  0.0
My expected output is:
0 4
1 0
2 0
3 1
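The rule described above can be sketched in fully vectorized NumPy. This is an illustration, not taken from either answer below; it assumes `preds` holds the example table as a (samples x classes) array, and the names `best_col`, `best_prob`, and `THRESHOLD` are hypothetical. Since each row of probabilities sums to 1, at most one class can reach 0.9, so taking the row-wise argmax is enough.

```python
import numpy as np

# Example probabilities from the table above (4 samples, 7 classes)
preds = np.array([
    [0.0, 0.0, 0.1, 0.9, 0.0, 0.0, 0.0],
    [0.2, 0.1, 0.1, 0.3, 0.1, 0.0, 0.2],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.4, 0.1],
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
])

THRESHOLD = 0.9  # hypothetical name for the 90% cut-off

# Most likely column in each row, and its probability
best_col = preds.argmax(axis=1)
best_prob = preds.max(axis=1)

# Keep the 1-based class number only when it reaches the threshold, else 0
out = np.where(best_prob >= THRESHOLD, best_col + 1, 0)
print(out.tolist())  # [4, 0, 0, 1]
```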
Answer 0 (score: 1)
from sklearn.ensemble import RandomForestClassifier
import numpy as np
model = RandomForestClassifier(n_estimators=300, n_jobs=-1)
model.fit(x_train,y_train)
preds = model.predict_proba(x_test)
#preds = np.array([[0.0, 0.0, 0.1, 0.9, 0.0, 0.0, 0.0],
# [ 0.2, 0.1, 0.1, 0.3, 0.1, 0.0, 0.2],
# [ 0.1 ,0.1, 0.1, 0.1, 0.1, 0.4, 0.1],
# [ 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
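r = np.zeros(preds.shape[0], dtype=int)  # 0 means no class reached the threshold
t = np.argwhere(preds>=0.9)              # (row, column) pairs with probability >= 0.9
r[t[:,0]] = t[:,1]+1                     # store the 1-based column number for those rows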
r
array([4, 0, 0, 1])
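The `+1` above assumes the classes are labeled 1..N in column order. In scikit-learn, the columns of `predict_proba` follow the order of the fitted model's `classes_` attribute, so a sketch that returns the actual labels could index into that array instead. Here `confident_labels` and its `fill` parameter are hypothetical helpers, and `classes` is a made-up stand-in for `model.classes_`; `fill` must not collide with a real label (e.g. if a class is literally labeled 0).

```python
import numpy as np

def confident_labels(proba, classes, threshold=0.9, fill=0):
    """Return, per row, the label whose probability >= threshold, else `fill`.

    `classes` must be ordered like the columns of `proba`
    (for a fitted scikit-learn classifier, that is `model.classes_`).
    """
    proba = np.asarray(proba)
    classes = np.asarray(classes)
    best_col = proba.argmax(axis=1)
    # Probability of the most likely class in each row
    best_prob = proba[np.arange(len(proba)), best_col]
    return np.where(best_prob >= threshold, classes[best_col], fill)

preds = np.array([[0.0, 0.0, 0.1, 0.9, 0.0, 0.0, 0.0],
                  [0.2, 0.1, 0.1, 0.3, 0.1, 0.0, 0.2],
                  [0.1, 0.1, 0.1, 0.1, 0.1, 0.4, 0.1],
                  [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

# Hypothetical labels standing in for model.classes_
classes = np.array([10, 20, 30, 40, 50, 60, 70])
print(confident_labels(preds, classes).tolist())  # [40, 0, 0, 10]
```

With `classes = np.arange(1, 8)` this reproduces the `[4, 0, 0, 1]` result above.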
Answer 1 (score: 1)
You can use list comprehensions:
import numpy as np
# dummy predictions - 3 samples, 3 classes
pred = np.array([[0.1, 0.2, 0.7],
                 [0.95, 0.02, 0.03],
                 [0.08, 0.02, 0.9]])
# first, keep only entries >= 0.9:
out_temp = np.array([[x[i] if x[i] >= 0.9 else 0 for i in range(len(x))] for x in pred])
out_temp
# result:
array([[0.  , 0.  , 0.  ],
       [0.95, 0.  , 0.  ],
       [0.  , 0.  , 0.9 ]])
out = [0 if not x.any() else x.argmax()+1 for x in out_temp]
out
# result:
[0, 1, 3]
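The two list comprehensions can also be replaced with `np.where`, which avoids Python-level loops over 171 columns. A minimal sketch of the same two steps on the same dummy data:

```python
import numpy as np

pred = np.array([[0.1, 0.2, 0.7],
                 [0.95, 0.02, 0.03],
                 [0.08, 0.02, 0.9]])

# Step 1: zero out every probability below the threshold
out_temp = np.where(pred >= 0.9, pred, 0)

# Step 2: 1-based column of the surviving entry, or 0 if the row is all zeros
out = np.where(out_temp.any(axis=1), out_temp.argmax(axis=1) + 1, 0)
print(out.tolist())  # [0, 1, 3]
```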