我一直在研究LightGBM预测模型,用于检查事物的可能性。 我使用最小最大缩放器缩放数据,保存并在缩放后的数据上训练了模型。
然后实时从以前加载我的模型和缩放器,并尝试预测新条目的可能性。 由于某种原因,我得到了一个负概率
这是代码:
# Model Vars
learning_rate = 0.005
boosting_type = 'gbdt'
objective = 'rmse'
metric = ['balanced_accuracy_score', 'rmse', 'auc']
sub_feature = 0.5721
num_leaves = 3000
min_data = 500
max_depth = 22
max_bin = 12000
def createLGBM():
# Read dataset
dataset = pd.read_csv('C:\FullDatasetNoArson.csv')
x = dataset.values # returns a numpy array
# Normalize Data
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
# Save scaler
joblib.dump(min_max_scaler, "Saved_Scaler")
dataset = pd.DataFrame(x_scaled)
X = dataset.iloc[:, 0:8].values
y = dataset.iloc[:, 8].values
# Initiate LGBM
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
d_train = lgb.Dataset(x_train, label=y_train)
params = {'task': 'train', 'learning_rate': learning_rate, 'boosting_type': boosting_type, 'objective': objective,
'metric': metric, 'sub_feature': sub_feature, 'num_leaves': num_leaves, 'min_data': min_data,
'max_depth': max_depth, 'max_bin': max_bin, 'num_threads': 7, 'is_training_metric': True, 'verbose': 1}
clf = lgb.train(params, d_train, 2000, keep_training_booster=True)
# Prediction
y_pred = clf.predict(x_test)
# convert into binary values
for i in range(0, len(y_pred)):
if y_pred[i] >= .5: # setting threshold to .5
y_pred[i] = 1
else:
y_pred[i] = 0
# Accuracy
accuracy = accuracy_score(y_pred, y_test)
clf.save_model("lgb-model_" + str(accuracy) + ".txt")
return "lgb-model_" + str(accuracy) + ".txt"
def predict(data):
data = np.asarray(data)
print(data)
min_max_scaler = joblib.load("Saved_Scaler")
data = min_max_scaler.transform(data)
data = data[:, 0:8]
model = lgb.Booster(model_file='lgb-model_0.7906763418553157.txt')
print(data)
pred = model.predict(data)
return pred.tolist()
答案 0 :(得分:1)
因此,在阅读了有关该主题的更多信息后,我意识到使用决策树回归模型可以返回负值,这很好。
但是,如果有人发现我的错误,那就是我尝试使用回归来计算某件事发生的概率,猜测此类事情发生的概率的正确方法是使用二进制对数损失分类(或Log-损失回归)。