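I have trained this ML model and dumped it with pickle so that I can use it anywhere. I need to get not only the score and predict values but also the predict_proba values.
I can get them, but the problem is that I expected the probabilities to be between 0 and 1, and instead I get the output below.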
array([[1.00000000e+00, 2.46920929e-12],
[1.00000000e+00, 9.89834607e-11],
[9.99993281e-01, 6.71853451e-06],
...,
[1.22327143e-01, 8.77672857e-01],
[9.99999653e-01, 3.47049875e-07],
[1.00000000e+00, 3.79462343e-10]])
This is the Python code I am using.
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import pickle
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
# dataframe = pd.read_csv("hr_dataset.csv")
dataframe = pd.read_csv("formodel.csv")
dataframe.head(2)
# separate input and target variables
inputs = dataframe.drop('PerformanceRating', axis='columns')
target = dataframe['PerformanceRating']
MaritalStatus_ = LabelEncoder()
JobRole_ = LabelEncoder()
Gender_ = LabelEncoder()
EducationField_ = LabelEncoder()
Department_ = LabelEncoder()
BusinessTravel_ = LabelEncoder()
Attrition_ = LabelEncoder()
OverTime_ = LabelEncoder()
Over18_ = LabelEncoder()
inputs['MaritalStatus_'] = MaritalStatus_.fit_transform(inputs['MaritalStatus'])
inputs['JobRole_'] = JobRole_.fit_transform(inputs['JobRole'])
inputs['Gender_'] = Gender_.fit_transform(inputs['Gender'])
inputs['EducationField_'] = EducationField_.fit_transform(inputs['EducationField'])
inputs['Department_'] = Department_.fit_transform(inputs['Department'])
inputs['BusinessTravel_'] = BusinessTravel_.fit_transform(inputs['BusinessTravel'])
inputs['Attrition_'] = Attrition_.fit_transform(inputs['Attrition'])
inputs['OverTime_'] = OverTime_.fit_transform(inputs['OverTime'])
inputs['Over18_'] = Over18_.fit_transform(inputs['Over18'])
inputs.drop(['MaritalStatus', 'JobRole', 'Attrition' , 'OverTime' , 'EmployeeCount', 'EmployeeNumber',
'Gender', 'EducationField', 'Department', 'BusinessTravel', 'Over18'], axis='columns', inplace=True)
inputsNew = inputs
inputs.head(2)
# inputs = scaled_df
X_train, X_testt, y_train, y_testt = train_test_split(inputs, target, test_size=0.2)
# filename: path to the pickled model (its value is not shown in the question)
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)
result = loaded_model.score(X_testt, y_testt)
print(result)
loaded_model.predict_proba(inputs)  # this produces the result shown above; repeated below as well
Output of loaded_model.predict_proba(inputs):
array([[1.00000000e+00, 2.46920929e-12],
[1.00000000e+00, 9.89834607e-11],
[9.99993281e-01, 6.71853451e-06],
...,
[1.22327143e-01, 8.77672857e-01],
[9.99999653e-01, 3.47049875e-07],
[1.00000000e+00, 3.79462343e-10]])
How can I convert these values, or get percentage-like output (e.g. 12%, 50%, 96%)?
Answer 0 (score: 0)
To convert the probability array from decimals to percentages, you can write (loaded_model.predict_proba(inputs)) * 100.
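As a rough illustration of the multiplication above, the sketch below formats a few rows as percentage strings. The proba array is a stand-in copied from three rows of the output in the question, not a call to the real model, and the names percent and labels are made up for this example.

```python
import numpy as np

# Stand-in for loaded_model.predict_proba(inputs): three rows copied
# from the output shown in the question (an assumption for illustration).
proba = np.array([
    [1.00000000e+00, 2.46920929e-12],
    [9.99993281e-01, 6.71853451e-06],
    [1.22327143e-01, 8.77672857e-01],
])

# Scale to percentages, then format each value with two decimals.
percent = proba * 100
labels = [[f"{p:.2f}%" for p in row] for row in percent]

for row in labels:
    print(row)
```

With two decimals, very small probabilities are shown as 0.00%; use more decimals in the format string if those differences matter.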
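Edit: the output of loaded_model.predict_proba(inputs) is simply formatted in scientific notation. All of these numbers are between 0 and 1; some of them are very small, so NumPy prints the array in scientific notation.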
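One way to see this, sketched under the assumption that the result is an ordinary NumPy array, is to check the value range and print the same numbers in fixed-point form. The proba array is again a stand-in built from rows in the question.

```python
import numpy as np

# Stand-in for loaded_model.predict_proba(inputs) (assumed values from the question).
proba = np.array([
    [1.00000000e+00, 2.46920929e-12],
    [9.99993281e-01, 6.71853451e-06],
    [1.22327143e-01, 8.77672857e-01],
])

# Every entry is already a valid probability.
in_range = bool(proba.min() >= 0.0 and proba.max() <= 1.0)
print(in_range)

# Same values, rendered without exponents (tiny values display as 0).
text = np.array2string(proba, precision=4, suppress_small=True)
print(text)
```

Calling np.set_printoptions(suppress=True, precision=4) once has a similar effect on every array printed afterwards.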
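The reason you see such small probabilities is that loaded_model.predict_proba(inputs)[:,0] (the first column of the probability array) is the probability that each sample belongs to one class, while loaded_model.predict_proba(inputs)[:,1] is the probability that it belongs to the other class.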
In other words, each row of the probability array should sum to 1.
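The row-sum property can be checked directly. This is a minimal sketch on stand-in values taken from the question; because the printed values are rounded, the comparison uses np.allclose rather than exact equality.

```python
import numpy as np

# Stand-in for loaded_model.predict_proba(inputs) (assumed values from the question).
proba = np.array([
    [1.00000000e+00, 2.46920929e-12],
    [9.99993281e-01, 6.71853451e-06],
    [1.22327143e-01, 8.77672857e-01],
])

# Each row holds the probabilities of the two classes, so it sums to about 1.
row_sums = proba.sum(axis=1)
rows_ok = bool(np.allclose(row_sums, 1.0))

# With two classes, column 0 is the complement of column 1.
complement_ok = bool(np.allclose(proba[:, 0], 1.0 - proba[:, 1]))

print(rows_ok, complement_ok)
```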
I hope this helps!
Answer 1 (score: 0)
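predict_proba outputs the probabilities of class 1 and class 2 (because you have 2 classes). That is why you see 2 outputs for each sample. The probabilities for each sample sum to 1.
For example, if you only care about the probability of the second class, you can use the code below to get it.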
loaded_model.predict_proba(inputs)[:, 1]
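To know which class a column refers to, scikit-learn classifiers expose a classes_ attribute, and the columns of predict_proba follow that order. The sketch below uses hypothetical stand-ins: classes plays the role of loaded_model.classes_, the label values 3 and 4 are made up, and proba is copied from rows in the question.

```python
import numpy as np

# Hypothetical stand-in for loaded_model.classes_ (label values are assumed).
classes = np.array([3, 4])

# Stand-in for loaded_model.predict_proba(inputs) (assumed values from the question).
proba = np.array([
    [1.00000000e+00, 2.46920929e-12],
    [9.99993281e-01, 6.71853451e-06],
    [1.22327143e-01, 8.77672857e-01],
])

# Find the column that belongs to the label of interest.
target_label = 4
col = int(np.where(classes == target_label)[0][0])

# Probability of that class for every row, as a percentage.
target_pct = proba[:, col] * 100

# Most likely label per row, which is what predict would return.
predicted = classes[proba.argmax(axis=1)]

print(col, target_pct.round(2), predicted)
```

Looking the column up by label avoids assuming that the class of interest is always in position 1.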
I am not sure whether this is what you were asking for; I apologize if I have misunderstood your question.