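I'm new to statistical modeling, so please forgive me if I've misunderstood something.
I'm currently working on a function in Python that predicts the accuracy score of a logistic regression model on a test dataset. The user should have the flexibility to supply their own model parameters/coefficients (other than the ones produced by training the model). I have working code that updates the coefficients, but no matter how different the parameters I supply are, the accuracy and the predictions on the test dataset stay the same. My understanding is that if I change the model coefficients, the score on the test set should change. Is that right?
I'm using the statsmodels library to make things easier, and I followed this link. Can someone help me understand what I'm missing? The code is below.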
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import statsmodels.api as sm  # sm.Logit and sm.add_constant are exposed here
from sklearn.model_selection import train_test_split
data = pd.read_csv("E:\\Dev\\testing\\rawdata.txt", header=None,
                   names=['Exam1', 'Exam2', 'Admitted'])
X = data.copy()  # our training data
y = X.Admitted.copy()  # copy the "y" column values out
X.drop(['Admitted'], axis=1, inplace=True)  # then drop the y column
# manually add the intercept
X['intercept'] = 1.0 # so we don't need to use sm.add_constant every time
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
model = sm.Logit(y_train, X_train)
result = model.fit()
print("old parameters :\n" + str(list(result.params)))
# New parameters supplied by the user
mdict = {'Exam1': 10000000.2234, 'Exam2': 1.1233423, 'intercept': 2313.423}
result.params = mdict
print("New parameters: \n" + str(result.params))
def logitPredict(modelParams, X, threshold):
    # modelParams is the fitted results object; turn probabilities into 0/1 labels
    probabilities = modelParams.predict(X)
    return [1 if x >= threshold else 0 for x in probabilities]
predictions = logitPredict(result, X_test, .5)
accuracy = np.mean(predictions == y_test)
# accuracy always stays the same as with the trained model's parameters
print('accuracy = {0}%'.format(accuracy * 100))
# test sample
myExams = pd.DataFrame({'Exam1': [40.], 'Exam2': [78.], 'intercept': [1.]})
myExams
print('Your probability = {0}%'.format(result.predict(myExams)[0] * 100))
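To make my expectation concrete, here is a minimal sketch in plain Python (no statsmodels) of what I think should happen when the coefficients change. The rows, the labels, and `coefs_a` are made up for illustration and are not from rawdata.txt; `sigmoid`, `predict_labels`, and `accuracy` are hypothetical helper names, not statsmodels functions.

```python
import math


def sigmoid(z):
    """Logistic function, written to stay stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_labels(coefs, rows, threshold=0.5):
    """Return a 0/1 label per row, given a dict of coefficients."""
    labels = []
    for row in rows:
        # Linear predictor: sum of coefficient * feature value
        z = sum(coefs[name] * row[name] for name in coefs)
        labels.append(1 if sigmoid(z) >= threshold else 0)
    return labels


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Made-up rows and labels, for illustration only
rows = [
    {'Exam1': 30.0, 'Exam2': 40.0, 'intercept': 1.0},
    {'Exam1': 80.0, 'Exam2': 85.0, 'intercept': 1.0},
    {'Exam1': 45.0, 'Exam2': 50.0, 'intercept': 1.0},
    {'Exam1': 70.0, 'Exam2': 90.0, 'intercept': 1.0},
]
labels = [0, 1, 0, 1]

# Hypothetical "reasonable" coefficients vs. the extreme ones from my code
coefs_a = {'Exam1': 0.2, 'Exam2': 0.2, 'intercept': -25.0}
coefs_b = {'Exam1': 10000000.2234, 'Exam2': 1.1233423, 'intercept': 2313.423}

acc_a = accuracy(predict_labels(coefs_a, rows), labels)
acc_b = accuracy(predict_labels(coefs_b, rows), labels)
print('accuracy with coefs_a = {0}%'.format(acc_a * 100))
print('accuracy with coefs_b = {0}%'.format(acc_b * 100))
```

With these made-up numbers, the extreme coefficients push every probability to 1, so every row is labeled 1 and the accuracy drops from 100% to 50%. That is the kind of change I expected to see from my statsmodels code, but did not.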