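I am writing a scientific paper on a model that predicts complications after major abdominal resections. I have been using scikit-learn to build the model and got good results (a score of 0.94). That made us want to see what the model scikit-learn produces actually looks like.
At the moment we have 100 input variables, but logically some of them are less useful than others, so we would like to reduce the number to about 20 and see what effect that has on the score.
My question: is there a way to get the underlying formula of the model out of scikit-learn, rather than having it sit as a "black box" inside my svm function?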
import numpy as np
import pandas as pd
from sklearn import tree, svm, linear_model, metrics, preprocessing
import datetime
from sklearn.model_selection import KFold, cross_val_score, ShuffleSplit, GridSearchCV
from time import gmtime, strftime

# Open and prepare the database
file = "/home/wouter/scikit/DB_SCIKIT.csv"
# .values replaces DataFrame.as_matrix(), which was removed in pandas 1.0
DB = pd.read_csv(file, sep=";", header=0, decimal=',').values
DBT = DB  # full table, outcome column included
print("Vorm van de DB:", DB.shape)

# The last column holds the outcome (1 = complication)
target = []
for i in range(len(DB[:,-1])):
    target.append(DB[i,-1])
DB = np.delete(DB, np.s_[-1], 1)  # Remove the last column
AantalOutcome = target.count(1)
print("Aantal outcome:", AantalOutcome)
print("Aantal patienten:", len(target))
A = DB
b = target
print(len(DBT))

# Cross-validated accuracy of a linear SVC over 5 random 85/15 splits
svc = svm.SVC(kernel='linear', cache_size=500, probability=True)
indices = np.random.permutation(len(DBT))
rs = ShuffleSplit(n_splits=5, test_size=.15, random_state=None)
scores = cross_val_score(svc, A, b, cv=rs)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

# Manual split: the last 302 shuffled rows are the test set
X_train = DBT[indices[:-302]]
y_train = []
for i in range(len(X_train[:,-1])):
    y_train.append(X_train[i,-1])
X_train = np.delete(X_train, np.s_[-1], 1)  # Remove the last column
X_test = DBT[indices[-302:]]
y_test = []
for i in range(len(X_test[:,-1])):
    y_test.append(X_test[i,-1])
X_test = np.delete(X_test, np.s_[-1], 1)  # Remove the last column

model = svc.fit(X_train, y_train)
print(model)
uitkomst = model.score(X_test, y_test)
print(uitkomst)
voorspel = model.predict(X_test)
print(voorspel)
The output I get:
Vorm van de DB: (2011, 101)
Aantal outcome: 128
Aantal patienten: 2011
2011
Accuracy: 0.94 (+/- 0.01)
SVC(C=1.0, cache_size=500, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
max_iter=-1, probability=True, random_state=None, shrinking=True,
tol=0.001, verbose=False)
0.927152317881
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Answer 0 (score: 0)
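You could do some feature selection first, and then train the scikit-learn model on the reduced data:
http://scikit-learn.org/stable/modules/feature_selection.html#
(There are three kinds of feature selection: filter, wrapper, and embedded.)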
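
To make that concrete, here is a rough sketch of the three approaches, each cutting 100 features down to 20. It is only an illustration under assumptions: the data is synthetic (`make_classification` stands in for the CSV from the question), the variable names are made up, and settings such as `C=0.1` and `step=10` are arbitrary starting points rather than recommendations. The last part touches on the "formula" question: with `kernel='linear'`, a fitted `SVC` exposes `coef_` and `intercept_`, so the decision score should simply be a weighted sum of the inputs plus a constant.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for the real data (assumption): 100 features, ~10% positives.
X, y = make_classification(n_samples=400, n_features=100, n_informative=10,
                           n_redundant=5, weights=[0.9, 0.1], random_state=0)

# 1) Filter: keep the 20 features with the highest ANOVA F-score.
filter_sel = SelectKBest(score_func=f_classif, k=20)
X_filter = filter_sel.fit_transform(X, y)

# 2) Wrapper: recursive feature elimination around a linear SVC,
#    dropping 10 features per round until 20 remain.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=20, step=10)
X_rfe = rfe.fit_transform(X, y)

# 3) Embedded: rank features by the absolute coefficients of an
#    L1-penalised LinearSVC and keep the top 20.
embedded = SelectFromModel(
    LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000),
    max_features=20, threshold=-np.inf)
X_embedded = embedded.fit_transform(X, y)

print("filter  :", filter_sel.get_support(indices=True))
print("wrapper :", rfe.get_support(indices=True))
print("embedded:", embedded.get_support(indices=True))

# The "formula" of a linear-kernel SVC: score = w . x + b
svc = SVC(kernel="linear").fit(X_rfe, y)
w = svc.coef_.ravel()      # one weight per selected feature
b = svc.intercept_[0]      # constant term
manual_scores = X_rfe @ w + b

# The hand-computed scores should match what the model itself reports.
print("formula matches decision_function:",
      np.allclose(manual_scores, svc.decision_function(X_rfe), atol=1e-6))
```

For a binary problem a positive score means the model predicts the second entry of `svc.classes_`. Note that `coef_` is only available for the linear kernel; with an RBF or polynomial kernel there is no such per-feature weight vector to read off.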