Question

I used to write machine learning code from scratch in C#.

Now I am following a Jupyter example that introduces machine learning with sklearn, doing some ML on the public breast cancer data.

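I am curious about what happens behind the scenes, since I would normally write that code myself. Not having to is nice, but on the other hand it is unclear to me how it works.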

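So, given the essential part of the code shown below, I would like to know where I can set the number of learning iterations (for example, to keep training for 400 iterations)?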

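Another thing I noticed is that the starting seed has a large effect on the code below. I am not sure whether it is possible, but can I apply survival of the fittest to the models (using different seeds, different data splits, or other different settings) and then save the weights of the best model for reuse?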

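import numpy as np
from sklearn import model_selection
from sklearn.metrics import accuracy_score, classification_report
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# split into X and y datasets for validation and training
# (df is assumed to be the breast cancer DataFrame loaded earlier)
X = np.array(df.drop(columns=['class']))  # not sure why this wasn't needed
y = np.array(df['class'])

# pretty cool Python, an impossible line in C#
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)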

# specify testing options
seed = 8  # to get reproducible results
scoring = 'accuracy'

models = []
models.append(('KNN', KNeighborsClassifier(n_neighbors=5)))
models.append(('SVM', SVC()))
models.append(('NB', GaussianNB()))

# evaluate each model
results = []
names = []
for name, model in models:
    # shuffle=True is needed for random_state to have any effect
    kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
    cv_results = model_selection.cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)

# predict on the held-out test set and compare against y_test to check each model
for name, model in models:
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    print(name, accuracy_score(y_test, predictions))
    print(classification_report(y_test, predictions))
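
On the iteration question: of the three models above, KNeighborsClassifier and GaussianNB have no training loop to cap (KNN stores the training data, and GaussianNB computes per-class statistics in a single pass), so only iterative estimators expose an iteration limit, usually as a `max_iter` constructor argument. The following is a minimal sketch, not the notebook's own code. It assumes sklearn's bundled breast cancer dataset as a stand-in for `df`, and it adds a StandardScaler step plus two extra estimators (LogisticRegression, MLPClassifier) purely for illustration. Note that `max_iter` is an upper bound: the solver may stop earlier once it converges.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled dataset stands in for the question's `df`.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=8)

# Estimators that train iteratively accept an upper bound via max_iter.
iterative_models = {
    "LogReg": LogisticRegression(max_iter=400),
    "MLP": MLPClassifier(max_iter=400, random_state=8),
    "SVM": SVC(max_iter=400),
}

scores = {}
for name, estimator in iterative_models.items():
    # Scaling is an addition here; it helps these solvers converge.
    model = make_pipeline(StandardScaler(), estimator)
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(name, scores[name])
```

If a solver hits the limit before converging, sklearn emits a ConvergenceWarning rather than raising an error, which is a useful hint that the limit may be too low.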

Iterating over models with sklearn and saving the best model
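
One way to approach the "survival of the fittest" idea is sketched below, under several assumptions: the bundled breast cancer dataset replaces `df`, the file name `best_model.joblib` and the helper `make_candidates` are hypothetical, and scaling is added for KNN and SVM. Instead of keeping whichever seed happens to score highest (which tends to reward a lucky split), the sketch averages cross-validation accuracy over several seeds, refits the winning model on the full training set, and saves the whole fitted estimator with joblib. These models do not have "weights" in the neural network sense, so the entire fitted object is what gets stored.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled dataset stands in for the question's `df`.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=8, stratify=y)


def make_candidates():
    """Hypothetical helper: returns fresh, unfitted candidate models."""
    return {
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "NB": GaussianNB(),
    }


best_name, best_score = None, -1.0
for name, model in make_candidates().items():
    seed_means = []
    for seed in range(5):  # a different shuffled 10-fold split per seed
        kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
        fold_scores = cross_val_score(model, X_train, y_train,
                                      cv=kfold, scoring="accuracy")
        seed_means.append(fold_scores.mean())
    mean_score = sum(seed_means) / len(seed_means)
    print("%s: %f" % (name, mean_score))
    if mean_score > best_score:
        best_name, best_score = name, mean_score

# Refit the winner on all training data, then persist the fitted estimator.
best_model = make_candidates()[best_name].fit(X_train, y_train)
model_path = os.path.join(tempfile.gettempdir(), "best_model.joblib")
joblib.dump(best_model, model_path)

# Later, or in another process: reload and reuse without retraining.
restored = joblib.load(model_path)
test_accuracy = restored.score(X_test, y_test)
print(best_name, test_accuracy)
```

The test split is only touched once, at the very end, so the reported accuracy is not biased by the selection step. Files written by joblib should be loaded with the same sklearn version that produced them.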

0 answers: