Machine learning: applying a trained model to test data

Posted: 2017-05-10 20:16:35

Tags: python machine-learning scikit-learn linear-regression

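Political science grad student in over his head here (ambitious, but ...). Basically, I am trying to predict attractiveness ratings for a set of politicians, for political science purposes. I followed a guide.

After extracting the landmarks and generating the features, I used my training set (CFD, 400 images with ratings). Under cross-validation, the model's predicted ratings correlate at 0.49 with the actual ratings, which is good enough for my purposes. This is the code: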

import numpy as np
from sklearn import decomposition
from sklearn import linear_model

features = np.loadtxt('C:\\Users\\bruker\\Desktop\\Data\\CFD_features.txt', delimiter=',')
ratings = np.loadtxt('C:\\Users\\bruker\\Desktop\\Data\\CFD_ratings.txt', delimiter=',')
predictions = np.zeros(ratings.size)

# Leave-one-out cross-validation: hold out image i, train on the other images, predict image i.
for i in range(ratings.size):
    features_train = np.delete(features, i, 0)
    features_test = features[i:i + 1, :]  # keep 2-D, shape (1, n_features), as scikit-learn expects
    ratings_train = np.delete(ratings, i, 0)
    ratings_test = ratings[i]
    # Reduce the features to 13 principal components, fitted on the training images only.
    pca = decomposition.PCA(n_components=13)
    pca.fit(features_train)
    features_train = pca.transform(features_train)
    features_test = pca.transform(features_test)
    # Fit a linear regression on the reduced features and predict the held-out rating.
    regr = linear_model.LinearRegression()
    regr.fit(features_train, ratings_train)
    predictions[i] = regr.predict(features_test)[0]  # predict() returns an array of length 1
    print('number of models trained:', i + 1)

np.savetxt('C:\\Users\\bruker\\Desktop\\Data\\CFDN_cross_valid_predictions.txt', predictions, delimiter=',', fmt='%.04f')

# Pearson correlation between the cross-validated predictions and the true ratings.
corr = np.corrcoef(predictions, ratings)[0, 1]
print(corr)
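For reference, the same leave-one-out loop can be written more compactly with scikit-learn's `make_pipeline`, `LeaveOneOut` and `cross_val_predict`. This is a sketch, not the guide's code: it assumes scikit-learn 0.18 or later (where `sklearn.model_selection` exists), `loo_correlation` is a hypothetical helper name, and random stand-in data with a hidden low-dimensional structure replace CFD_features.txt and CFD_ratings.txt. Because PCA is a step inside the pipeline, it is refit on each training fold, exactly as in the loop above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline


def loo_correlation(features, ratings, n_components=13):
    """Leave-one-out predictions and their Pearson correlation with the true ratings."""
    # PCA and regression are chained, so PCA is refit on the training part of every fold.
    model = make_pipeline(PCA(n_components=n_components), LinearRegression())
    predictions = cross_val_predict(model, features, ratings, cv=LeaveOneOut())
    corr = np.corrcoef(predictions, ratings)[0, 1]
    return predictions, corr


# Hypothetical stand-in data: 400 "images" whose 30 features are driven by 5 hidden factors.
rng = np.random.RandomState(0)
latent = rng.normal(size=(400, 5))
X = latent @ rng.normal(size=(5, 30)) + 0.1 * rng.normal(size=(400, 30))
y = latent.sum(axis=1) + 0.5 * rng.normal(size=400)

preds, r = loo_correlation(X, y)
print("LOO correlation:", round(r, 3))
```

The pipeline also guards against a common mistake: fitting PCA on all 400 images before cross-validating, which leaks information from the held-out image into the model.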

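Now I have another feature.txt containing feature data for the politicians (142 images), for whom I have no ratings. What I want to do is use the model built by the code above to generate predicted attractiveness ratings for my politicians, but I have no idea how to proceed. The guide is silent on this, probably because it is written for people who know Python :). I have spent a lot of time trying to work out how to modify or extend this code to do it, but my lack of Python and general coding knowledge makes it hard to figure out.

Given the considerable expertise on this site, I hope someone knows the solution and can help me. Apologies for my ignorance, and thanks in advance for any help.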

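1 answer

Answer 0 (score: 0)

Without the for loop it becomes very easy. The loop only exists to do leave-one-out cross-validation. To score new data, fit the PCA and the regression once on all 400 rated images, then pass the politicians' features through the same fitted PCA and regression: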

import numpy as np
from sklearn import decomposition
from sklearn import linear_model

# Train once on all 400 rated CFD images.
features_train = np.loadtxt('C:\\Users\\bruker\\Desktop\\Data\\CFD_features.txt', delimiter=',')
ratings_train = np.loadtxt('C:\\Users\\bruker\\Desktop\\Data\\CFD_ratings.txt', delimiter=',')

# Fit PCA on the training features and keep the fitted object for the new data.
pca = decomposition.PCA(n_components=13)
pca.fit(features_train)
features_train = pca.transform(features_train)
regr = linear_model.LinearRegression()
regr.fit(features_train, ratings_train)

# The new file must have the same feature columns (same extraction) as the training file.
features_test = np.loadtxt('C:\\Users\\bruker\\Desktop\\Data\\CFD_features_Test.txt', delimiter=',')

# Project with the already-fitted PCA (do not refit it), then predict one rating per row.
features_test = pca.transform(features_test)
predictions = regr.predict(features_test)
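The same steps can also be bundled into a single `Pipeline`, which guarantees that new data always go through the PCA fitted on the training set. This is a sketch assuming scikit-learn: `predict_new` is a hypothetical helper name, and random arrays stand in for the 400 CFD feature rows and the 142 politician feature rows. In practice you would load them with `np.loadtxt` as above and could write the result out with `np.savetxt`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline


def predict_new(features_train, ratings_train, features_new, n_components=13):
    """Fit PCA + linear regression once on rated data, then predict ratings for unrated rows."""
    model = make_pipeline(PCA(n_components=n_components), LinearRegression())
    model.fit(features_train, ratings_train)
    # A single unrated sample must still be 2-D, shape (1, n_features).
    features_new = np.atleast_2d(features_new)
    return model.predict(features_new)


# Hypothetical stand-in data; both sets must have the same number of feature columns.
rng = np.random.RandomState(1)
X_train = rng.normal(size=(400, 30))
y_train = rng.normal(size=400)
X_politicians = rng.normal(size=(142, 30))

predicted = predict_new(X_train, y_train, X_politicians)
print(predicted.shape)  # (142,)
```

The result is identical to the answer's step-by-step version; the pipeline simply removes the chance of calling `fit` or `fit_transform` on the politicians' data by accident.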