WEKA linear regression error rate is too high

Time: 2016-03-17 16:21:29

Tags: excel weka linear-regression data-analysis

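I am trying to run a linear regression on a set of data (books) and predict the rating from all of the other attributes. Below is how I formatted my data in Excel before exporting the file to CSV and loading it into WEKA: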

Book    Author  Genre   Publisher   Year    Rating
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5
1   1   5   1   2008    5

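I did this for a list of 25 books, for a total of 2431 instances. In WEKA, I converted the first four attributes with the "NumericToNominal" filter and then chose the "LinearRegression" function. Here are my results: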

Scheme:weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8
Relation:     Books WEKA-weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-4
Instances:    2430
Attributes:   6
              Book
              Author
              Genre
              Publisher
              Year
              Rating
Test mode:10-fold cross-validation

=== Classifier model (full training set) ===


Linear Regression Model

Rating =

      0.2267 * Book=18,15,25,13,8,24,20,17,16,19,11,4,6,21,3,7,23,12,1,9,10,14,2 +
      0.4458 * Book=8,24,20,17,16,19,11,4,6,21,3,7,23,12,1,9,10,14,2 +
     -0.1527 * Book=24,20,17,16,19,11,4,6,21,3,7,23,12,1,9,10,14,2 +
     -0.314  * Book=20,17,16,19,11,4,6,21,3,7,23,12,1,9,10,14,2 +
      0.6751 * Book=19,11,4,6,21,3,7,23,12,1,9,10,14,2 +
      0.475  * Book=4,6,21,3,7,23,12,1,9,10,14,2 +
     -0.4018 * Book=3,7,23,12,1,9,10,14,2 +
      0.2522 * Book=7,23,12,1,9,10,14,2 +
     -0.4505 * Book=23,12,1,9,10,14,2 +
     -0.2583 * Book=12,1,9,10,14,2 +
      0.4949 * Book=10,14,2 +
     -0.3875 * Author=1,6,2,4,11,12,9,3,13,10,15 +
     -0.7318 * Author=6,2,4,11,12,9,3,13,10,15 +
      0.594  * Author=2,4,11,12,9,3,13,10,15 +
      0.379  * Author=4,11,12,9,3,13,10,15 +
      0.6818 * Author=11,12,9,3,13,10,15 +
      0.4396 * Author=12,9,3,13,10,15 +
      1.0057 * Author=9,3,13,10,15 +
     -1.4347 * Author=3,13,10,15 +
     -0.4547 * Author=13,10,15 +
      0.3638 * Author=10,15 +
     -0.4921 * Author=15 +
      0.2706 * Genre=7,5,2,1,6,4,8 +
     -0.4036 * Genre=5,2,1,6,4,8 +
     -0.7927 * Genre=2,1,6,4,8 +
     -0.4448 * Genre=1,6,4,8 +
      0.5731 * Genre=6,4,8 +
      0.5519 * Genre=8 +
      0.4517 * Publisher=21,9,8,2,20,10,3,22,5,11,1,18 +
     -0.4474 * Publisher=2,20,10,3,22,5,11,1,18 +
     -0.3018 * Publisher=10,3,22,5,11,1,18 +
      0.474  * Publisher=5,11,1,18 +
      0.6567 * Publisher=1,18 +
     -0.492  * Publisher=18 +
      3.5816

Time taken to build model: 0.28 seconds

=== Cross-validation ===
=== Summary ===

Correlation coefficient                  0.2415
Mean absolute error                      0.7883
Root mean squared error                  0.9772
Relative absolute error                 98.4114 %
Root relative squared error             97.0741 %
Total Number of Instances             2430     
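
To make the summary figures easier to interpret, here is a minimal Python sketch of how the four error measures are commonly defined. It is a simplified illustration, not WEKA's code: the function name `regression_summary` and the sample ratings are hypothetical, and the baseline here is the mean of the values being evaluated, which is assumed to be close to, but not necessarily identical to, the baseline WEKA uses under cross-validation.

```python
import math

def regression_summary(actual, predicted):
    """Return MAE, RMSE, RAE (%) and RRSE (%) for paired lists of numbers."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Baseline (assumption): always predict the mean of the actual values.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_pct": 100.0 * abs_err / base_abs,
        "rrse_pct": 100.0 * math.sqrt(sq_err / base_sq),
    }

# Hypothetical ratings, not taken from the data set above.
actual = [5, 4, 3, 5, 2]
mean_only = [sum(actual) / len(actual)] * len(actual)
print(regression_summary(actual, mean_only))
```

Under these definitions, a model that always predicts the mean rating scores 100 % on both relative measures, so values of about 98 % and 97 % suggest the fitted model is only slightly better than that baseline.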

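Instead of showing one term per attribute, the model shows several terms for each attribute, and as you can see the error rate is very high. Is there something wrong with the way I am providing the data that is causing this?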
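
One possible reading of the model output, assuming WEKA turns each nominal attribute into several binary indicator attributes before fitting: a term such as `0.5519 * Genre=8` contributes 0.5519 when the instance's Genre is one of the listed values and 0 otherwise. The Python sketch below follows that assumption. Only the Genre terms and the intercept are copied from the output; the function names are hypothetical, and the Book, Author and Publisher terms are left out for brevity, so the result is not the full model's prediction.

```python
# Genre terms and intercept copied from the model output above.
# Each entry is (coefficient, set of Genre values for which the indicator is 1).
GENRE_TERMS = [
    (0.2706, {7, 5, 2, 1, 6, 4, 8}),
    (-0.4036, {5, 2, 1, 6, 4, 8}),
    (-0.7927, {2, 1, 6, 4, 8}),
    (-0.4448, {1, 6, 4, 8}),
    (0.5731, {6, 4, 8}),
    (0.5519, {8}),
]
INTERCEPT = 3.5816

def genre_contribution(genre):
    """Sum the coefficients of every Genre indicator that is 1 for this value."""
    return sum(coef for coef, values in GENRE_TERMS if genre in values)

def partial_prediction(genre):
    # Partial on purpose: the Book, Author and Publisher terms are omitted.
    return INTERCEPT + genre_contribution(genre)

for g in (5, 8):
    print(g, round(genre_contribution(g), 4), round(partial_prediction(g), 4))
```

Read this way, the nested value lists are not separate calculations per attribute: each Genre value simply picks up the sum of the coefficients of the indicators it belongs to.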

0 answers:

No answers yet