Question

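I have a dataset with two columns: column 1, timestamp, is a specific point in time, and Column.10 gives the total power consumption at that instant. The data has 81502 instances in total.

I am running support vector regression on this data in R with the e1071 package to predict future power usage. The code is below. I first split the dataset into training and test data, then fit a model on the training data with the svm function, and then predict the power usage for the test set.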

    library(e1071)
    attach(data.csv)
    index <- 1:nrow(data.csv)
    testindex <- sample(index,trunc(length(index)/3))
    testset <- na.omit(data.csv[testindex, ])
    trainingset <- na.omit(data.csv[-testindex, ])
    model <- svm(Column.10 ~ timestamp, data=trainingset)
    prediction <- predict(model, testset[,-2])
    tab <- table(pred = prediction, true = testset[,2])

However, when I try to build a confusion matrix from the predictions, I get this error:

    Error in table(pred = prediction, true = testset[, 2]) : all arguments must have the same length

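So I checked the lengths of the two arguments and found:

    length(prediction) is 81502
    length(testset[,2]) is 27167

Since I predicted only for the test set, I don't understand how I ended up with predictions for 81502 values. Why do the number of predictions and the size of the test set differ? If only the test set was supplied, how did power values for the entire dataset get predicted?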

Answer 1

Change

    prediction <- predict(model, testset[,-2])

to

    prediction <- predict(model, testset)
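A likely reason the original call misbehaves, which the answer does not spell out: dropping column 2 from a two-column data frame leaves a single column, and R simplifies that to a plain vector, so the name `timestamp` is lost. With `attach(data.csv)` in effect, `predict` then probably falls back to the attached full-length `timestamp` column, which would explain the 81502 predictions. The sketch below uses made-up data and base R only to show the simplification; `df`, `dropped`, and `kept` are illustrative names.

```r
# Synthetic stand-in for the asker's data (values are made up).
df <- data.frame(timestamp = 1:6, Column.10 = c(5, 7, 6, 8, 9, 7))
testset <- df[c(2, 5), ]

# Removing column 2 leaves one column, which R simplifies to a vector.
dropped <- testset[, -2]
is.data.frame(dropped)   # FALSE: the column name "timestamp" is gone
is.null(names(dropped))  # TRUE

# drop = FALSE keeps the data frame structure and the column name.
kept <- testset[, -2, drop = FALSE]
is.data.frame(kept)      # TRUE
names(kept)              # "timestamp"
```

Passing the whole `testset`, as suggested above, sidesteps the issue because the `timestamp` column is present under its own name.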

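However, when doing regression you should not use table, since a confusion matrix only makes sense for classification. Use an error measure such as the MSE (mean squared error) instead.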
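As a minimal sketch of that suggestion, the MSE can be computed in base R as follows. The helper `mse` and the numbers are made up for illustration; they are not part of e1071 or of the original post.

```r
# Mean squared error: the average of the squared differences.
mse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  mean((actual - predicted)^2)
}

# Made-up numbers purely for illustration.
actual    <- c(10, 12, 14)
predicted <- c(11, 12, 16)

mse(actual, predicted)        # mean of the squared errors 1, 0, 4
sqrt(mse(actual, predicted))  # RMSE, in the same units as the data
```

With the corrected prediction, this would be called as `mse(testset$Column.10, prediction)`, assuming both have the same length.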
