How to use rminer and nnet

Posted: 2013-06-21 11:03:45

Tags: r neural-network data-mining nnet

I am a new R programmer and I am writing my thesis on training neural networks. First I use rminer for data mining, and then nnet for training. Now I don't know which function to use to split the dataset into training and validation sets for k-fold cross-validation, and then how to apply nnet to each fold. Sorry for my bad English. Thanks in advance.

2 Answers:

Answer 0 (score: 1):

If you don't know how to approach a new topic/package in R, you can get help on it with:

library(help=package.name)

This gives you an overview of all the functions and datasets defined in the package, each with a short title. Once you have identified the function you need, you can consult its documentation like this:

?function.name

In the documentation, also pay attention to the See Also section, which usually lists functions commonly used together with the one under consideration. Also, work through the examples. You can use

example(function.name)

to demonstrate how a function is used and the common idioms for using it.

Finally, if you are lucky, the package author may have written a vignette for the package. You can search for all the vignettes in a package like this:

vignette(package="package.name")

Hopefully this will get you started with the rminer and nnet packages.
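To address the k-fold cross-validation part of the question directly, here is a minimal sketch using nnet with base R only. The data frame `mydata`, its columns, the fold count `k`, and the network `size` are all illustrative assumptions, not part of the original question:

```r
library(nnet)

set.seed(1)  # for reproducibility
# Illustrative data: a numeric response y driven by two predictors
mydata <- data.frame(x1 = runif(100), x2 = runif(100))
mydata$y <- mydata$x1 + 2 * mydata$x2 + rnorm(100, sd = 0.1)

# Randomly assign each row to one of k folds
k <- 5
folds <- sample(rep(1:k, length.out = nrow(mydata)))

# For each fold i: train on the other folds, validate on fold i
cv_error <- sapply(1:k, function(i) {
  train <- mydata[folds != i, ]
  valid <- mydata[folds == i, ]
  # linout = TRUE gives a linear output unit (regression, not classification)
  fit <- nnet(y ~ x1 + x2, data = train, size = 3, linout = TRUE,
              maxit = 500, trace = FALSE)
  pred <- predict(fit, newdata = valid)
  mean((valid$y - pred)^2)  # per-fold mean squared error
})
mean(cv_error)  # cross-validated MSE estimate
```

The same loop works for any model with a `predict` method, so you can swap in rminer models if you prefer.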

Answer 1 (score: 0):

It is probably too late, but I found this question while looking for an answer to my own. You can use something like this:

    # Splitting into training, cross-validation and test datasets.
    # The full dataset has 100% of the observations; the training set gets 60%,
    # the cross-validation (CV) set 20% and the test set 20%.
    train_ind <- sample(seq_len(nrow(DF.mergedPredModels)), size = floor(0.6 * nrow(DF.mergedPredModels)))
    trainDF.mergedPredModels <- DF.mergedPredModels[train_ind, ]

    # The CV and test sets are built from the observations that are not in the
    # training set. Draw the CV indices once and reuse them, so that the two sets
    # do not overlap (calling sample() twice would draw two different subsets).
    # The CV fraction can be changed from "0.5" to anything you like, as long as
    # the CV and test fractions add up to 1.
    restDF.mergedPredModels <- DF.mergedPredModels[-train_ind, ]
    cv_ind <- sample(seq_len(nrow(restDF.mergedPredModels)), size = floor(0.5 * nrow(restDF.mergedPredModels)))

    # Cross-validation dataset
    cvDF.mergedPredModels <- restDF.mergedPredModels[cv_ind, ]

    # Testing dataset
    testDF.mergedPredModels <- restDF.mergedPredModels[-cv_ind, ]

    # Temporal data and the like are added after the predictions are made, because
    # the models should not be built on the dates. You can also add these columns
    # to the training, CV and test sets and plot the real values of your predicted
    # parameter, and the respective predictions, over your time variables
    # (half-hour, hour, day, week, month, quarter, season, year, etc.).
    # aa = explicitly specify the columns to be used in the temporal datasets
    aa <- c("date", "period", "publish_date", "quarter", "month", "Season")
    temporaltrainDF.mergedPredModels <- trainDF.mergedPredModels[, aa]
    temporalcvDF.mergedPredModels <- cvDF.mergedPredModels[, aa]
    temporaltestDF.mergedPredModels <- testDF.mergedPredModels[, aa]

    # bb = explicitly specify the columns to be used in the training, CV and testing datasets
    bb <- c("quarter", "month", "Season", "period", "temp.mean", "wind_speed.mean", "solar_radiation", "realValue")
    trainDF.mergedPredModels.Orig <- trainDF.mergedPredModels[, bb]
    trainDF.mergedPredModels <- trainDF.mergedPredModels[, bb]
    smalltrainDF.mergedPredModels.Orig <- trainDF.mergedPredModels.Orig[1:10, ]  # check that the models converge without errors
    cvDF.mergedPredModels <- cvDF.mergedPredModels[, bb]
    testDF.mergedPredModels <- testDF.mergedPredModels[, bb]
    # /Splitting into training, cross-validation and test datasets
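With splits like these in hand, the nnet step the question asks about could be sketched as follows. A small stand-in data frame replaces DF.mergedPredModels here so the snippet runs on its own; the column names, `size` and `decay` values are illustrative, not tuned recommendations:

```r
library(nnet)

set.seed(42)  # for reproducibility
# Stand-in for the training and CV splits produced above
trainDF <- data.frame(temp.mean = runif(80), solar_radiation = runif(80))
trainDF$realValue <- 3 * trainDF$temp.mean + trainDF$solar_radiation + rnorm(80, sd = 0.05)
cvDF <- data.frame(temp.mean = runif(20), solar_radiation = runif(20))
cvDF$realValue <- 3 * cvDF$temp.mean + cvDF$solar_radiation + rnorm(20, sd = 0.05)

# linout = TRUE gives a linear output unit (regression); decay adds weight decay
# regularization, which usually helps small networks generalize
fit <- nnet(realValue ~ ., data = trainDF, size = 4, decay = 0.01,
            linout = TRUE, maxit = 500, trace = FALSE)

# Evaluate on the held-out CV set with root mean squared error
cv_pred <- predict(fit, newdata = cvDF)
cv_rmse <- sqrt(mean((cvDF$realValue - cv_pred)^2))
```

You would compare `cv_rmse` across candidate settings of `size` and `decay`, pick the best, and only then score the test set once.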