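I am trying to predict the value of a categorical variable using a KNN model in R.
To do this, I am using a function so that I can easily change the dataset, the split percentages, and the value of k.
When I apply this function to a particular dataset, I get an error.
Edit: I am limited in how reproducible I can make this problem, but I am adding the libraries so that it is clear which packages I am using.
The data structure I am using is as follows: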
library(dplyr)
library(class)
library(neuralnet)
library(nnet)
library(lubridate)
> head(crypto_data)
time btc_price eth_price block_size difficulty estimated_btc_sent estimated_transaction_volume_usd hash_rate
1 2017-09-02 21:54:00 1.622181 1.710355 0.9502574 -1.258379 -0.05186039 0.4346130 -0.7265456
2 2017-09-02 22:29:00 1.738889 1.970749 0.5771003 -1.258379 -0.07004424 0.4110978 -1.0477347
3 2017-09-02 23:04:00 1.705891 1.938885 0.4726202 -1.258379 -0.10641195 0.3755673 -0.9406717
4 2017-09-02 23:39:00 1.775354 2.159321 0.4144439 -1.258379 -0.14277966 0.3348643 -0.8871402
5 2017-09-03 00:14:00 2.028195 2.572964 0.2132932 -1.258379 -0.10641195 0.4305168 -1.0477347
6 2017-09-03 00:49:00 2.097871 2.504085 0.0190859 -1.258379 -0.14277966 0.3756431 -1.1547978
miners_revenue_btc miners_revenue_usd minutes_between_blocks n_blocks_mined n_blocks_total n_btc_mined n_tx nextretarget
1 1.0287278 1.699011 -0.43408783 0.37556660 -2.016092 0.37464164 0.04072815 -2.22295
2 0.6856301 1.417137 -0.11622241 0.04004961 -2.015293 0.06154488 -0.12441993 -2.22295
3 0.7955973 1.507554 -0.22217755 0.15188860 -2.008898 0.15100110 -0.05626304 -2.22295
4 0.8395842 1.543490 -0.29923583 0.20780810 -2.005700 0.19572920 -0.10762521 -2.22295
5 0.6812315 1.519311 -0.06806098 0.04004961 -2.003302 0.06154488 -0.09733929 -2.22295
6 0.5580682 1.416853 -0.03916412 -0.07178939 -2.000904 -0.07263945 -0.19824250 -2.22295
total_btc_sent total_fees_btc totalbtc trade_volume_btc trade_volume_usd targetVar
1 -0.9319080 2.703601 -2.551107 0.2518994 0.5783353 buy
2 -0.9698475 2.632490 -2.551107 0.2518994 0.5783353 buy
3 -0.9698475 2.638365 -2.551107 0.2518994 0.5783353 buy
4 -1.0077870 2.594611 -2.551107 0.2518994 0.5783353 buy
5 -1.0077870 2.628309 -2.551107 0.1465798 0.4688573 hold
6 -1.0267568 2.568152 -2.551107 0.1465798 0.4688573 hold
The function is:
knn_pred_func <- function(inData, k, trainPct) {
  trainP <- trainPct * .6
  valP <- trainPct * .2
  testP <- trainPct * .2
  # Split the data
  trainObs <- sample(nrow(inData), trainP * nrow(inData), replace = FALSE)
  valObs <- sample(nrow(inData), valP * nrow(inData), replace = FALSE)
  testObs <- sample(nrow(inData), testP * nrow(inData), replace = FALSE)
  # Create the training/validation/test datasets
  trainDS <- inData[trainObs,]
  valDS <- inData[valObs,]
  testDS <- inData[testObs,]
  # Separate the labels
  train_labels <- trainDS[,"targetVar"]
  # KNN
  knn_crypto_val_pred <- knn(trainDS, valDS, train_labels, k = k)
  knn_crypto_test_pred <- knn(trainDS, testDS, train_labels, k = k)
}
When I call knn_pred_func(crypto_data, 3, 1), I get the following error:
Error in knn(trainDS, valDS, train_labels, k = k) : NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(trainDS, valDS, train_labels, k = k) : NAs introduced by coercion
2: In knn(trainDS, valDS, train_labels, k = k) : NAs introduced by coercion
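What does this mean, and how can I fix it? I have tried several variations of knn_pred_func and they all give the same error. Also, I originally had separate train/val/test label sets, but after reading a post online I kept only train_labels. Is that wrong? Shouldn't I pass the corresponding labels to each knn call?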
Answer (score: 0)
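I suspect the problem is the datetime column of crypto_data. The error message indicates that knn() cannot handle your input data frame. See this very detailed answer to a similar question: Error with knn function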
Unless time is an important feature for your classification task, I suggest dropping it and using:
knn_pred_func(crypto_data[,-1], 3, 1)
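Dropping column 1 alone may not be enough, though: targetVar ("buy"/"hold") is also non-numeric, and in the question's function it is still a column of trainDS, valDS and testDS, so it would be coerced to NA the same way. Below is a minimal sketch of a revised function, assuming the label column is named targetVar and every other numeric column is a feature; the name knn_pred_func2, the default trainPct = 1 and the toy data frame are hypothetical. It also shuffles the row indices once, because the three independent sample() calls in the original can put the same row into both the training and the validation/test sets. On the label question: knn()'s cl argument takes only the training labels; the validation and test labels are used afterwards to score the predictions.

```r
library(class)

# Hypothetical revision of knn_pred_func. Assumes the label column is "targetVar";
# every other numeric column is a feature, so non-numeric columns such as a
# POSIXct "time" column are excluded automatically.
knn_pred_func2 <- function(inData, k, trainPct = 1) {
  # Feature columns (numeric, label excluded) and the label vector
  is_feature <- sapply(inData, is.numeric) & names(inData) != "targetVar"
  X <- inData[, is_feature, drop = FALSE]
  y <- factor(inData[["targetVar"]])

  # Shuffle once and cut the shuffled indices 60/20/20 so the sets do not overlap
  n_use <- floor(trainPct * nrow(inData))
  idx <- sample(nrow(inData), n_use)
  n_tr <- floor(0.6 * n_use)
  n_val <- floor(0.2 * n_use)
  trainObs <- idx[seq_len(n_tr)]
  valObs <- idx[n_tr + seq_len(n_val)]
  testObs <- idx[n_tr + n_val + seq_len(n_use - n_tr - n_val)]

  # cl receives the TRAINING labels only
  train_X <- X[trainObs, , drop = FALSE]
  val_pred <- knn(train_X, X[valObs, , drop = FALSE], y[trainObs], k = k)
  test_pred <- knn(train_X, X[testObs, , drop = FALSE], y[trainObs], k = k)

  # Validation/test labels are used only to measure accuracy
  list(
    val_pred = val_pred,
    test_pred = test_pred,
    val_acc = mean(val_pred == y[valObs]),
    test_acc = mean(test_pred == y[testObs])
  )
}

# Usage on a small made-up data frame shaped like crypto_data (not the real data)
set.seed(1)
toy <- data.frame(
  time = as.POSIXct("2017-09-02 21:54:00", tz = "UTC") + 2100 * (0:59),
  btc_price = c(rnorm(30, -2), rnorm(30, 2)),
  eth_price = c(rnorm(30, -2), rnorm(30, 2)),
  targetVar = rep(c("buy", "hold"), each = 30)
)
res <- knn_pred_func2(toy, k = 3, trainPct = 1)
print(res$val_acc)
print(res$test_acc)
```

With the real data this would be called as knn_pred_func2(crypto_data, 3, 1), without removing any columns beforehand.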