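KNN error with the flights dataset

Time: 2017-05-24 18:39:28

Tags: r dataframe random machine-learning

I am trying to learn how to do KNN in R and am practicing with the flights dataset from the nycflights13 package. Running the code below produces this error: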

  

'train' and 'class' have different lengths

My code:

library(nycflights13)
library(class)
library(ggvis)  # needed for ggvis(), layer_points() and the %>% pipe used below


deparr <- na.omit(flights[c(4, 7, 16)])

classframe <- deparr[3]

flights %>% ggvis(~dep_time, ~arr_time, fill = ~distance) %>% layer_points()

set.seed(1234)

ind <- sample(2, nrow(deparr), replace=TRUE, prob=c(0.67, 0.33))

flights.training <- deparr[ind==1, 1:2]
flights.test <- deparr[ind==2, 1:2]
flights.trainlabels <- deparr[ind==1, 3]
flights.testlabels <- deparr[ind==2, 3]

predictions <- knn(train = flights.training, test = flights.test, cl = flights.trainlabels[,1], k = 3)

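1 answer:

Answer 0 (score: 1)

Here is code that splits the data into training and test sets by percentage. If you want to split the two subsets differently, you should be able to adapt it, but this version shows that the approach works.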

deparr <- na.omit(flights[c(4, 7, 16)])
set.seed(1234)

# prepare to divide up the full dataset into two groups, 65%/35%
n <- nrow(deparr)
train_n <- round(0.65 * n)

# shuffle the rows (the trailing comma matters: without it, [ ] selects columns)
deparr <- deparr[sample(n), ]

# split up the actual data. We will use these as inputs to knn
flights.train <- deparr[1:train_n, ]
flights.test <- deparr[(train_n + 1):n, ]

# target variable, $distance, is in column 3, so exclude from train and test
predictions <- knn(train = flights.train[, 1:2], test = flights.test[, 1:2], cl = flights.train$distance, k = 10)

This works; here is the result I got:

> str(predictions)
Factor w/ 209 levels "80","94","96",..: 121 159 18 54 207 18 94 55 159 136 ...
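
As for why the original code failed: flights is a tibble, and subsetting a tibble with [ , 3] or [ , 1] returns a one-column tibble rather than a vector. knn() checks length(cl) against the number of training rows, and the length of a data frame is its number of columns, so the check sees 1 and stops. The sketch below reproduces this on a small synthetic data frame. The toy data and its values are made up for illustration, and drop = FALSE is used only to imitate the tibble behaviour with base R.

```r
library(class)  # provides knn(); a recommended package, usually installed with R

set.seed(1234)

# Hypothetical stand-in for the three flights columns (synthetic values).
toy <- data.frame(
  dep_time = sample(1:2400, 60, replace = TRUE),
  arr_time = sample(1:2400, 60, replace = TRUE),
  distance = sample(c(200, 500, 1000), 60, replace = TRUE)
)

# Same style of split as in the question.
ind <- sample(2, nrow(toy), replace = TRUE, prob = c(0.67, 0.33))
train <- toy[ind == 1, 1:2]
test  <- toy[ind == 2, 1:2]

# drop = FALSE imitates a tibble: the result is still a one-column data frame,
# and length() of a data frame is its column count.
labels_df <- toy[ind == 1, 3, drop = FALSE]
length(labels_df)  # 1, not the number of training rows

# Passing the one-column data frame as cl triggers the length check in knn().
err <- tryCatch(
  knn(train = train, test = test, cl = labels_df, k = 3),
  error = function(e) conditionMessage(e)
)
print(err)

# Extracting the column with [[ ]] (or $) yields a plain vector of the right length.
labels_vec <- toy[["distance"]][ind == 1]
predictions <- knn(train = train, test = test, cl = labels_vec, k = 3)
str(predictions)
```

Applied to the code in the question, this would mean passing something like deparr[[3]][ind == 1] or deparr$distance[ind == 1] as cl instead of flights.trainlabels[,1].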