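How can I randomly select 2 instances (2 rows) for testing and keep the remaining rows for training, given a sample dataset like this one?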
dog.data
22.0,4566.0,56.4,89.3,Dog-fifota
81.0,3434.0,34.4,67.3,Dog-listem
30.0,8944.0,23.4,45.3,Dog-biker
55.0,3455.0,78.5,11.3,Dog-listem
41.4,3345.0,45.3,34.1,Dog-fifota
Answer 0 (score: 1)
Try this (in R):
indx <- sample(nrow(dog.data), 2)
test <- dog.data[indx, ]
train <- dog.data[-indx, ]
Edit
If you want to wrap this in a function, something like this would do:
spltfunc <- function(x){
  indx <- sample(nrow(x), 2)
  test <- x[indx, ]
  train <- x[-indx, ]
  list2env(list(test = test, train = train), .GlobalEnv)
}
Testing
set.seed(123) # setting the random seed so you can reproduce results
spltfunc(dog.data)
# <environment: R_GlobalEnv>
test
# V1 V2 V3 V4 V5
# 2 81 3434 34.4 67.3 Dog-listem
# 4 55 3455 78.5 11.3 Dog-listem
train
# V1 V2 V3 V4 V5
# 1 22.0 4566 56.4 89.3 Dog-fifota
# 3 30.0 8944 23.4 45.3 Dog-biker
# 5 41.4 3345 45.3 34.1 Dog-fifota
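Answer 1 (score: 0)
A simple way to split a dataset into a random sample and the remainder is to shuffle the whole dataset and then slice it. You end up with two lists of items, each in random order.
Here is a Python function: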
import random

def split_test_training(data, num_test_rows):
    data_copy = list(data)  # shallow copy, so the caller's list is left untouched
    random.shuffle(data_copy)  # reorder the copied list in-place
    testing = data_copy[:num_test_rows]  # slice the first num_test_rows for testing
    training = data_copy[num_test_rows:]  # slice the rest for training
    return testing, training
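If you don't mind the list you pass in being shuffled by the function, you can skip the shallow copy and shuffle the input directly.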
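As an alternative to shuffling, the index-sampling idea from the R answer can be sketched in Python too. This is only a rough sketch under some assumptions: the helper name `split_by_sampled_indices`, its `seed` parameter, and the `DOG_ROWS` list (the question's rows typed in by hand as tuples) are all made up for illustration. It never reorders the input, and the training rows keep their original order.

```python
import random

# The question's dog.data rows, typed in by hand as tuples (assumed layout:
# four numeric columns followed by the label).
DOG_ROWS = [
    (22.0, 4566.0, 56.4, 89.3, "Dog-fifota"),
    (81.0, 3434.0, 34.4, 67.3, "Dog-listem"),
    (30.0, 8944.0, 23.4, 45.3, "Dog-biker"),
    (55.0, 3455.0, 78.5, 11.3, "Dog-listem"),
    (41.4, 3345.0, 45.3, 34.1, "Dog-fifota"),
]


def split_by_sampled_indices(rows, num_test_rows, seed=None):
    """Hypothetical helper: sample row indices for testing, keep the rest for training."""
    rng = random.Random(seed)  # private generator; a fixed seed gives a repeatable split
    test_idx = set(rng.sample(range(len(rows)), num_test_rows))  # distinct indices
    testing = [row for i, row in enumerate(rows) if i in test_idx]
    training = [row for i, row in enumerate(rows) if i not in test_idx]
    return testing, training


testing, training = split_by_sampled_indices(DOG_ROWS, 2, seed=123)
print(len(testing), len(training))  # prints: 2 3
```

Passing a seed plays the same role as `set.seed` in the R answer: the same seed yields the same split on every run with the same Python version. Leave it as `None` to get a different split each time.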