How to shuffle the data in each subsample after splitting a data frame into smaller data frames in R

Time: 2017-09-12 19:27:09

Tags: r

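I split a large data frame into smaller data frames of 5,000 records each. After performing an rbind on each subsample, I want to shuffle the rows within each subsample. When I try to shuffle the data, I get no error, but the data does not appear to be shuffled either. Can anyone help me shuffle the data?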

# Split the data frame into smaller data frames of 5000 rows each
test_list <- split(New_data_zero, (seq_len(nrow(New_data_zero)) - 1) %/% 5000)

# Append the rows of New_data to every smaller data frame
for (i in seq_along(test_list)) {
  test_list[[i]] <- rbind(test_list[[i]], New_data)
}

# Try to shuffle the rows of each subsample (this does not seem to have any effect)
for (i in seq_along(test_list)) {
  test_list[[i]] <- test_list[[i]][sample(nrow(test_list[[i]])), , drop = FALSE]
}
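
One way to check whether a loop like the one above really reorders rows is to compare the row names of each chunk before and after shuffling. The sketch below is only an illustration: `mtcars` and a chunk size of 10 stand in for `New_data_zero` and 5000, since the original data is not shown, and the names `chunks`, `before`, `after`, `same_rows` and `reordered` are made up for this example.

```r
# mtcars stands in for the asker's data (assumption: the real data is not shown)
set.seed(42)
chunks <- split(mtcars, (seq_len(nrow(mtcars)) - 1) %/% 10)

# Remember the row order of every chunk before shuffling
before <- lapply(chunks, rownames)

# Same shuffle as in the question
for (i in seq_along(chunks)) {
  chunks[[i]] <- chunks[[i]][sample(nrow(chunks[[i]])), , drop = FALSE]
}

after <- lapply(chunks, rownames)

# TRUE for a chunk means it still holds exactly the same rows
same_rows <- mapply(setequal, before, after)
# TRUE for a chunk means the order of its rows changed
reordered <- !mapply(identical, before, after)

print(same_rows)
print(reordered)
```

A very small chunk can come back in its original order purely by chance, so a single `FALSE` in `reordered` does not by itself mean the shuffle failed.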

1 Answer:

Answer 0: (score: 2)

Try this

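myfun <- function(df, numobs) {
  # Group label for each row (1,1,...,2,2,...); length.out keeps the labels the
  # same length as the data when nrow(df) is not a multiple of numobs
  grp <- rep(seq_len(ceiling(nrow(df) / numobs)), each = numobs, length.out = nrow(df))
  sdf <- split(df, grp)
  # Shuffle the rows within each chunk
  lapply(sdf, function(x) x[sample(nrow(x)), , drop = FALSE])
}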

set.seed(1)
myfun(mtcars, 5)

Output

$`1`
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

$`2`
            mpg cyl  disp  hp drat   wt  qsec vs am gear carb
Merc 280   19.2   6 167.6 123 3.92 3.44 18.30  1  0    4    4
Duster 360 14.3   8 360.0 245 3.21 3.57 15.84  0  0    3    4
Merc 230   22.8   4 140.8  95 3.92 3.15 22.90  1  0    4    2
Valiant    18.1   6 225.0 105 2.76 3.46 20.22  1  0    3    1
Merc 240D  24.4   4 146.7  62 3.69 3.19 20.00  1  0    4    2

etc.
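
The same idea can be adapted to the workflow in the question, where extra rows are appended to every chunk before shuffling. The following is a rough sketch, not part of the original answer: `split_bind_shuffle` is a hypothetical helper, and slices of `mtcars` stand in for `New_data_zero` and `New_data`, which are not shown.

```r
# Hypothetical helper (not from the original answer): split `df` into chunks of
# `chunk_size` rows, append `extra` to each chunk, then shuffle each chunk's rows.
split_bind_shuffle <- function(df, extra, chunk_size) {
  groups <- (seq_len(nrow(df)) - 1) %/% chunk_size
  lapply(split(df, groups), function(chunk) {
    combined <- rbind(chunk, extra)
    combined[sample(nrow(combined)), , drop = FALSE]
  })
}

# Stand-in data (assumption): the first 20 rows of mtcars play the role of
# New_data_zero and the last 3 rows play the role of New_data.
set.seed(1)
result <- split_bind_shuffle(mtcars[1:20, ], mtcars[30:32, ], chunk_size = 5)

# Number of rows in each shuffled chunk
sapply(result, nrow)
```

Doing the rbind and the shuffle inside a single `lapply` avoids the two separate `for` loops, and `drop = FALSE` keeps each result a data frame even if the data has only one column.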