Question

Using the iris data frame, I can easily extract the first n = 100 records with:

m_data<-iris
m_data[1:100,]

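But I would also like to get the first 100 records divided exactly by species. Assume for the moment that the first 100 records are all the same species: I'd like to pull the data using a "first sampling" by the different species, that is, the first rows of each species.

Any suggestions are welcome. Thanks.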

Answer 1

Here is another option:

do.call(rbind, lapply(split(iris, iris$Species), head, 100))

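This extracts the first 100 records of iris for each Species. (iris has only 50 rows per species, so with 100 every row is returned; use a smaller number to see the effect.)

You can also use by instead of lapply: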

do.call(rbind, by(iris, iris$Species, head, 100))
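To see the split/head/rbind approach actually truncate each group, here is a minimal sketch with a smaller group size. The value n = 10 and the name first_n are choices made for illustration, not part of the original answer.

```r
# Illustrative group size; iris has 50 rows per species, so anything below 50 truncates
n <- 10

# Split by species, keep the first n rows of each piece, then stack the pieces
first_n <- do.call(rbind, lapply(split(iris, iris$Species), head, n))

# rbind on a named list produces row names like "setosa.1"; reset them to 1..N
rownames(first_n) <- NULL

# Rows kept per species
table(first_n$Species)
```

Resetting the row names is optional; it only matters if later code expects plain integer row names.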

Answer 2

You can also do this with dplyr; here the first 10 rows of each species are selected:

library(dplyr)
iris %>%
  group_by(Species) %>%
  filter(row_number() <= 10) # or slice(1:10)
#> # A tibble: 30 x 5
#> # Groups:   Species [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ... with 20 more rows

Created on 2018-08-13 by the reprex package (v0.2.0).
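As a variation on the dplyr approach, and assuming dplyr 1.0.0 or later is installed, slice_head() expresses "first n rows per group" directly. The name first10 is hypothetical, and the final ungroup() is an optional step added here so that later operations are not applied per species.

```r
library(dplyr)

# First 10 rows of each species; slice_head() respects the grouping
first10 <- iris %>%
  group_by(Species) %>%
  slice_head(n = 10) %>%
  ungroup()  # drop the grouping so the result behaves like an ordinary tibble

# Rows kept per species
count(first10, Species)
```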

R - select the first records, but by group
