Question

If I add a new row to the iris dataset with:

iris <- as_tibble(iris)

> iris %>% 
    add_row(.before=0)

# A tibble: 151 × 5
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>   <chr>
1            NA          NA           NA          NA    <NA> <--- Good!
2           5.1         3.5          1.4         0.2  setosa
3           4.9         3.0          1.4         0.2  setosa

That works. So why can't I add a new row on top of each "subset" with:

iris %>% 
 group_by(Species) %>% 
 add_row(.before=0)

Error: is.data.frame(df) is not TRUE

Answer 1

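If you want to use a grouped operation, you need do(), as JasonWang described in his comment, because other functions such as mutate() or summarise() expect a result with either the same number of rows as the grouped data frame (in your case, 50) or a single row (e.g. when summarising).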

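As you probably know, do() can be slow in general and should be a last resort when you cannot get your result another way. Your task is quite simple because it only involves adding extra rows to your data frame, which can be done with plain indexing; for example, look at the output of iris[NA, ].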

What you want is essentially to create a vector

indices <- c(NA, 1:50, NA, 51:100, NA, 101:150)

(since the first group is in rows 1 to 50, the second in rows 51 to 100, and the third in rows 101 to 150).

The result is then iris[indices, ].
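As a minimal sketch of the indexing idea above, assuming the plain base data frame datasets::iris rather than the tibble from the question: an integer NA in a row index yields one row filled with NA, so each NA in the vector becomes a blank row. The names blank and res are only illustrative.

```r
# Base R only; datasets::iris is the untouched built-in data frame.
df <- datasets::iris

# A single integer NA as a row index gives exactly one all-NA row.
blank <- df[NA_integer_, ]

# One NA in front of each block of 50 rows, as in the answer.
indices <- c(NA, 1:50, NA, 51:100, NA, 101:150)
res <- df[indices, ]

nrow(res)                        # 150 original rows plus 3 blank rows
which(is.na(res$Sepal.Length))   # positions of the blank rows
```

Note that with a base data frame a bare NA is logical and gets recycled across all rows, which is why the sketch uses NA_integer_ for the single-row case.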

A more general way of building this vector uses group_indices.

indices <- seq(nrow(iris)) %>% 
    split(group_indices(iris, Species)) %>% 
    map(~c(NA, .x)) %>%
    unlist

(map comes from purrr, which I assume you have loaded, since you tagged this with tidyverse.)
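The same vector can also be built without dplyr or purrr. The following is a sketch in base R, not the answer's own code; the helper name add_blank_row_per_group and its arguments are hypothetical, and it assumes the grouping column is passed as a string.

```r
# Hypothetical helper: prepend one all-NA row to every group of `df`.
add_blank_row_per_group <- function(df, group_col) {
  # Row numbers split by group; drop = TRUE skips unused factor levels.
  row_ids <- split(seq_len(nrow(df)), df[[group_col]], drop = TRUE)
  # Put an NA in front of each group's row numbers, then flatten.
  indices <- unlist(lapply(row_ids, function(ids) c(NA, ids)),
                    use.names = FALSE)
  df[indices, , drop = FALSE]
}

res <- add_blank_row_per_group(datasets::iris, "Species")
nrow(res)
```

Because the rows are split by group first, the output is ordered by group even when the groups are not stored contiguously in the input.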

Answer 2

A more recent approach would be to use group_modify() instead of do().

iris %>%
  as_tibble() %>%
  group_by(Species) %>% 
  group_modify(~ add_row(.x,.before=0))
#> # A tibble: 153 x 5
#> # Groups:   Species [3]
#>    Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>    <fct>          <dbl>       <dbl>        <dbl>       <dbl>
#>  1 setosa          NA          NA           NA          NA  
#>  2 setosa           5.1         3.5          1.4         0.2
#>  3 setosa           4.9         3            1.4         0.2

Answer 3

With a slight variation, this could also be done:

library(dplyr)   # group_split() and %>%
library(purrr)   # map_dfr()
library(tibble)  # add_row()

iris %>%
  group_split(Species) %>%
  map_dfr(~ .x %>%
            add_row(.before = 1))

# A tibble: 153 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1         NA          NA           NA          NA   NA     
 2          5.1         3.5          1.4         0.2 setosa 
 3          4.9         3            1.4         0.2 setosa 
 4          4.7         3.2          1.3         0.2 setosa 
 5          4.6         3.1          1.5         0.2 setosa 
 6          5           3.6          1.4         0.2 setosa 
 7          5.4         3.9          1.7         0.4 setosa 
 8          4.6         3.4          1.4         0.3 setosa 
 9          5           3.4          1.5         0.2 setosa 
10          4.4         2.9          1.4         0.2 setosa 
# ... with 143 more rows

This can also be done on a grouped data frame; however, it is a bit verbose:

library(dplyr)

iris %>%
  group_by(Species) %>%
  summarise(Sepal.Length = c(NA, Sepal.Length), 
            Sepal.Width = c(NA, Sepal.Width), 
            Petal.Length = c(NA, Petal.Length),
            Petal.Width = c(NA, Petal.Width), 
            Species = c(NA, Species))
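A shorter variant of the same idea is possible with across(). This is only a sketch and assumes dplyr 1.1.0 or later, where reframe() is the function intended for results with more than one row per group; the name res is illustrative. Unlike the version above, the grouping column stays filled in on the new rows.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for reframe()

res <- datasets::iris %>%
  group_by(Species) %>%
  # Prepend NA to every non-grouping column within each group.
  reframe(across(everything(), ~ c(NA, .x)))

nrow(res)
head(res, 3)
```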

Adding a row in each group using dplyr and add_row()

3 answers: