Question

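What is the efficient/preferred way to do group mean centering with dplyr, that is, to take each element of a group (mutate) and apply an operation that combines it with a summary statistic (summarize) for that group? Here is how to do group mean centering with base R: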

mtcars
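A minimal base R sketch of this kind of group mean centering, assuming `ave()` is used to compute the per-group mean of `mpg` within each `cyl` level. The object name `centered` and the column name `mpg_centered` are hypothetical and do not come from the original post:

```r
# Work on a copy so the built-in mtcars data set is left untouched
centered <- mtcars

# ave() returns, for every row, the mean of mpg within that row's cyl group
centered$mpg_centered <- centered$mpg - ave(centered$mpg, centered$cyl)

head(centered[, c("cyl", "mpg", "mpg_centered")])
```

Because `ave()` returns a vector as long as its input, the subtraction lines up row by row without any merge step.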

Answer 1

You could try:

library(dplyr)
mtcars %>%
      tibble::rownames_to_column() %>% # only if the row names are needed as a column (add_rownames() is deprecated)
      group_by(cyl) %>% 
      mutate(cent= mpg-mean(mpg))
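One way to check that the centering really happened within groups is to confirm that the centered values average to zero inside every `cyl` group. A small sketch, assuming dplyr's own `mutate()` is the one being called; the names `centered`, `check`, and `mean_cent` are illustrative only:

```r
library(dplyr)

# Center mpg on the mean of its own cyl group
centered <- mtcars %>%
  group_by(cyl) %>%
  mutate(cent = mpg - mean(mpg)) %>%
  ungroup()

# Each group's centered values should average to (numerically) zero
check <- centered %>%
  group_by(cyl) %>%
  summarize(mean_cent = mean(cent))

check
```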

Answer 2

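The code above seems to center mpg on the global mean. What if I want to center on the within-group mean, so that each cylinder group level is centered on its own, different mean?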

> mtcars %>%
+   add_rownames()%>% #if the rownames are needed as a column
+   group_by(cyl) %>% 
+   mutate(cent= mpg-mean(mpg))%>%
+   dplyr ::select(cent)
Adding missing grouping variables: `cyl`
# A tibble: 32 x 2
# Groups:   cyl [3]
     cyl   cent
   <dbl>  <dbl>
 1     6  0.909
 2     6  0.909
 3     4  2.71 
 4     6  1.31 
 5     8 -1.39 
 6     6 -1.99 
 7     8 -5.79 
 8     4  4.31 
 9     4  2.71 
10     6 -0.891
# … with 22 more rows
Warning message:
Deprecated, use tibble::rownames_to_column() instead. 
> mtcars$mpg[1:5]-mean(mtcars$mpg)
[1]  0.909375  0.909375  2.709375  1.309375 -1.390625
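The printed `cent` values match `mpg - mean(mtcars$mpg)`, which suggests that the grouping was not applied in that session. One possible cause, which is an assumption here and is not confirmed anywhere in the post, is that another package (for example plyr, attached after dplyr) masked `mutate()` with a version that ignores groups. A minimal sketch that calls dplyr's verbs explicitly and keeps a globally centered column alongside for comparison (`by_cyl` and `global_cent` are hypothetical names):

```r
library(dplyr)

by_cyl <- mtcars %>%
  dplyr::group_by(cyl) %>%
  dplyr::mutate(
    cent = mpg - mean(mpg),                # centered on the cyl group's mean
    global_cent = mpg - mean(mtcars$mpg)   # centered on the overall mean
  ) %>%
  dplyr::ungroup()

# When grouping is respected, the two columns differ
head(dplyr::select(by_cyl, cyl, mpg, cent, global_cent))
```

Running `conflicts()` in the affected session lists masked function names, which can help confirm or rule out this explanation.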

Answer 3

You can use this approach instead (note that the new variable has a different name here):

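library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  # scale() returns a one-column matrix, so convert it back to a plain numeric vector
  mutate(gpcent = as.numeric(scale(mpg, scale = FALSE)))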

dplyr: group mean centering (mutate + summarize)
