By multiple categories

Time: 2016-12-21 02:53:28

Tags: r dplyr rounding

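Why are the values of SE_daily wrong? I expected the result to be rounded to the nearest integer (although I actually want a decimal), but the decimal answer is completely wrong. What am I missing?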

csv<-csv%>%group_by(id_num)%>%group_by(Month)%>%group_by(Day)%>%mutate(SE_daily=mean(SelfEsteem, na.rm=T))
head(csv[,c(1:5,28,181)])
> head(csv[,c(1:5,28,181)])
Source: local data frame [6 x 7]
Groups: Day [3]

    X.1     X id_num Month   Day SelfEsteem SE_daily
  <int> <int>  <int> <int> <int>      <int>    <dbl>
1     1     1     29     2    19          4 3.457944 #mean(4,4,3)= 4, expected answer= 3.66666666667
2     2     2     29     2    19          4 3.457944
3     3     3     29     2    19          3 3.457944
4     4     4     29     2    20          4 3.424242 #expected answer= 4
5     5     5     29     2    21          4 3.318182 #expected answer=4
6     6     6     29     2    21          4 3.318182

Head of csv (dput output):

structure(list(X.1 = 1:6, X = 1:6,
  id_num = c(29L, 29L, 29L, 29L, 29L, 29L),
  Month = c(2L, 2L, 2L, 2L, 2L, 2L),
  Day = c(19L, 19L, 19L, 20L, 21L, 21L),
  SelfEsteem = c(4L, 4L, 3L, 4L, 4L, 4L),
  SE_daily = c(3.45794392523365, 3.45794392523365, 3.45794392523365,  3.42424242424242, 3.31818181818182, 3.31818181818182)), 
  .Names = c("X.1", "X", "id_num", "Month", "Day", "SelfEsteem", "SE_daily"),
  row.names = c(NA, -6L),
  class = "data.frame")

1 answer:

Answer 0 (score: 2)

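I get the expected output for SE_daily. When you pipe separate group_by calls instead of listing the variables in one call, each call replaces the previous grouping, so only Day is still grouped at the end (note "Groups: Day [3]" in your output). You are therefore probably averaging over every id_num and Month that shares a given Day (assuming the data structure provided is only a subset of the whole dataset).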

library(dplyr)
csv %>%
  group_by(id_num, Month, Day) %>%
  mutate(SE_daily=mean(SelfEsteem, na.rm=TRUE))

Output

Source: local data frame [6 x 7]
Groups: id_num, Month, Day [3]

    X.1     X id_num Month   Day SelfEsteem SE_daily
  <int> <int>  <int> <int> <int>      <int>    <dbl>
1     1     1     29     2    19          4 3.666667
2     2     2     29     2    19          4 3.666667
3     3     3     29     2    19          3 3.666667
4     4     4     29     2    20          4 4.000000
5     5     5     29     2    21          4 4.000000
6     6     6     29     2    21          4 4.000000
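
To make the difference visible on a small scale, here is a minimal sketch using a hypothetical data frame `toy` (not the asker's data) in which two participants share the same Day. It assumes a dplyr version that provides `group_vars()`. The last lines also illustrate why `mean(4,4,3)` in the question's comment gives 4: only the first argument is treated as data.

```r
library(dplyr)

# Hypothetical toy data: two participants (id_num 29 and 30) measured on the same day
toy <- data.frame(
  id_num     = c(29L, 29L, 29L, 30L, 30L),
  Month      = c(2L, 2L, 2L, 2L, 2L),
  Day        = c(19L, 19L, 19L, 19L, 19L),
  SelfEsteem = c(4L, 4L, 3L, 1L, 2L)
)

# Chained calls: each group_by() replaces the previous grouping,
# so the mean is taken over all rows that share a Day
chained <- toy %>%
  group_by(id_num) %>%
  group_by(Month) %>%
  group_by(Day) %>%
  mutate(SE_daily = mean(SelfEsteem, na.rm = TRUE))

# Single call: the mean is taken within each id_num/Month/Day combination
combined <- toy %>%
  group_by(id_num, Month, Day) %>%
  mutate(SE_daily = mean(SelfEsteem, na.rm = TRUE))

group_vars(chained)    # "Day"
group_vars(combined)   # "id_num" "Month" "Day"

chained$SE_daily       # 2.8 for every row (14 / 5)
combined$SE_daily      # 3.666667 for id_num 29, 1.5 for id_num 30

# mean() takes its data as one vector; extra positional arguments are not data
mean(4, 4, 3)          # 4
mean(c(4, 4, 3))       # 3.666667
```

If the grouping really has to be built up step by step, `group_by()` can add to the existing groups instead of replacing them; the argument is `add = TRUE` in older dplyr releases and `.add = TRUE` from dplyr 1.0.0 onward, so check which one your installed version accepts.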