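Compute a running median for each row, by id

Time: 2018-04-02 08:57:06

Tags: r group-by dplyr grouping median

I know this is probably simple, but I can't figure it out.

I have the following df:

Input data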

df<-data.frame(id=c(1,2,3,3,3,4, 4, 4, 4, 4, 4), value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986))

Expected result

df<-data.frame(id=c("1","2","3","3","3","4", "4", "4", "4", "4", "4"), value = c("956", "986", "995", "995", "986", "700", "600", "995", "956", "1000", "986"), median = c("956", "986","995","995", "995", "700","650","700","828", "956", "971"))

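The goal is to compute a median at every row, separately for each id. At each row one new value is added to the values already seen for that id, and the median is recomputed over all of them.

My attempt and the output it gives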

library(dplyr)
w = df %>%
group_by(id) %>%
mutate(median = median(value, na.rm =TRUE)) %>%
select (median)
df$median <- w[,2]


df<-data.frame(id=c("1","2","3","3","3","4", "4", "4", "4", "4", "4"), value = c("956", "986", "995", "995", "986", "700", "600", "995", "956", "1000", "986"), median = c("956", "986","995","995", "995", "971","971","971","971", "971", "971"))

2 answers:

Answer 0 (score: 3)

The cumstats package has a cummedian function that does exactly this.

library(cumstats)
ave(df$value, df$id, FUN = cummedian)

#[1] 956 986 995 995 995 700 650 700 828 956 971
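
If installing cumstats is not an option, the same per-id cumulative median can be sketched in base R. This is only an illustration: cum_median is a hypothetical helper name, not part of cumstats or of the original answer, and it does not handle NA values. It recomputes the median of every prefix, which is fine for small groups but may be slow for very long ones.

```r
# Hypothetical helper (not from cumstats): median of x[1..i] for every i.
cum_median <- function(x) {
  vapply(seq_along(x), function(i) median(x[seq_len(i)]), numeric(1))
}

df <- data.frame(
  id    = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
  value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986)
)

# ave() applies the helper within each id and keeps the original row order.
df$median <- ave(df$value, df$id, FUN = cum_median)
df$median
#[1] 956 986 995 995 995 700 650 700 828 956 971
```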

This can also be translated to dplyr:

library(dplyr)
df %>%
  group_by(id) %>%
  mutate(median = cummedian(value))



#      id value median
#   <dbl> <dbl>  <dbl>
# 1  1.00   956    956
# 2  2.00   986    986
# 3  3.00   995    995
# 4  3.00   995    995
# 5  3.00   986    995
# 6  4.00   700    700
# 7  4.00   600    650
# 8  4.00   995    700
# 9  4.00   956    828
#10  4.00  1000    956
#11  4.00   986    971

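Answer 1 (score: 2)

You can use zoo::rollapplyr with a window width that grows by one at each row, which turns the rolling median into a cumulative one: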

library(tidyverse)
library(zoo)
df %>%
    group_by(id) %>%
    mutate(
        median = rollapplyr(value, seq_along(value), median))
## A tibble: 11 x 3
## Groups:   id [4]
#      id value median
#   <dbl> <dbl>  <dbl>
# 1    1.  956.   956.
# 2    2.  986.   986.
# 3    3.  995.   995.
# 4    3.  995.   995.
# 5    3.  986.   995.
# 6    4.  700.   700.
# 7    4.  600.   650.
# 8    4.  995.   700.
# 9    4.  956.   828.
#10    4. 1000.   956.
#11    4.  986.   971.
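
To see why a width of seq_along(value) gives a cumulative median, it may help to write out the windows by hand. The base R sketch below does not call zoo; it assumes, as I understand rollapplyr, that each window is right-aligned, so row i with width i covers value[1..i]. The names widths, windows and medians are illustrative only.

```r
# Values for id 4 from the sample data.
value <- c(700, 600, 995, 956, 1000, 986)

# One width per row: 1, 2, ..., 6.
widths <- seq_along(value)

# Right-aligned window ending at row i with width widths[i].
windows <- lapply(seq_along(value), function(i) value[(i - widths[i] + 1):i])

# Median of each window, e.g. row 4 sees 700, 600, 995, 956.
medians <- vapply(windows, median, numeric(1))
medians
#[1] 700 650 700 828 956 971
```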

Sample data

df <- data.frame(
    id = c(1,2,3,3,3,4, 4, 4, 4, 4, 4), 
    value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986))