I know this is probably simple, but I can't figure it out.
I have the following df:
Input data
df<-data.frame(id=c(1,2,3,3,3,4, 4, 4, 4, 4, 4), value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986))
Desired result
df <- data.frame(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), median = c(956, 986, 995, 995, 995, 700, 650, 700, 828, 956, 971))
In other words, I want the median computed row by row within each id: each row adds one new value to its id's group, and the median is recomputed over all of that id's values seen so far (a running, or cumulative, median).
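To make the target concrete, here is a minimal base-R sketch of the arithmetic for id 4 alone. It assumes the values are already in row order and simply takes the median of the first i values at each row i:

```r
# Values of id 4 in row order (copied from the input df above)
x <- c(700, 600, 995, 956, 1000, 986)

# For each row i, take the median of the first i values only
running <- sapply(seq_along(x), function(i) median(x[seq_len(i)]))
running
#[1] 700 650 700 828 956 971
```

For example, at row 4 the sorted prefix is 600, 700, 956, 995, so the median is (700 + 956) / 2 = 828.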
My attempt and its output
library(dplyr)
w = df %>%
  group_by(id) %>%
  mutate(median = median(value, na.rm = TRUE)) %>%
  select(median)
df$median <- w[,2]
This puts the overall median of each id on every row, not the running median:
df <- data.frame(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), median = c(956, 986, 995, 995, 995, 971, 971, 971, 971, 971, 971))
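The repeated 971 comes from median() being called once per group on the whole group vector, with that single value recycled to every row of the group. A base-R sketch using ave() reproduces the same behaviour (assuming the numeric input df from the question):

```r
df <- data.frame(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
                 value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986))

# median() sees all values of a group at once; ave() recycles the single
# result to every row of that group, just like the grouped mutate() does
whole_group <- ave(df$value, df$id, FUN = median)
whole_group
#[1] 956 986 995 995 995 971 971 971 971 971 971
```

So the fix is not in the grouping but in the summary function: it has to return one value per row, computed on the rows up to and including that row.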
Answer 0 (score: 3)
The cumstats package has a cummedian function that does exactly this.
library(cumstats)
ave(df$value, df$id, FUN = cummedian)
#[1] 956 986 995 995 995 700 650 700 828 956 971
This can also be translated to dplyr:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(median = cummedian(value))
# id value median
# <dbl> <dbl> <dbl>
# 1 1.00 956 956
# 2 2.00 986 986
# 3 3.00 995 995
# 4 3.00 995 995
# 5 3.00 986 995
# 6 4.00 700 700
# 7 4.00 600 650
# 8 4.00 995 700
# 9 4.00 956 828
#10 4.00 1000 956
#11 4.00 986 971
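If adding the cumstats dependency is undesirable, a base-R stand-in can be passed to the same ave() call. The helper name cummedian_base is hypothetical (it is not part of cumstats); it is a minimal sketch that recomputes the median of each prefix:

```r
# Hypothetical helper, not part of cumstats: median of x[1..i] for every i
cummedian_base <- function(x) {
  vapply(seq_along(x), function(i) median(x[seq_len(i)]), numeric(1))
}

df <- data.frame(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
                 value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986))

# ave() applies the helper within each id and returns results in row order
res <- ave(df$value, df$id, FUN = cummedian_base)
res
#[1] 956 986 995 995 995 700 650 700 828 956 971
```

Because this helper re-sorts the whole prefix at every row, its cost grows roughly quadratically with group size; that is fine for small groups like these.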
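Answer 1 (score: 2)
You can use zoo::rollapplyr to compute a rolling median. Passing seq_along(value) as the width makes the window at row i cover rows 1 to i of the group, so the result is a cumulative median: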
library(tidyverse)
library(zoo)
df %>%
  group_by(id) %>%
  mutate(
    median = rollapplyr(value, seq_along(value), median))
## A tibble: 11 x 3
## Groups: id [4]
# id value median
# <dbl> <dbl> <dbl>
# 1 1. 956. 956.
# 2 2. 986. 986.
# 3 3. 995. 995.
# 4 3. 995. 995.
# 5 3. 986. 995.
# 6 4. 700. 700.
# 7 4. 600. 650.
# 8 4. 995. 700.
# 9 4. 956. 828.
#10 4. 1000. 956.
#11 4. 986. 971.
Sample data:
df <- data.frame(
  id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
  value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986))
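To see why seq_along(value) turns the rolling median into a cumulative one, here is a hypothetical base-R helper, right_window_median, that mimics a right-aligned window with a per-position width (our reading of how rollapplyr treats a plain numeric width vector; it assumes width[i] <= i, with no partial windows):

```r
# Hypothetical illustration, not zoo's implementation: median over the
# width[i] values ending at position i
right_window_median <- function(x, width) {
  vapply(seq_along(x), function(i) {
    stopifnot(width[i] <= i)          # window must fit inside x[1..i]
    median(x[(i - width[i] + 1):i])
  }, numeric(1))
}

x <- c(700, 600, 995, 956, 1000, 986) # id 4

# width = seq_along(x): every window starts at row 1, i.e. a cumulative median
right_window_median(x, seq_along(x))
#[1] 700 650 700 828 956 971

# a fixed width of 2 (width 1 on the first row) for comparison
right_window_median(x, c(1, rep(2, 5)))
#[1] 700.0 650.0 797.5 975.5 978.0 993.0
```

A fixed width gives a true sliding window; only the growing width reproduces the desired result from the question.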