Question

I have a table:

ID   Dates        Rates
1  2010-01-01       0
1  2010-01-02       0
1  2010-01-03       2
1  2010-01-04       2
1  2010-01-05       2
1  2010-01-06       1
1  2010-01-07       0
1  2010-01-08       0
1  2010-01-09       0
1  2010-01-10       0
2  2010-01-01       3
2  2010-01-02       3
2  2010-01-03       2

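I want to compute an additional column called "Median_Rates" in RStudio that shows the median of each block of 5 consecutive rows. The table should look like this: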

ID   Dates       Rates   Median_Rates
1    2010-01-01   0        2
1    2010-01-02   0        2
1    2010-01-03   2        2
1    2010-01-04   2        2
1    2010-01-05   2        2
1    2010-01-06   1        0
1    2010-01-07   0        0
1    2010-01-08   0        0
1    2010-01-09   0        0
1    2010-01-10   0        0
2    2010-01-01   3        3
2    2010-01-02   3        3
2    2010-01-03   2        3

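How can I then apply this to all IDs in a dataset with more than 1 million rows?

I want to compute, per group (ID), the median of Rates over each run of 5 consecutive rows (for example, this position +/- 5 rows) and use it as the value of Median_Rates.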

Answer 1

The function ave is made for exactly this. I borrowed the idea from the accepted answer to this question, changing tapply to ave and sum to median.

data$Median_Rates <- ave(data$Rates, (seq_along(data$Rates)-1) %/% 5, FUN = median)
data
#   ID      Dates Rates Median_Rates
#1   1 2010-01-01     0            2
#2   2 2010-01-02     0            2
#3   3 2010-01-03     2            2
#4   4 2010-01-04     2            2
#5   5 2010-01-05     2            2
#6   5 2010-01-06     1            0
#7   7 2010-01-07     0            0
#8   8 2010-01-08     0            0
#9   9 2010-01-09     0            0
#10 10 2010-01-10     0            0
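
To see why this works, it can help to look at the grouping index on its own. The following is a minimal sketch, assuming a plain vector x that holds the Rates values from the example (the names x, block and block_median are illustrative, not part of the answer). Integer division of the zero-based row position by 5 gives every row in the same block of five the same label, and ave then returns that block's median for each row.

```r
# Rates values from the example data (illustrative vector name)
x <- c(0L, 0L, 2L, 2L, 2L, 1L, 0L, 0L, 0L, 0L)

# Zero-based row position, integer-divided by 5: one label per block of 5 rows
block <- (seq_along(x) - 1) %/% 5
block
# [1] 0 0 0 0 0 1 1 1 1 1

# ave() applies FUN within each block and returns a vector as long as x
block_median <- ave(x, block, FUN = median)
block_median
# [1] 2 2 2 2 2 0 0 0 0 0
```

If the last block has fewer than 5 rows, its median is simply taken over the rows that are there.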

Data

data <-
structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 5L, 7L, 8L, 9L, 10L
), Dates = structure(1:10, .Label = c("2010-01-01", "2010-01-02", 
"2010-01-03", "2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07", 
"2010-01-08", "2010-01-09", "2010-01-10"), class = "factor"), 
    Rates = c(0L, 0L, 2L, 2L, 2L, 1L, 0L, 0L, 0L, 0L)), .Names = c("ID", 
"Dates", "Rates"), class = "data.frame", row.names = c(NA, -10L
))

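Edit
With the new dataset, all that is needed is to pass the ID column to the ave call as an additional grouping variable.
I will call this new dataset data2.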

data2$Median_Rates <- ave(data2$Rates, data2$ID, (seq_along(data2$Rates)-1) %/% 5, FUN = median)
data2
#   ID      Dates Rates Median_Rates
#1   1 2010-01-01     0            2
#2   1 2010-01-02     0            2
#3   1 2010-01-03     2            2
#4   1 2010-01-04     2            2
#5   1 2010-01-05     2            2
#6   1 2010-01-06     1            0
#7   1 2010-01-07     0            0
#8   1 2010-01-08     0            0
#9   1 2010-01-09     0            0
#10  1 2010-01-10     0            0
#11  2 2010-01-01     3            3
#12  2 2010-01-02     3            3
#13  2 2010-01-03     2            3
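
One caveat, offered as a hedged aside: the index (seq_along(...)-1) %/% 5 counts rows across the whole data frame, so the 5-row blocks line up with the start of each ID only when every earlier ID has a multiple of 5 rows (as ID 1 does here, with 10). If that cannot be assumed, one possible variant is to restart the row counter within each ID first. The sketch below uses a small made-up data frame, toy, and a helper vector, pos_in_id; both are illustrative and not from the question.

```r
# Made-up data: ID 1 has 7 rows, ID 2 has 6 rows (neither is a multiple of 5)
toy <- data.frame(
  ID    = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  Rates = c(0L, 0L, 2L, 2L, 2L, 1L, 0L, 3L, 3L, 2L, 5L, 5L, 9L)
)

# Row position restarted at 1 within each ID
pos_in_id <- ave(seq_along(toy$ID), toy$ID, FUN = seq_along)

# Blocks of 5 rows counted from the first row of each ID
toy$Median_Rates <- ave(toy$Rates, toy$ID, (pos_in_id - 1) %/% 5, FUN = median)
toy
```

With this version the first block of ID 2 always begins at its first row, regardless of how many rows ID 1 has. It assumes the rows are already sorted by date within each ID.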

New data

data2 <-
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L), Dates = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 1L, 2L, 3L), .Label = c("2010-01-01", "2010-01-02",
"2010-01-03", "2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07",
"2010-01-08", "2010-01-09", "2010-01-10"), class = "factor"),
    Rates = c(0L, 0L, 2L, 2L, 2L, 1L, 0L, 0L, 0L, 0L, 3L, 3L,
    2L)), .Names = c("ID", "Dates", "Rates"), class = "data.frame", row.names = c(NA,
-13L))

Answer 2

A dplyr-based solution, using lubridate to convert Dates to the Date type, can be written as:

library(dplyr)
library(lubridate)

df %>% mutate(Dates = ymd(Dates)) %>%
  group_by(ID) %>%
  arrange(Dates) %>%
  mutate(Group = (row_number()-1) %/% 5 ) %>%
  group_by(ID, Group) %>%
  mutate(Median_Rates = median(Rates)) %>%
  ungroup() %>%
  arrange(ID) %>%
  select(-Group) %>% as.data.frame()

#    ID      Dates Rates Median_Rates
# 1   1 2010-01-01     0            2
# 2   1 2010-01-02     0            2
# 3   1 2010-01-03     2            2
# 4   1 2010-01-04     2            2
# 5   1 2010-01-05     2            2
# 6   1 2010-01-06     1            0
# 7   1 2010-01-07     0            0
# 8   1 2010-01-08     0            0
# 9   1 2010-01-09     0            0
# 10  1 2010-01-10     0            0
# 11  2 2010-01-01     3            3
# 12  2 2010-01-02     3            3
# 13  2 2010-01-03     2            3

How do I add a column with the median of every 5 consecutive rows in R?
