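Moving average of grouped data

Time: 2018-11-10 11:58:06

Tags: r

I want to compute a moving average for my data set, which consists of one column holding a patient group index and a second column holding measurements of a molecule circulating in the blood. Patients are grouped according to consecutive measurements of the target molecule.

I would also like to plot the output, with the measurements of each group plotted against the patient group number.

Can anyone help me? I tried to write code for this analysis, but I am not sure I did it correctly.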

  SURG_DATE VES_2A Index
 21/05/2013    1     1
 10/06/2013    1     1
 06/01/2014    1     1
 29/01/2014    0     1
 11/03/2014    3     2
 05/04/2014    1     2
 06/04/2014    1     2
 14/05/2014    1     2
 28/05/2014    3     3
 02/09/2014    2     3
 16/09/2014    2     3
 17/09/2014    0     3
 21/10/2014    2     5
 05/12/2014    0     5
 19/12/2014    2     5
 11/01/2015    1     5
 15/01/2015    1     6
 17/01/2015    2     6
 24/01/2015    1     6
 19/02/2015    1     6

The code I tried:

tapply(test$VES_2A, 
       test$Index, 
       function(x) rollmean(x, 12, na.pad=TRUE))

Thanks in advance.

2 answers:

Answer 0 (score: 2)

The question is a bit ambiguous, but I think you want this:

test <- cbind(time=rownames(test), test)  # first add a time variable

# then create a list holding, for each time point, the mean per Index of all rows up to that time
ls1 <- lapply(seq_along(test$time), 
              function(x) cbind(time=x,  # time variable
                                with(test[test$time %in% 1:x, ], 
                                     aggregate(list(VES_2A=VES_2A), 
                                               list(Index=Index), mean))  # running mean up to time x
                                ))

tot <- transform(t(sapply(ls1, colMeans)), Index="total")  # optionally add a total column

long <- rbind(do.call(rbind, ls1), tot)  # bind all rows together into long format data frame
wide <- reshape2::dcast(long, time ~ Index)  # reshape to wide w/ e.g. reshape2::dcast()
rm(ls1, tot)  # clean up

Yielding

> wide
   time    1        2        3        5        6    total
1     1 1.00       NA       NA       NA       NA 1.000000
2     2 1.00       NA       NA       NA       NA 1.000000
3     3 1.00       NA       NA       NA       NA 1.000000
4     4 0.75       NA       NA       NA       NA 0.750000
5     5 0.75 3.000000       NA       NA       NA 1.875000
6     6 0.75 2.000000       NA       NA       NA 1.375000
7     7 0.75 1.666667       NA       NA       NA 1.208333
8     8 0.75 1.500000       NA       NA       NA 1.125000
9     9 0.75 1.500000 3.000000       NA       NA 1.750000
10   10 0.75 1.500000 2.500000       NA       NA 1.583333
11   11 0.75 1.500000 2.333333       NA       NA 1.527778
12   12 0.75 1.500000 1.750000       NA       NA 1.333333
13   13 0.75 1.500000 1.750000 2.000000       NA 1.500000
14   14 0.75 1.500000 1.750000 1.000000       NA 1.250000
15   15 0.75 1.500000 1.750000 1.333333       NA 1.333333
16   16 0.75 1.500000 1.750000 1.250000       NA 1.312500
17   17 0.75 1.500000 1.750000 1.250000 1.000000 1.250000
18   18 0.75 1.500000 1.750000 1.250000 1.500000 1.350000
19   19 0.75 1.500000 1.750000 1.250000 1.333333 1.316667
20   20 0.75 1.500000 1.750000 1.250000 1.250000 1.300000
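
The per-group columns of `wide` are running (cumulative) means within each Index group. As a cross-check, here is a minimal base-R sketch that computes the same quantity directly. It assumes the same values as the `test` data in the Data section (rebuilt inline here so it runs on its own); the column name `cum_mean` is made up for illustration.

```r
# Same values as the `test` data frame, rebuilt so the sketch is self-contained
test <- data.frame(
  VES_2A = c(1, 1, 1, 0, 3, 1, 1, 1, 3, 2, 2, 0, 2, 0, 2, 1, 1, 2, 1, 1),
  Index  = rep(c(1, 2, 3, 5, 6), each = 4)
)

# Running mean within each Index group: cumulative sum divided by the
# number of observations seen so far in that group
test$cum_mean <- ave(test$VES_2A, test$Index,
                     FUN = function(v) cumsum(v) / seq_along(v))

test
```

The last `cum_mean` value in each group (0.75, 1.5, 1.75, 1.25, 1.25) should agree with the final row of `wide`.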

Plot

library(ggplot2)
ggplot(long, aes(time, VES_2A, color=Index)) +
  geom_line()


Let me know what you think; I hope this is what you were looking for.

Data

test <- structure(list(
  VES_2A = c(1L, 1L, 1L, 0L, 3L, 1L, 1L, 1L, 3L, 2L,
             2L, 0L, 2L, 0L, 2L, 1L, 1L, 2L, 1L, 1L),
  Index = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L,
            3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L)),
  class = "data.frame", row.names = c(NA, -20L))

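Answer 1 (score: 1)

Using the data shown reproducibly in the Note at the end (the data frame df), compute, separately for each value of Index, the rolling mean of the current and the previous two observations, and add a sequence number. Since each value of Index occupies 4 rows, we use 1:4.

The question does not make clear what should be plotted, so we plot the rolling mean of each Index against the sequence number on a single panel. For classic graphics, replace screen = 1 with screen = colnames(wide) if you want separate panels. To get separate panels with ggplot2, omit facet = NULL.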

library(zoo)

roll <- function(x) rollmeanr(x, 3, fill = NA)
df3 <- transform(df, mean3 = ave(VES_2A, Index, FUN = roll), seq = 1:4)

wide <- na.omit(read.zoo(df3[-1], index = "seq", split = "Index"))

# classic graphics
plot(wide, screen = 1, type = "o", pch = colnames(wide))

# ggplot2 graphics
library(ggplot2)
autoplot(wide[-3], facet = NULL)
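
For comparison, the per-group rolling mean described in this answer can also be sketched without zoo. This is only an illustration under stated assumptions, not the answer's own method: the helper name `trailing_mean` is made up, `stats::filter` stands in for `rollmeanr`, and the data frame is rebuilt inline with the same values as in the Note.

```r
# Same values as `df` in the Note, rebuilt so the sketch is self-contained
df <- data.frame(
  VES_2A = c(1, 1, 1, 0, 3, 1, 1, 1, 3, 2, 2, 0, 2, 0, 2, 1, 1, 2, 1, 1),
  Index  = rep(c(1, 2, 3, 5, 6), each = 4)
)

# Hypothetical helper: mean of the current and the k - 1 previous values.
# The first k - 1 positions have no full window and come back as NA.
# Assumes every group has at least k observations.
trailing_mean <- function(x, k = 3) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Apply the helper within each Index group; ave() keeps the row order of df
df$mean3 <- ave(df$VES_2A, df$Index, FUN = trailing_mean)

# Position of each row within its group (1 to 4 here), without hard-coding 1:4
df$seq <- ave(seq_along(df$Index), df$Index, FUN = seq_along)

df
```

Deriving `seq` with `ave()` rather than recycling `1:4` also works when groups differ in length, although each group still needs at least three rows for a window of 3.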

Note

Lines <- "  VES_2A Index
     1     1
     1     1
     1     1
     0     1
     3     2
     1     2
     1     2
     1     2
     3     3
     2     3
     2     3
     0     3
     2     5
     0     5
     2     5
     1     5
     1     6
     2     6
     1     6
     1     6"
df <- read.table(text = Lines, header = TRUE)