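I want to compute a moving average for my dataset, which has one column holding the index of a patient group and a second column holding measurements of a molecule circulating in the blood. Patients are grouped by consecutive measurements of the target molecule.
I would also like to plot the output, showing each group's measurements against the patient group number.
Can anyone help? I tried to write this analysis myself, but I am not sure I did it correctly.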
SURG_DATE VES_2A Index
21/05/2013 1 1
10/06/2013 1 1
06/01/2014 1 1
29/01/2014 0 1
11/03/2014 3 2
05/04/2014 1 2
06/04/2014 1 2
14/05/2014 1 2
28/05/2014 3 3
02/09/2014 2 3
16/09/2014 2 3
17/09/2014 0 3
21/10/2014 2 5
05/12/2014 0 5
19/12/2014 2 5
11/01/2015 1 5
15/01/2015 1 6
17/01/2015 2 6
24/01/2015 1 6
19/02/2015 1 6
The code I tried:
tapply(test$VES_2A,
test$Index,
function(x) rollmean(x, 12, na.pad=TRUE))
Thanks in advance.
Answer 0 (score: 2)
The question is a bit ambiguous, but I think you want this:
test <- cbind(time=rownames(test), test)  # first add a time variable
# then create a list with the running mean for each Index up to each time
ls1 <- lapply(seq_along(test$time),
              function(x) cbind(time=x,  # time variable
                                with(test[test$time %in% 1:x, ],
                                     aggregate(list(VES_2A=VES_2A),
                                               list(Index=Index), mean))  # mean of all rows seen so far
              ))
tot <- transform(t(sapply(ls1, colMeans)), Index="total")  # optionally add a total column
long <- rbind(do.call(rbind, ls1), tot)  # bind all rows together into a long format data frame
wide <- reshape2::dcast(long, time ~ Index, value.var="VES_2A")  # reshape to wide w/ e.g. reshape2::dcast()
rm(ls1, tot)  # clean up
Yielding
> wide
time 1 2 3 5 6 total
1 1 1.00 NA NA NA NA 1.000000
2 2 1.00 NA NA NA NA 1.000000
3 3 1.00 NA NA NA NA 1.000000
4 4 0.75 NA NA NA NA 0.750000
5 5 0.75 3.000000 NA NA NA 1.875000
6 6 0.75 2.000000 NA NA NA 1.375000
7 7 0.75 1.666667 NA NA NA 1.208333
8 8 0.75 1.500000 NA NA NA 1.125000
9 9 0.75 1.500000 3.000000 NA NA 1.750000
10 10 0.75 1.500000 2.500000 NA NA 1.583333
11 11 0.75 1.500000 2.333333 NA NA 1.527778
12 12 0.75 1.500000 1.750000 NA NA 1.333333
13 13 0.75 1.500000 1.750000 2.000000 NA 1.500000
14 14 0.75 1.500000 1.750000 1.000000 NA 1.250000
15 15 0.75 1.500000 1.750000 1.333333 NA 1.333333
16 16 0.75 1.500000 1.750000 1.250000 NA 1.312500
17 17 0.75 1.500000 1.750000 1.250000 1.000000 1.250000
18 18 0.75 1.500000 1.750000 1.250000 1.500000 1.350000
19 19 0.75 1.500000 1.750000 1.250000 1.333333 1.316667
20 20 0.75 1.500000 1.750000 1.250000 1.250000 1.300000
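Judging from the table, each Index column is an expanding (cumulative) mean within its group rather than a fixed-width moving average, and the total column is the unweighted mean of the group values available at each time. The following is a minimal base R sketch of that reading, not the answer's own method; the column name cum_mean and the vector total are names introduced here for illustration.

```r
# Same values as the answer's `test` data, rebuilt so the sketch is self-contained
test <- data.frame(
  VES_2A = c(1, 1, 1, 0, 3, 1, 1, 1, 3, 2, 2, 0, 2, 0, 2, 1, 1, 2, 1, 1),
  Index  = rep(c(1, 2, 3, 5, 6), each = 4)
)

# Expanding mean within each Index group: running sum divided by running count
test$cum_mean <- ave(test$VES_2A, test$Index,
                     FUN = function(x) cumsum(x) / seq_along(x))

# For each row t, average the latest expanding mean of every group seen so far
total <- vapply(seq_len(nrow(test)), function(t) {
  seen <- test[seq_len(t), ]
  mean(tapply(seen$cum_mean, seen$Index, function(x) x[length(x)]))
}, numeric(1))

print(cbind(test, total))
```

For example, row 5 is the first row of group 2, so its expanding mean is 3, and the total there is the mean of 0.75 (group 1) and 3, that is 1.875, which agrees with the table.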
Plot
library(ggplot2)
ggplot(long, aes(time, VES_2A, color=Index)) +
geom_line()
Let me know what you think; I hope this is what you were looking for.
Data
test <- structure(list(VES_2A = c(1L, 1L, 1L, 0L, 3L, 1L, 1L, 1L, 3L,
2L, 2L, 0L, 2L, 0L, 2L, 1L, 1L, 2L, 1L, 1L), Index = c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L,
6L, 6L)), class = "data.frame", row.names = c(NA, -20L))
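Answer 1 (score: 1)
Using the data shown reproducibly at the end of this answer, for each value of Index separately we compute the rolling mean of the current and previous two observations and add a sequence number. Since each value of Index occupies 4 rows, we use 1:4.
The question does not make clear what should be plotted, so we plot the rolling mean against the sequence number for each Index on a single panel. With classic graphics, replace screen = 1 with screen = colnames(wide) to get separate panels. With ggplot2, omit facet = NULL to get separate panels.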
library(zoo)
roll <- function(x) rollmeanr(x, 3, fill = NA)
df3 <- transform(df, mean3 = ave(VES_2A, Index, FUN = roll), seq = 1:4)
wide <- na.omit(read.zoo(df3[-1], index = "seq", split = "Index"))
# classic graphics
plot(wide, screen = 1, type = "o", pch = colnames(wide))
# ggplot2 graphics
library(ggplot2)
autoplot(wide[-3], facet = NULL)
Lines <- " VES_2A Index
1 1
1 1
1 1
0 1
3 2
1 2
1 2
1 2
3 3
2 3
2 3
0 3
2 5
0 5
2 5
1 5
1 6
2 6
1 6
1 6"
df <- read.table(text = Lines, header = TRUE)
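As a cross-check that does not need zoo, the same per-group calculation (mean of the current and two previous observations, plus a sequence number) can be sketched in base R. This is only an illustration under stated assumptions: trailing_mean is a hypothetical helper written here, not a zoo function, and the sequence number is built per group so it does not rely on every Index having exactly 4 rows.

```r
# Same values as the Lines data above, rebuilt so the sketch is self-contained
df <- data.frame(
  VES_2A = c(1, 1, 1, 0, 3, 1, 1, 1, 3, 2, 2, 0, 2, 0, 2, 1, 1, 2, 1, 1),
  Index  = rep(c(1, 2, 3, 5, 6), each = 4)
)

# Hypothetical helper: right-aligned mean over the last k values,
# NA until k observations are available
trailing_mean <- function(x, k = 3) {
  vapply(seq_along(x), function(i) {
    if (i < k) NA_real_ else mean(x[(i - k + 1):i])
  }, numeric(1))
}

# Apply the helper within each Index group
df$mean3 <- ave(df$VES_2A, df$Index, FUN = trailing_mean)

# Sequence number within each group; works for unequal group sizes too
df$seq <- ave(seq_along(df$Index), df$Index, FUN = seq_along)

print(df)
```

With this helper, a window wider than the group never fills: trailing_mean(c(1, 1, 1, 0), k = 12) is all NA, which is worth keeping in mind given the width of 12 used in the question on groups of 4 rows.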