I have a dataset that looks like this:
df1 <- data.frame(id = c(rep("A1",4), rep("A2",4)),
time = rep(c(0,2:4), 2),
y1 = rnorm(8),
y2 = rnorm(8))
For each y variable, I want to compute the change since time == 0. Basically, I want to do this:
calc_chage <- function(id, data){
#y1
y1_0 <- data$y1[which(data$time==0 & data$id==id)]
D2y1 <- data$y1[which(data$time==2 & data$id==id)] - y1_0
D3y1 <- data$y1[which(data$time==3 & data$id==id)] - y1_0
D4y1 <- data$y1[which(data$time==4 & data$id==id)] - y1_0
#y2
y2_0 <- data$y2[which(data$time==0 & data$id==id)]
D2y2 <- data$y2[which(data$time==2 & data$id==id)] - y2_0
D3y2 <- data$y2[which(data$time==3 & data$id==id)] - y2_0
D4y2 <- data$y2[which(data$time==4 & data$id==id)] - y2_0
#Output
out <- data.frame(id=id, delta=rep(2:4, 2),
outcome=c(rep("y1",3), rep("y2",3)),
change = c(D2y1, D3y1, D4y1,
D2y2, D3y2, D4y2))
}
library(purrr)
changes <- map(.x = unique(df1$id), .f = calc_chage, data=df1) %>%
map_df(bind_rows)
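My guess is that there is a more efficient way to do this. Alas, I can't think of one. Suggestions?
Answer 0 (score: 2)
To compute the change since time == 0, you can use cumsum + diff. Because the summarised result does not have length 1, first wrap it in a list, then unnest it, and use gather to convert the result to long format: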
library(tidyverse)
df1 %>%
group_by(id) %>%
summarise_all(~ list(cumsum(diff(.)))) %>%
unnest() %>% rename(delta = time) %>%
gather(outcome, change, y1:y2) %>%
arrange(id) -> changes2
changes2
# A tibble: 12 x 4
# id delta outcome change
# <fctr> <dbl> <chr> <dbl>
# 1 A1 2 y1 2.2827244
# 2 A1 3 y1 2.2070326
# 3 A1 4 y1 1.9530212
# 4 A1 2 y2 -2.1263046
# 5 A1 3 y2 -0.5430784
# 6 A1 4 y2 -0.3109535
# 7 A2 2 y1 -1.8587070
# 8 A2 3 y1 -1.1399270
# 9 A2 4 y1 1.5667202
#10 A2 2 y2 -2.0047108
#11 A2 3 y2 -3.4414667
#12 A2 4 y2 -1.3662450
changes$delta <- as.numeric(changes$delta)
changes$outcome <- as.character(changes$outcome)
all.equal(as.data.frame(changes2), changes)
# [1] TRUE
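Why cumsum + diff works can be checked on a toy vector. The sketch below is illustrative only: the vector `y` and the names `via_cumsum` and `via_baseline` are made up for this example, and it assumes the values are ordered by time with the time == 0 observation first, as in the question's data.

```r
# Toy vector standing in for one id's y values at time 0, 2, 3, 4.
y <- c(5, 7, 4, 10)

# diff() gives the step-to-step changes; cumsum() adds them back up,
# so each element is the change relative to the first observation.
via_cumsum <- cumsum(diff(y))

# Direct subtraction of the baseline from every later observation.
via_baseline <- y[-1] - y[1]

print(via_cumsum)                  # 2 -1 5
all.equal(via_cumsum, via_baseline)

# The same trick applied to the time column yields the delta labels.
time <- c(0, 2, 3, 4)
print(cumsum(diff(time)))          # 2 3 4
```

This is also why the time column can simply be renamed to delta after summarising: it only holds because the baseline time is 0.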
Answer 1 (score: 1)
If you want to rely on base R functions, I find aggregate() a good alternative to the other solutions:
res <- aggregate(x = df1$y2, by = list(df1$id), FUN = function(x) x - x[1],
                 simplify = TRUE)[-1]
data.frame(df1, delta = c(t(res)))
# id time y1 y2 delta
# 1 A1 0 0.9176567 -0.70469232 0.0000000
# 2 A1 2 -0.8258515 0.18032808 0.8850204
# 3 A1 3 -0.8144515 -0.39995370 0.3047386
# 4 A1 4 1.5171310 -0.97107643 -0.2663841
# 5 A2 0 0.1900048 -0.01022439 0.0000000
# 6 A2 2 -0.7181630 0.35408157 0.3643060
# 7 A2 3 0.1379936 -0.34336329 -0.3331389
# 8 A2 4 0.4773945 1.38467064 1.3948950
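The aggregate() call above handles a single column (y2). A possible base R sketch for several outcome columns at once uses ave(), which returns a vector aligned with the input rows. The names `y_cols`, `deltas`, and the `delta_` prefix are assumptions chosen for this illustration, and the approach assumes that, after sorting, the first row of each id is the time == 0 observation.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
df1 <- data.frame(id = c(rep("A1", 4), rep("A2", 4)),
                  time = rep(c(0, 2:4), 2),
                  y1 = rnorm(8),
                  y2 = rnorm(8))

# Sort so that time == 0 comes first within each id (v[1] below relies on this).
df1 <- df1[order(df1$id, df1$time), ]

y_cols <- c("y1", "y2")  # hypothetical helper: names of the outcome columns

# ave() applies FUN within each id and keeps the original row order.
deltas <- lapply(df1[y_cols], function(y) {
  ave(y, df1$id, FUN = function(v) v - v[1])
})
names(deltas) <- paste0("delta_", y_cols)

res <- cbind(df1, as.data.frame(deltas))
res
```

Unlike the aggregate() version, this keeps the time == 0 rows (with a change of 0) and does not need the matrix result to be transposed and flattened.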
Answer 2 (score: 0)
What if you pull out the values at t = 0? This could be generalised further to handle more y variables.
For example:
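library(dplyr)
# `data` is the data frame from the question (called df1 there).
# Baseline (time == 0) values, one row per id.
t0 <- data %>%
  filter(time == 0) %>%
  mutate(t0_y1 = y1,
         t0_y2 = y2) %>%
  select(-time, -y1, -y2)
# Join the baseline back onto every row and subtract it.
data <- data %>%
  left_join(t0, by = "id") %>%
  mutate(change.y1 = y1 - t0_y1,
         change.y2 = y2 - t0_y2)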
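One way the generalisation to more y variables might look is sketched below. It replaces the explicit join with a grouped mutate() and across(), which is a different technique from the one in this answer. It assumes dplyr 1.0.0 or later (where across() is available) and exactly one time == 0 row per id; the name `changes_wide` and the `change_` prefix are made up for the example.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
df1 <- data.frame(id = c(rep("A1", 4), rep("A2", 4)),
                  time = rep(c(0, 2:4), 2),
                  y1 = rnorm(8),
                  y2 = rnorm(8))

changes_wide <- df1 %>%
  group_by(id) %>%
  # For each listed column, subtract that id's value at time == 0.
  mutate(across(c(y1, y2),
                ~ .x - .x[time == 0],
                .names = "change_{.col}")) %>%
  ungroup()

changes_wide
```

Adding another outcome only requires extending the column selection, for example with a tidyselect helper such as starts_with("y"), and the baseline lookup does not depend on row order.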