I want to compute the ratio between consecutive values within each group. For differences this is easy with diff:
mdata <- data.frame(group = c("A","A","A","B","B","C","C"), x = c(2,3,5,6,3,7,6))
mdata$diff <- unlist(by(mdata$x, mdata$group, function(x){c(NA, diff(x))}))
mdata
group x diff
1 A 2 NA
2 A 3 1
3 A 5 2
4 B 6 NA
5 B 3 -3
6 C 7 NA
7 C 6 -1
Is there an equivalent function for computing ratios? The desired output would be:
group x ratio
1 A 2 NA
2 A 3 1.5000000
3 A 5 1.6666667
4 B 6 NA
5 B 3 0.5000000
6 C 7 NA
7 C 6 0.8571429
Answer 0 (score: 7)
Try dplyr:
# install.packages("dplyr")  # run once if dplyr is not installed; the package name must be quoted
library(dplyr)
mdata <- data.frame(group = c("A","A","A","B","B","C","C"), x = c(2,3,5,6,3,7,6))
mdata <- group_by(mdata, group)
mutate(mdata, ratio = x / lag(x))
# Source: local data frame [7 x 3]
# Groups: group
# group x ratio
# 1 A 2 NA
# 2 A 3 1.5000000
# 3 A 5 1.6666667
# 4 B 6 NA
# 5 B 3 0.5000000
# 6 C 7 NA
# 7 C 6 0.8571429
Your difference calculation then simplifies to:
mutate(mdata, diff = x - lag(x))
# Source: local data frame [7 x 3]
# Groups: group
# group x diff
# 1 A 2 NA
# 2 A 3 1
# 3 A 5 2
# 4 B 6 NA
# 5 B 3 -3
# 6 C 7 NA
# 7 C 6 -1
Answer 1 (score: 3)
Same idea, using data.table:
library(data.table)
dt = as.data.table(mdata)
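# shift() is data.table's lag; without dplyr loaded, lag() resolves to stats::lag(), which does not shift a plain vector
dt[, ratio := x / shift(x), by = group]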
dt
# group x ratio
#1: A 2 NA
#2: A 3 1.5000000
#3: A 5 1.6666667
#4: B 6 NA
#5: B 3 0.5000000
#6: C 7 NA
#7: C 6 0.8571429
Answer 2 (score: 2)
Another option, with ave:
transform(mdata,
ratio=ave(x, group, FUN=function(y) c(NA, tail(y, -1) / head(y, -1))))
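For reference, here is a minimal base-R sketch of what the ave call above should return on the question's data. The name res is only an illustrative choice, not part of the original answer.

```r
mdata <- data.frame(group = c("A","A","A","B","B","C","C"), x = c(2,3,5,6,3,7,6))

# tail(y, -1) drops the first element and head(y, -1) drops the last,
# so their quotient is y[i] / y[i-1]; NA pads the first row of each group.
res <- transform(mdata,
  ratio = ave(x, group, FUN = function(y) c(NA, tail(y, -1) / head(y, -1))))
print(res)
```

The ratio column should match the desired output in the question, with NA on the first row of each group.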
Answer 3 (score: 1)
Using by:
do.call(rbind, by(mdata, mdata$group, function(dat) {
dat$ratio <- dat$x / c(NA, head(dat$x, -1))
dat
}))
# group x ratio
# A.1 A 2 NA
# A.2 A 3 1.5000000
# A.3 A 5 1.6666667
# B.4 B 6 NA
# B.5 B 3 0.5000000
# C.6 C 7 NA
# C.7 C 6 0.8571429
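If the compound row names (A.1, A.2, ...) are unwanted, the same division can be wrapped in a small helper and applied with ave, which keeps the original row order and row names. This is only a sketch: ratio_prev is a hypothetical name, not a base R function.

```r
# ratio_prev is a hypothetical helper (not part of base R):
# it divides each element by the one before it, with NA in the first position.
ratio_prev <- function(v) v / c(NA, head(v, -1))

mdata <- data.frame(group = c("A","A","A","B","B","C","C"), x = c(2,3,5,6,3,7,6))

# ave() applies the helper within each group and puts the results back
# in the original row positions.
mdata$ratio <- ave(mdata$x, mdata$group, FUN = ratio_prev)
print(mdata)
```

Because ave writes results back by position, this should also work when the rows are not sorted by group, whereas the unlist(by(...)) pattern from the question concatenates groups in level order and so assumes sorted data.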