I have a data frame like this:
year <- c(2001, 2001, 2001, 2006, 2006, 2006, 2007, 2007, 2007)
group <- c("a", "b", "c", "a", "b", "c", "a", "b", "c")
value <- c(10, 50, 100, 20, 5, 200, 25, 50, 250)
mydf <- data.frame(year, group, value)
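I would like to calculate the difference and the proportional change in value for 2006 and 2007 relative to 2001. I understand how to calculate the first-order difference (each row minus the previous row) by group with data.table, like this: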
require(data.table)
mydf <- data.table(mydf)
mydf[, D.value:=c(NA, diff(value)), by=group]
mydf[, PD.value:=c(NA, diff(value)/value[-.N]), by=group]
mydf <- data.frame(mydf)
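or how to calculate differences relative to a start date for a time series, as explained here. But I can't seem to work out how to calculate the difference in value relative to a base year. Any help would be much appreciated.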
Answer 0 (score: 4)
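library(data.table)
setDT(mydf)  # the question converts mydf back to a data.frame, so make it a data.table again
mydf[, diffs := value - value[year == 2001], by = group]
mydf[, propdiffs := diffs / value[year == 2001], by = group]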
#    year group value diffs propdiffs
# 1: 2001     a    10     0       0.0
# 2: 2001     b    50     0       0.0
# 3: 2001     c   100     0       0.0
# 4: 2006     a    20    10       1.0
# 5: 2006     b     5   -45      -0.9
# 6: 2006     c   200   100       1.0
# 7: 2007     a    25    15       1.5
# 8: 2007     b    50     0       0.0
# 9: 2007     c   250   150       1.5
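For what it's worth, the same idea can probably be written as a single `:=` call that looks up the baseline once per group. This is only a sketch: `base_year` and `base` are names made up for illustration, and it assumes every group has exactly one row for the base year (a group without one would not produce a usable baseline).

```r
library(data.table)

# Same example data as in the question, built directly as a data.table
mydf <- data.table(
  year  = c(2001, 2001, 2001, 2006, 2006, 2006, 2007, 2007, 2007),
  group = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  value = c(10, 50, 100, 20, 5, 200, 25, 50, 250)
)

base_year <- 2001  # hypothetical variable name; change it to use another baseline

# Within each group, pick the single baseline value and reuse it for both columns.
# Assumption: each group has exactly one row where year == base_year.
mydf[, c("diffs", "propdiffs") := {
  base <- value[year == base_year]
  list(value - base, (value - base) / base)
}, by = group]

print(mydf)
```

Keeping the base year in a variable makes it easy to switch the reference year without editing two expressions.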
Answer 1 (score: 1)
I know data.table was asked for, but here is one way to do it with dplyr:
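library(dplyr)

mydf %>%
  group_by(group) %>%
  # value[1] is the first row of each group, which is the 2001 value because the data are ordered by year
  mutate(diffs = value - value[1],
         propdiffs = diffs / value[1]) %>%
  arrange(group, year)  # not necessary, but makes the result easier to read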
# A tibble: 9 x 5
# Groups:   group [3]
   year group value diffs propdiffs
  <dbl> <fct> <dbl> <dbl>     <dbl>
1  2001 a        10     0       0
2  2006 a        20    10       1
3  2007 a        25    15       1.5
4  2001 b        50     0       0
5  2006 b         5   -45      -0.9
6  2007 b        50     0       0
7  2001 c       100     0       0
8  2006 c       200   100       1
9  2007 c       250   150       1.5
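Note that `value[1]` only gives the 2001 value when each group's rows happen to start with 2001. A variant that matches on the year explicitly might look like the sketch below. `base_year` and `result` are hypothetical names, the rows are deliberately reversed to show that the ordering no longer matters, and it assumes each group has exactly one row for the base year.

```r
library(dplyr)

# Same example data as in the question
mydf <- data.frame(
  year  = c(2001, 2001, 2001, 2006, 2006, 2006, 2007, 2007, 2007),
  group = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  value = c(10, 50, 100, 20, 5, 200, 25, 50, 250)
)

# Reverse the row order so that 2001 is no longer the first row in each group
mydf <- mydf[nrow(mydf):1, ]

base_year <- 2001  # hypothetical variable name for the reference year

result <- mydf %>%
  group_by(group) %>%
  # Select the baseline by year rather than by position.
  # Assumption: each group has exactly one row where year == base_year.
  mutate(diffs = value - value[year == base_year],
         propdiffs = diffs / value[year == base_year]) %>%
  ungroup() %>%
  arrange(group, year)

print(result)
```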