I have the following data frame:
id = c("A","A","A","A","A","A","B","B","B","B","B","B","C","C","C","C","C","C")
month = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)
amount = c(0,0,10,0,0,0,0,10,0,10,0,0,0,0,0,10,10,0)
df <- data.frame(id, month, amount)
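What I need to do, by id, is this: for zero-"amount" rows that come before the first non-zero "amount", count (as negative numbers) the months remaining until that non-zero row. On a non-zero row, time = 0. After that, for each zero-"amount" row, count (as positive numbers) the months elapsed since the most recent non-zero row.
The expected solution is: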
solution = c(-2,-1,0,1,2,3,-1,0,1,0,1,2,-3,-2,-1,0,0,1)
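As you can probably tell, searching for an answer to this multi-layered problem is hard. Ideally the answer would use data.table, since I am working with millions of rows, but dplyr would also suit my needs.
Any help is appreciated.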
Answer 0 (score: 2)
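library(data.table)
DT <- as.data.table(df)  # copy of the question's df; setDT(df) would convert it in place instead
# g: run id that increments whenever id or the zero/non-zero status changes
DT[, g := rleid(id, amount != 0)]
# g_id: run index within each id, starting at 0
DT[, g_id := g - g[1L], by = id]
# First run (leading zeros): count down to -1. Later zero runs (even g_id): count up from 1.
# Non-zero runs (odd g_id): 0. This assumes every id starts with a zero amount.
DT[, v :=
  if (g_id == 0L)
    -(.N:1)
  else if (g_id %% 2 == 0)
    1:.N
  else
    0L
, by = .(id, g_id)]
all.equal(DT$v, solution) # TRUE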
To see how it works:
id month amount g g_id v
1: A 1 0 1 0 -2
2: A 2 0 1 0 -1
3: A 3 10 2 1 0
4: A 4 0 3 2 1
5: A 5 0 3 2 2
6: A 6 0 3 2 3
7: B 1 0 4 0 -1
8: B 2 10 5 1 0
9: B 3 0 6 2 1
10: B 4 10 7 3 0
11: B 5 0 8 4 1
12: B 6 0 8 4 2
13: C 1 0 9 0 -3
14: C 2 0 9 0 -2
15: C 3 0 9 0 -1
16: C 4 10 10 1 0
17: C 5 10 10 1 0
18: C 6 0 11 2 1
You can remove the extra columns with DT[, c("g", "g_id") := NULL].
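The answer above relies on every id starting with a zero amount: the first run always gets the negative countdown, and the odd/even test on g_id decides the rest. If an id could start with a non-zero amount, one possible variant is to branch on the run's own zero/non-zero status instead. This is only a sketch under that assumption; the id "D" rows and the helper columns nz and first_run are hypothetical and not part of the original post.

```r
library(data.table)

# Hypothetical data: id "D" starts with a non-zero amount
DT <- data.table(
  id     = c("A", "A", "A", "A", "D", "D", "D"),
  month  = c(1, 2, 3, 4, 1, 2, 3),
  amount = c(0, 0, 10, 0, 10, 0, 0)
)

DT[, nz := amount != 0]                 # TRUE on non-zero rows
DT[, g := rleid(id, nz)]                # run id: changes when id or nz changes
DT[, first_run := g == g[1L], by = id]  # TRUE for the first run within each id

# Non-zero run: 0. Leading zero run: count down to -1. Later zero runs: count up from 1.
DT[, v := if (nz[1L]) {
  rep(0L, .N)
} else if (first_run[1L]) {
  -(.N:1)
} else {
  1:.N
}, by = .(id, g)]

DT$v
```

Like the original, this counts rows rather than differences in month, so it assumes one row per consecutive month. An id with no non-zero amount at all would get a negative countdown over all of its rows, which may or may not be what is wanted.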
Answer 1 (score: 1)
Using tidyr and dplyr:
library(dplyr)
library(tidyr)
df_new <- df %>%
  group_by(id) %>%
  # identify non-zero instances
  mutate(temp = ifelse(amount != 0, month, NA)) %>%
  # fill down first
  fill(temp, .direction = "down") %>%
  # fill up after
  fill(temp, .direction = "up") %>%
  # calculate difference
  mutate(solution = month - temp) %>%
  # remove temp
  select(-temp)
Result:
# id month amount solution
# <fctr> <dbl> <dbl> <dbl>
# 1 A 1 0 -2
# 2 A 2 0 -1
# 3 A 3 10 0
# 4 A 4 0 1
# 5 A 5 0 2
# 6 A 6 0 3
# 7 B 1 0 -1
# 8 B 2 10 0
# 9 B 3 0 1
# 10 B 4 10 0
# 11 B 5 0 1
# 12 B 6 0 2
# 13 C 1 0 -3
# 14 C 2 0 -2
# 15 C 3 0 -1
# 16 C 4 10 0
# 17 C 5 10 0
# 18 C 6 0 1
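Since the question mentions millions of rows and a preference for data.table, the same fill-down-then-fill-up idea can probably be expressed with data.table's nafill(). The following is a sketch rather than a benchmarked solution; the helper column name ref is an assumption of this example.

```r
library(data.table)

id <- c("A","A","A","A","A","A","B","B","B","B","B","B","C","C","C","C","C","C")
month <- c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)
amount <- c(0,0,10,0,0,0,0,10,0,10,0,0,0,0,0,10,10,0)
solution <- c(-2,-1,0,1,2,3,-1,0,1,0,1,2,-3,-2,-1,0,0,1)
DT <- data.table(id, month, amount)

# ref: the month of each non-zero row, NA on zero rows
DT[, ref := fifelse(amount != 0, month, NA_real_)]
# carry the last non-zero month forward, then fill the leading NAs backward
DT[, ref := nafill(nafill(ref, type = "locf"), type = "nocb"), by = id]
# signed distance in months to the reference row
DT[, v := month - ref]
DT[, ref := NULL]

all.equal(DT$v, solution) # TRUE
```

Because this subtracts actual month values, it should also cope with gaps in the month sequence, unlike a row-count approach. An id with no non-zero amount at all ends up with NA in v.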