R - Sequence calculation looking forward and backward

Time: 2016-08-18 20:17:19

Tags: r data.table seq

I have the following data frame:

id = c("A","A","A","A","A","A","B","B","B","B","B","B","C","C","C","C","C","C")
month = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)
amount = c(0,0,10,0,0,0,0,10,0,10,0,0,0,0,0,10,10,0)

df <- data.frame(id, month, amount)

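What I need to do (by id) is: on the zero-"amount" rows that come before the first non-zero "amount" row, count down with negative numbers the number of months remaining until that non-zero row. On every row where "amount" is non-zero, time = 0. Then, on the zero-"amount" rows that follow a non-zero row, look back and count up with positive numbers the number of months elapsed since the most recent non-zero row.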

The expected result is:

solution = c(-2,-1,0,1,2,3,-1,0,1,0,1,2,-3,-2,-1,0,0,1)

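As you can probably tell, searching for a multi-layered problem like this is very difficult. Ideally the answer would use data.table, because I am working with millions of rows, but dplyr would also suit my needs.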

Any help is appreciated.

S.

2 answers:

Answer 0: (score: 2)

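library(data.table)
DT <- as.data.table(df)  # the question's data lives in df; copy it into a data.table

# g: run id that increments whenever id changes or amount flips between zero and non-zero
DT[, g := rleid(id, amount != 0)]
# g_id: run number within each id, starting at 0
DT[, g_id := g - g[1L], by=id]
# v: count down to -1 in the first run, 0 in non-zero runs, count up from 1 in later zero runs
# (this assumes every id starts with a zero-amount run, as in the sample data)
DT[, v :=  
  if (g_id == 0L) 
    -(.N:1)
  else if (g_id %% 2 == 0)
    1:.N
  else 
    0L
, by=.(id, g_id)]

all.equal(DT$v, solution) # TRUE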

To see how it works:

    id month amount  g g_id  v
 1:  A     1      0  1    0 -2
 2:  A     2      0  1    0 -1
 3:  A     3     10  2    1  0
 4:  A     4      0  3    2  1
 5:  A     5      0  3    2  2
 6:  A     6      0  3    2  3
 7:  B     1      0  4    0 -1
 8:  B     2     10  5    1  0
 9:  B     3      0  6    2  1
10:  B     4     10  7    3  0
11:  B     5      0  8    4  1
12:  B     6      0  8    4  2
13:  C     1      0  9    0 -3
14:  C     2      0  9    0 -2
15:  C     3      0  9    0 -1
16:  C     4     10 10    1  0
17:  C     5     10 10    1  0
18:  C     6      0 11    2  1
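
The g column in the table above comes from rleid(), which numbers runs of identical consecutive values. As a minimal sketch on the amounts of id A alone (the variable names amount and runs are only for illustration):

```r
library(data.table)

# Amounts for id A from the question
amount <- c(0, 0, 10, 0, 0, 0)

# rleid() gives each run of identical consecutive values its own integer id
runs <- rleid(amount != 0)
print(runs)              # 1 1 2 3 3 3

# Shifting so the first run is 0 gives g_id for a single id
print(runs - runs[1L])   # 0 0 1 2 2 2
```

With runs numbered from 0, even run numbers are zero-amount runs and odd run numbers are non-zero runs, as long as the id starts with a zero amount.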

You can remove the extra columns with DT[, c("g", "g_id") := NULL].
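
The parity test on g_id relies on every id starting with a zero amount, which holds for the sample data. If that cannot be guaranteed, one possible variant (a sketch, not part of the original answer) checks the amount of each run directly. The table DT2 and its ids D and E are made up for illustration:

```r
library(data.table)

# Hypothetical data: id D starts with a non-zero amount, id E with a zero amount
DT2 <- data.table(
  id     = rep(c("D", "E"), each = 4),
  month  = rep(1:4, times = 2),
  amount = c(10, 0, 0, 10,   0, 0, 10, 0)
)

DT2[, g := rleid(id, amount != 0)]
DT2[, g_id := g - g[1L], by = id]

# Decide per run from the amount itself instead of from the parity of g_id
DT2[, v :=
  if (amount[1L] != 0)
    0L            # non-zero run: time is 0
  else if (g_id == 0L)
    -(.N:1)       # leading zero run: count down to -1
  else
    1:.N          # later zero run: count up from 1
, by = .(id, g_id)]

print(DT2$v)  # 0 1 2 0 -2 -1 0 1
```

For ids that do start with a zero amount, this gives the same values as the parity-based version.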

Answer 1: (score: 1)

Using tidyr and dplyr:

library(dplyr)
library(tidyr)

df_new <- df %>% 
    group_by(id) %>% 
    # identify non-zero instances
    mutate(temp = ifelse(amount != 0, month, NA)) %>% 
    # fill down first
    fill(temp, .direction = "down") %>% 
    # fill up after
    fill(temp, .direction = "up") %>% 
    # calculate difference
    mutate(solution = month - temp) %>% 
    # remove temp
    select(-temp)

Result:

#        id month amount solution
#     <fctr> <dbl>  <dbl>    <dbl>
# 1       A     1      0       -2
# 2       A     2      0       -1
# 3       A     3     10        0
# 4       A     4      0        1
# 5       A     5      0        2
# 6       A     6      0        3
# 7       B     1      0       -1
# 8       B     2     10        0
# 9       B     3      0        1
# 10      B     4     10        0
# 11      B     5      0        1
# 12      B     6      0        2
# 13      C     1      0       -3
# 14      C     2      0       -2
# 15      C     3      0       -1
# 16      C     4     10        0
# 17      C     5     10        0
# 18      C     6      0        1
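
To see why filling down first and then up gives the right reference month, here is a sketch of the intermediate temp values for id B alone. The data frame b and the names down and both are only for illustration; no grouping is needed because there is a single id:

```r
library(tidyr)

# Rows for id B from the question
b <- data.frame(month = 1:6, amount = c(0, 10, 0, 10, 0, 0))

# Month of each non-zero amount, NA elsewhere
b$temp <- ifelse(b$amount != 0, b$month, NA)
print(b$temp)                  # NA 2 NA 4 NA NA

# Fill down: rows after a non-zero month inherit that month
down <- fill(b, temp, .direction = "down")
print(down$temp)               # NA 2 2 4 4 4

# Fill up: the leading rows take the first non-zero month
both <- fill(down, temp, .direction = "up")
print(both$temp)               # 2 2 2 4 4 4

# Signed distance in months to the reference month
print(both$month - both$temp)  # -1 0 1 0 1 2
```

Filling down first matters: it makes rows after a non-zero month look back, so only the leading rows are left to look forward. An id with no non-zero amount at all would keep NA throughout, so that case would need separate handling.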