Cumsum with conditions in R

Date: 2018-04-08 14:21:15

Tags: r dplyr data.table cumsum

set.seed(123)
df <- data.frame(loc.id = rep(1:3,each = 3*5), 
             year = rep(rep(1981:1983, each = 5), times = 3), 
             week = rep(rep(20:24, times = 3), times = 3),
             cumsum.val = runif(min  = -2, max = 4, 5*3*3))

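The data contain 3 locations x 3 years x 5 weeks and a value called cumsum.val. For each location and year, I want to find the weeks where cumsum.val > 1. Then, if cumsum.val > 1 occurs in two consecutive weeks, select the first of those two weeks. An example: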

test <- df[df$loc.id == 1 & df$year == 1981, ]
test$cumsum.test <- test$cumsum.val > 1 # weeks where cumsum.val > 1
head(test)
  loc.id year week cumsum.val cumsum.test
1      1 1981   20 -0.2745349       FALSE
2      1 1981   21  2.7298308        TRUE
3      1 1981   22  0.4538615       FALSE
4      1 1981   23  3.2981044        TRUE
5      1 1981   24  3.6428037        TRUE

Now select the first week at which TRUE occurs twice in a row. In the case above that is week 23 (because weeks 23 and 24 are both TRUE).

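How can I implement this for the whole df? It may happen that cumsum.val > 1 does not occur in two consecutive weeks. In that case, just select the first week where cumsum.val > 1.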
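A minimal base-R sketch of the rule described above (first week of the first consecutive pair, otherwise the first week above 1), assuming the rows of each location/year are already sorted by week, as they are in df. The helper first_week() and its threshold argument are hypothetical names chosen for illustration.

```r
set.seed(123)
df <- data.frame(loc.id = rep(1:3, each = 3*5),
                 year = rep(rep(1981:1983, each = 5), times = 3),
                 week = rep(rep(20:24, times = 3), times = 3),
                 cumsum.val = runif(min = -2, max = 4, 5*3*3))

# Hypothetical helper: for one location/year (rows sorted by week), return the
# first week of the first run of two consecutive weeks with val > threshold;
# if no such pair exists, fall back to the first week with val > threshold
# (NA if there is none).
first_week <- function(week, val, threshold = 1) {
  above <- val > threshold
  pair <- above & c(above[-1], FALSE)  # this week and the next are both above
  if (any(pair)) week[which(pair)[1]] else week[which(above)[1]]
}

# Apply the helper to every location/year combination
groups <- split(df, list(df$loc.id, df$year), drop = TRUE)
res <- do.call(rbind, lapply(groups, function(g) {
  data.frame(loc.id = g$loc.id[1], year = g$year[1],
             week = first_week(g$week, g$cumsum.val))
}))
res <- res[order(res$loc.id, res$year), ]
rownames(res) <- NULL
res
```

Shifting the logical vector by one position (c(above[-1], FALSE)) pairs each week with the following one without crossing into another group, because the helper only ever sees one location/year at a time.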

3 answers:

Answer 0 (score: 1)

A dplyr-based solution can do the trick. Note that cumsum.test has been calculated as numeric, so that a default value other than 0/1 can be used for lag and lead.
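The answer's own code is not preserved in this copy. What follows is a minimal sketch of the idea it describes, not the answerer's code: cumsum.test is stored as numeric 0/1, so lead() can be given a default of -1 that can never be mistaken for a real flag at the last week of a group. The names pair.start and first_weeks are hypothetical, the fallback to the first week above 1 follows the question, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

set.seed(123)
df <- data.frame(loc.id = rep(1:3, each = 3*5),
                 year = rep(rep(1981:1983, each = 5), times = 3),
                 week = rep(rep(20:24, times = 3), times = 3),
                 cumsum.val = runif(min = -2, max = 4, 5*3*3))

first_weeks <- df %>%
  group_by(loc.id, year) %>%
  mutate(
    cumsum.test = as.numeric(cumsum.val > 1),  # numeric 0/1 flag
    # lead() works within each group; its default (-1) fills the last week,
    # so the final week of a group can never start a "pair"
    pair.start = cumsum.test == 1 & lead(cumsum.test, default = -1) == 1
  ) %>%
  summarise(
    # first week of a consecutive pair, else first week above 1 (NA if none)
    week = if (any(pair.start)) week[pair.start][1] else week[cumsum.test == 1][1],
    .groups = "drop"
  )

first_weeks
```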

Answer 1 (score: 0)

set.seed(123)
df <- data.frame(loc.id = rep(1:3,each = 3*5), 
                 year = rep(rep(1981:1983, each = 5), times = 3), 
                 week = rep(rep(20:24, times = 3), times = 3),
                 cumsum.val = runif(min  = -2, max = 4, 5*3*3))

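# Check each location/year separately so that a run cannot span two years
groups <- unique(df[, c("loc.id", "year")])
data <- data.frame()
for (i in seq_len(nrow(groups))) {
  rows <- which(df$loc.id == groups$loc.id[i] & df$year == groups$year[i])
  check <- 0  # number of consecutive weeks so far with cumsum.val > 1
  for (j in rows) {
    if (df$cumsum.val[j] > 1) {
      check <- check + 1
    } else {
      check <- 0
    }
    if (check == 2) {
      # j is the second week of the first run: record the week before it
      data <- rbind(data, data.frame(loc.id = df$loc.id[j], year = df$year[j],
                                     week1 = df$week[j - 1]))
      break  # only the first such pair per location/year
    }
  }
}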

Answer 2 (score: 0)

A data.table approach:

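library(data.table)  # load package
setDT(df)            # convert to data.table (by reference)
df[, cumsum.test := cumsum.val > 1]  # create new variable

# Within each location/year, find consecutive values (diff() == 0 means "same as
# next week") and check they are indeed cumsum.val > 1:
df[, consec := c(diff(cumsum.test), NA) == 0 & cumsum.test, by = .(loc.id, year)]
# Return the first such row of each location/year:
df[consec %in% TRUE, .SD[1], by = .(loc.id, year)]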
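The code above only looks at consecutive pairs, so a location/year without one produces no row. A minimal data.table sketch that also applies the question's fallback (first week with cumsum.val > 1 when no pair exists), assuming rows are sorted by week within each group; dt and first_week are hypothetical names, and shift() is used here in place of diff().

```r
library(data.table)

set.seed(123)
dt <- data.table(loc.id = rep(1:3, each = 3*5),
                 year = rep(rep(1981:1983, each = 5), times = 3),
                 week = rep(rep(20:24, times = 3), times = 3),
                 cumsum.val = runif(min = -2, max = 4, 5*3*3))

first_week <- dt[, {
  above <- cumsum.val > 1
  # TRUE where this week and the next week (within the group) are both above 1
  pair <- above & shift(above, type = "lead", fill = FALSE)
  # Prefer the first week of a pair; otherwise the first week above 1 (NA if none)
  list(week = if (any(pair)) week[pair][1L] else week[above][1L])
}, by = .(loc.id, year)]

first_week
```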