Extract rows for the first occurrence of a variable prior to an event

Time: 2018-09-03 14:25:08

Tags: r filter grouping

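I am trying to extract the first occurrence of a variable in a data frame prior to a specific value that I have already selected in that data frame. Specifically, the output of head(df) is: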

date discharge     event event.isolation some.column
1/1/2016  7.782711         NA  NA             FALSE
1/2/2016  7.349389  -5.567748  none            TRUE
1/3/2016  7.053813  -4.021769  none            TRUE
1/4/2016  7.421568   5.213554  none            TRUE
1/5/2016  5.722443 -22.894418  none            TRUE
1/6/2016  5.497342  -3.933662  none            TRUE
1/7/2016  5.347890  -6.898281  none            TRUE
1/8/2016  7.983489   4.289382  none            TRUE
1/9/2016  8.488293  -19.28304  none            TRUE

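I would like to find the first discharge value of 7.7 or greater prior to each event of -22 or smaller. In other words, I already know each event of interest. I want to search backwards iteratively to find the first discharge value of 7.7 or greater prior to each selected event.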

I am basically trying to combine "Extract rows for the first occurrence of a variable in a data frame" and "Select row prior to first occurrence of an event by group", but I am having a hard time.

The desired result would be df[1, ], because it contains the first discharge value over 7.7 (working backwards) prior to my selected event in row 5.

1 Answer:

Answer 0: (score: 0)

This is not the most elegant solution, but it works for the example.

It first defines the intervals to search (one interval for each event < -22), then looks for the first occurrence of discharge > 7.7 within each interval.

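In this example, I assume that you do not want to return rows where both event < -22 and discharge > 7.7, even if that row is the first occurrence of discharge > 7.7 since the last event.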

df <- read.csv(text = 'date discharge     event event.isolation some.column
1 1/1/2016  7.782711         NA  <NA>           FALSE
 2 1/2/2016  7.349389  -5.567748  none            TRUE
 3 1/3/2016  7.053813  -4.021769  none            TRUE
 4 1/4/2016  7.421568   5.213554  none            TRUE
 5 1/5/2016  5.722443 -22.894418  none            TRUE
 6 1/6/2016  5.497342  -3.933662  none            TRUE
 7 1/7/2016  5.347890  -6.898281  none            TRUE
 8 1/8/2016  7.983489   4.289382  none            TRUE',sep="")

## find the rows where event < -22, and prepend 0 so the first interval starts at row 1
d <- c(0, which(df$event < -22))

## Each interval runs from d[i] + 1 to d[i+1] - 1. Intervals with no rows between two events are skipped, so a row where both event < -22 and discharge > 7.7 is never returned
new.df <- NULL
for (i in seq_len(length(d) - 1)) {
  if (d[i+1] > (d[i] + 1)) {
    ## look only inside the interval and keep the first row for which discharge > 7.7 is TRUE (zero rows if there is none)
    hit <- head(subset(df[(d[i]+1):(d[i+1]-1), ], discharge > 7.7), 1)
    ## append instead of overwriting, so every event can contribute a row
    new.df <- rbind(new.df, hit)
  }
}
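
The loop above scans each interval from its start. The question asks for a backward search from each event, and the two can disagree when several rows in the same interval exceed the threshold. Below is a minimal sketch of the backward variant. The helper name last_discharge_before_event and its parameters are hypothetical, not part of the original answer. It also assumes the search may go all the way back to the first row rather than stopping at the previous event, and it uses the "or greater" / "or smaller" thresholds from the question.

```r
# Hypothetical helper (not from the original answer): for each event row,
# walk backwards and return the nearest earlier row meeting the discharge threshold.
last_discharge_before_event <- function(df, event_max = -22, discharge_min = 7.7) {
  event_rows <- which(df$event <= event_max)   # which() silently drops NA events
  hits <- lapply(event_rows, function(e) {
    before <- seq_len(e - 1)                   # rows strictly before the event
    cand <- before[which(df$discharge[before] >= discharge_min)]
    if (length(cand) == 0) return(NULL)        # nothing qualifies before this event
    df[max(cand), , drop = FALSE]              # closest qualifying row to the event
  })
  do.call(rbind, hits)                         # NULL if no event has a match
}

# Sample data copied from the example (only the columns the search needs)
df <- data.frame(
  date = c("1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016",
           "1/5/2016", "1/6/2016", "1/7/2016", "1/8/2016"),
  discharge = c(7.782711, 7.349389, 7.053813, 7.421568,
                5.722443, 5.497342, 5.347890, 7.983489),
  event = c(NA, -5.567748, -4.021769, 5.213554,
            -22.894418, -3.933662, -6.898281, 4.289382),
  stringsAsFactors = FALSE
)

result <- last_discharge_before_event(df)
print(result)
```

On the sample data the only event at or below -22 is in row 5, and the nearest earlier row with discharge at or above 7.7 is row 1, which matches the desired df[1, ].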