Max less than the current value of a variable in a grouped dplyr filter

Time: 2018-03-11 23:16:03

Tags: r dplyr

I have a df like this:

check <- read.table(text='material    previousUser    currentUser status  date    originFrame currentFrame
123 inventory   Dave    draft   2016-1  1/1/2016    1/1/2016
123 Dave    Carl    transfer    2016-2  2/1/2016    1/1/2016
123 Carl    customer    sent    2016-4  4/1/2016    1/1/2016
123 inventory   Dave    draft   2016-1  1/1/2016    2/1/2016
123 Dave    Carl    transfer    2016-2  2/1/2016    2/1/2016
123 Carl    customer    sent    2016-4  4/1/2016    2/1/2016
123 inventory   Dave    draft   2016-1  1/1/2016    3/1/2016
123 Dave    Carl    transfer    2016-2  2/1/2016    3/1/2016
123 Carl    customer    sent    2016-4  4/1/2016    3/1/2016
123 inventory   Dave    draft   2016-1  1/1/2016    4/1/2016
123 Dave    Carl    transfer    2016-2  2/1/2016    4/1/2016
123 Carl    customer    sent    2016-4  4/1/2016    4/1/2016
123 inventory   Dave    draft   2016-1  1/1/2016    5/1/2016
123 Dave    Carl    transfer    2016-2  2/1/2016    5/1/2016
123 Carl    customer    sent    2016-4  4/1/2016    5/1/2016
123 inventory   Dave    draft   2016-1  1/1/2016    1/1/2017
123 Dave    Carl    transfer    2016-2  2/1/2016    1/1/2017
123 Carl    customer    sent    2016-4  4/1/2016    1/1/2017
123 inventory   Dave    draft   2016-1  1/1/2016    2/1/2017
123 Dave    Carl    transfer    2016-2  2/1/2016    2/1/2017
123 Carl    customer    sent    2016-4  4/1/2016    2/1/2017
123 inventory   Dave    draft   2016-1  1/1/2016    3/1/2017
123 Dave    Carl    transfer    2016-2  2/1/2016    3/1/2017
123 Carl    customer    sent    2016-4  4/1/2016    3/1/2017
123 inventory   Dave    draft   2016-1  1/1/2016    4/1/2017
123 Dave    Carl    transfer    2016-2  2/1/2016    4/1/2017
123 Carl    customer    sent    2016-4  4/1/2016    4/1/2017
123 inventory   Dave    draft   2016-1  1/1/2016    5/1/2017
123 Dave    Carl    transfer    2016-2  2/1/2016    5/1/2017
123 Carl    customer    sent    2016-4  4/1/2016    5/1/2017
104 inventory   Dave    draft   2017-1  1/1/2017    1/1/2016
104 Dave    Carl    transfer    2017-2  2/1/2017    1/1/2016
104 Carl    customer    sent    2017-4  4/1/2017    1/1/2016
104 inventory   Dave    draft   2017-1  1/1/2017    2/1/2016
104 Dave    Carl    transfer    2017-2  2/1/2017    2/1/2016
104 Carl    customer    sent    2017-4  4/1/2017    2/1/2016
104 inventory   Dave    draft   2017-1  1/1/2017    3/1/2016
104 Dave    Carl    transfer    2017-2  2/1/2017    3/1/2016
104 Carl    customer    sent    2017-4  4/1/2017    3/1/2016
104 inventory   Dave    draft   2017-1  1/1/2017    4/1/2016
104 Dave    Carl    transfer    2017-2  2/1/2017    4/1/2016
104 Carl    customer    sent    2017-4  4/1/2017    4/1/2016
104 inventory   Dave    draft   2017-1  1/1/2017    5/1/2016
104 Dave    Carl    transfer    2017-2  2/1/2017    5/1/2016
104 Carl    customer    sent    2017-4  4/1/2017    5/1/2016
104 inventory   Dave    draft   2017-1  1/1/2017    1/1/2017
104 Dave    Carl    transfer    2017-2  2/1/2017    1/1/2017
104 Carl    customer    sent    2017-4  4/1/2017    1/1/2017
104 inventory   Dave    draft   2017-1  1/1/2017    2/1/2017
104 Dave    Carl    transfer    2017-2  2/1/2017    2/1/2017
104 Carl    customer    sent    2017-4  4/1/2017    2/1/2017
104 inventory   Dave    draft   2017-1  1/1/2017    3/1/2017
104 Dave    Carl    transfer    2017-2  2/1/2017    3/1/2017
104 Carl    customer    sent    2017-4  4/1/2017    3/1/2017
104 inventory   Dave    draft   2017-1  1/1/2017    4/1/2017
104 Dave    Carl    transfer    2017-2  2/1/2017    4/1/2017
104 Carl    customer    sent    2017-4  4/1/2017    4/1/2017
104 inventory   Dave    draft   2017-1  1/1/2017    5/1/2017
104 Dave    Carl    transfer    2017-2  2/1/2017    5/1/2017
104 Carl    customer    sent    2017-4  4/1/2017    5/1/2017', header=TRUE, stringsAsFactors = FALSE)
check[c('originFrame','currentFrame')] <- lapply(check[c('originFrame','currentFrame')], as.Date, format = '%m/%d/%Y')

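I want to group by currentFrame and material and, in each group, keep the row whose originFrame equals currentFrame. If there is no such row, I want to keep the row with the largest originFrame that is less than currentFrame, like this: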

material    previousUser    currentUser status  date    originFrame currentFrame
123 inventory   Dave    draft   2016-1  1/1/2016    1/1/2016
123 Dave    Carl    transfer    2016-2  2/1/2016    2/1/2016
123 Dave    Carl    transfer    2016-2  2/1/2016    3/1/2016
123 Carl    customer    sent    2016-4  4/1/2016    4/1/2016
123 Carl    customer    sent    2016-4  4/1/2016    5/1/2016
123 inventory   Dave    draft   2016-1  4/1/2016    1/1/2017
123 Dave    Carl    transfer    2016-2  4/1/2016    2/1/2017
123 Dave    Carl    transfer    2016-2  4/1/2016    3/1/2017
123 Carl    customer    sent    2016-4  4/1/2016    4/1/2017
123 Carl    customer    sent    2016-4  4/1/2016    5/1/2017
104 inventory   Dave    draft   2016-1  1/1/2017    1/1/2016
104 Dave    Carl    transfer    2016-2  1/1/2017    2/1/2016
104 Dave    Carl    transfer    2016-2  1/1/2017    3/1/2016
104 Carl    customer    sent    2016-4  1/1/2017    4/1/2016
104 Carl    customer    sent    2016-4  1/1/2017    5/1/2016
104 inventory   Dave    draft   2016-1  1/1/2017    1/1/2017
104 Dave    Carl    transfer    2016-2  2/1/2017    2/1/2017
104 Dave    Carl    transfer    2016-2  2/1/2017    3/1/2017
104 Carl    customer    sent    2016-4  4/1/2017    4/1/2017
104 Carl    customer    sent    2016-4  4/1/2017    5/1/2017
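Because every row in a (material, currentFrame) group shares the same currentFrame, the rule "exact match, otherwise the largest earlier originFrame" is equivalent to "the largest originFrame on or before currentFrame". The following is a minimal sketch of that reading, not the asker's code. It assumes a small hypothetical stand-in for `check` (only `material`, `status`, `originFrame` and `currentFrame`) and uses `slice_max()`, which requires dplyr 1.0.0 or later. Groups with no originFrame on or before currentFrame (material 104 before 2017 here) are dropped by this sketch, whereas the expected output above still shows a row for them.

```r
library(dplyr)

# Hypothetical, compact stand-in for the question's `check`:
# three events per material, each repeated for every snapshot date.
events <- data.frame(
  material    = rep(c(123, 104), each = 3),
  status      = rep(c("draft", "transfer", "sent"), times = 2),
  originFrame = as.Date(c("2016-01-01", "2016-02-01", "2016-04-01",
                          "2017-01-01", "2017-02-01", "2017-04-01")),
  stringsAsFactors = FALSE
)
frames <- data.frame(currentFrame = as.Date(c("2016-01-01", "2016-03-01",
                                              "2017-01-01", "2017-03-01")))
check <- merge(events, frames, by = NULL)  # cross join: 6 events x 4 frames

latest <- check %>%
  group_by(material, currentFrame) %>%
  filter(originFrame <= currentFrame) %>%               # drop events after the snapshot
  slice_max(originFrame, n = 1, with_ties = FALSE) %>%  # an exact match is the max if present
  ungroup()

latest
```

Folding the equality case into `<=` avoids the nested `ifelse()` entirely.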

This works, but it doesn't take the value of currentFrame into account, so it gives me the wrong results:

library(dplyr)

check <- as.data.frame(
  check %>% 
    group_by(currentFrame, material) %>% 
    filter(
      ifelse(
        currentFrame %in% originFrame,
        originFrame == currentFrame,
        ifelse(
          max(originFrame) > currentFrame,
          # both branches below are identical, so currentFrame is effectively ignored
          originFrame == max(originFrame),
          originFrame == max(originFrame)
        )
      )
    )
)

But I can't seem to apply the rule that the max must be lower than the value of currentFrame. The following returns the wrong number of observations:

check <- as.data.frame(
  check %>% 
    group_by(currentFrame, material) %>% 
    filter(
      ifelse(
        currentFrame %in% originFrame,
        originFrame == currentFrame,
        ifelse(
          max(originFrame) > currentFrame,
          originFrame == which.max(originFrame < currentFrame),
          originFrame == max(originFrame)
        )
      )
    )
)
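One likely culprit in the attempt above: for a logical vector, `which.max(originFrame < currentFrame)` returns the integer position of the first `TRUE`, not a date, so `originFrame == which.max(...)` compares dates against a row index. The sketch below instead takes the maximum of the earlier dates directly. Since every row in a group shares one currentFrame, the per-group decision can use scalar `if`/`else` rather than nested `ifelse()`. The last branch (earliest later originFrame when nothing is on or before currentFrame) is an assumption inferred from the material 104 rows in the expected output, and the toy data is a hypothetical stand-in for `check`.

```r
library(dplyr)

# Hypothetical stand-in for the question's `check` (same columns that matter here).
events <- data.frame(
  material    = rep(c(123, 104), each = 3),
  status      = rep(c("draft", "transfer", "sent"), times = 2),
  originFrame = as.Date(c("2016-01-01", "2016-02-01", "2016-04-01",
                          "2017-01-01", "2017-02-01", "2017-04-01")),
  stringsAsFactors = FALSE
)
frames <- data.frame(currentFrame = as.Date(c("2016-01-01", "2016-03-01",
                                              "2017-01-01", "2017-03-01")))
check <- merge(events, frames, by = NULL)  # cross join: 6 events x 4 frames

fixed <- check %>%
  group_by(currentFrame, material) %>%
  filter(
    if (any(originFrame == currentFrame)) {
      originFrame == currentFrame                                  # exact match
    } else if (any(originFrame < currentFrame)) {
      originFrame == max(originFrame[originFrame < currentFrame])  # latest earlier event
    } else {
      originFrame == min(originFrame)  # assumed fallback: earliest later event
    }
  ) %>%
  ungroup()

fixed
```

The `any()` guards also ensure `max()` is never called on an empty vector.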

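EDIT: I should have mentioned that the real data frame contains many materials with different dates; the example is now updated.

EDIT 2: OK, sorry, hopefully this is clearer. If anyone has feedback on how I could phrase this question better, I would appreciate it.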

2 answers:

Answer 0 (score: 1)

Your data in a more consumable format:

check <- read.table(text='material    previousUser    currentUser status  date    originFrame currentFrame
123 inventory   Dave    draft   2016-1  1/1/2016    1/1/2016
123 Dave    Carl    transfer    2016-2  2/1/2016    1/1/2016
123 Carl    customer    sent    2016-4  4/1/2016    1/1/2016
123 inventory   Dave    draft   2016-1  1/1/2016    2/1/2016
123 Dave    Carl    transfer    2016-2  2/1/2016    2/1/2016
123 Carl    customer    sent    2016-4  4/1/2016    2/1/2016
123 inventory   Dave    draft   2016-1  1/1/2016    3/1/2016
123 Dave    Carl    transfer    2016-2  2/1/2016    3/1/2016
123 Carl    customer    sent    2016-4  4/1/2016    3/1/2016
123 inventory   Dave    draft   2016-1  1/1/2016    4/1/2016
123 Dave    Carl    transfer    2016-2  2/1/2016    4/1/2016
123 Carl    customer    sent    2016-4  4/1/2016    4/1/2016
123 inventory   Dave    draft   2016-1  1/1/2016    5/1/2016
123 Dave    Carl    transfer    2016-2  2/1/2016    5/1/2016
123 Carl    customer    sent    2016-4  4/1/2016    5/1/2016', header=TRUE, stringsAsFactors = FALSE)
check[c('originFrame','currentFrame')] <- lapply(check[c('originFrame','currentFrame')], as.Date, format = '%m/%d/%Y')

One approach, staying with dplyr:

library(dplyr)
check %>%
  mutate(datediff = currentFrame - originFrame) %>%
  arrange(currentFrame, datediff)  %>%
  group_by(currentFrame) %>%
  filter(datediff >= 0) %>%
  slice(1) %>%
  ungroup() %>%
  select(-datediff)
# # A tibble: 5 × 7
#   material previousUser currentUser   status   date originFrame currentFrame
#      <int>        <chr>       <chr>    <chr>  <chr>      <date>       <date>
# 1      123    inventory        Dave    draft 2016-1  2016-01-01   2016-01-01
# 2      123         Dave        Carl transfer 2016-2  2016-02-01   2016-02-01
# 3      123         Dave        Carl transfer 2016-2  2016-02-01   2016-03-01
# 4      123         Carl    customer     sent 2016-4  2016-04-01   2016-04-01
# 5      123         Carl    customer     sent 2016-4  2016-04-01   2016-05-01

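Answer 1 (score: -1)

I figured it out.

What I ended up doing was splitting the data frame into three data frames: one where originFrame == currentFrame, one where originFrame < currentFrame, and one where originFrame > currentFrame. Then I removed from dataframe 2 everything already covered by dataframe 1, and from dataframe 3 everything covered by dataframes 1 and 2. Next I took the max originFrame from dataframe 2 and the min originFrame from dataframe 3. After binding them together, I got what I needed.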
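The three-data-frame approach described above might look roughly like this in dplyr. This is a sketch, not the poster's code: the toy data is a hypothetical stand-in for `check`, "removing" earlier results is interpreted as dropping every (material, currentFrame) group that an earlier data frame already covers (done here with `anti_join()`), and ties on originFrame would keep more than one row per group.

```r
library(dplyr)

# Hypothetical stand-in for the question's `check`.
events <- data.frame(
  material    = rep(c(123, 104), each = 3),
  status      = rep(c("draft", "transfer", "sent"), times = 2),
  originFrame = as.Date(c("2016-01-01", "2016-02-01", "2016-04-01",
                          "2017-01-01", "2017-02-01", "2017-04-01")),
  stringsAsFactors = FALSE
)
frames <- data.frame(currentFrame = as.Date(c("2016-01-01", "2016-03-01",
                                              "2017-01-01", "2017-03-01")))
check <- merge(events, frames, by = NULL)  # cross join: 6 events x 4 frames

keys <- c("material", "currentFrame")

# 1. Exact matches.
df1 <- check %>% filter(originFrame == currentFrame)

# 2. Earlier events for groups with no exact match; keep the latest one.
df2 <- check %>%
  filter(originFrame < currentFrame) %>%
  anti_join(df1, by = keys) %>%
  group_by(material, currentFrame) %>%
  filter(originFrame == max(originFrame)) %>%
  ungroup()

# 3. Later events for groups covered by neither; keep the earliest one.
df3 <- check %>%
  filter(originFrame > currentFrame) %>%
  anti_join(bind_rows(df1, df2), by = keys) %>%
  group_by(material, currentFrame) %>%
  filter(originFrame == min(originFrame)) %>%
  ungroup()

result <- bind_rows(df1, df2, df3) %>% arrange(material, currentFrame)
result
```

Each (material, currentFrame) group ends up in exactly one of the three pieces, so binding them yields one row per group.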