I have a data frame like this:
check <- read.table(text='material previousUser currentUser status date originFrame currentFrame
123 inventory Dave draft 2016-1 1/1/2016 1/1/2016
123 Dave Carl transfer 2016-2 2/1/2016 1/1/2016
123 Carl customer sent 2016-4 4/1/2016 1/1/2016
123 inventory Dave draft 2016-1 1/1/2016 2/1/2016
123 Dave Carl transfer 2016-2 2/1/2016 2/1/2016
123 Carl customer sent 2016-4 4/1/2016 2/1/2016
123 inventory Dave draft 2016-1 1/1/2016 3/1/2016
123 Dave Carl transfer 2016-2 2/1/2016 3/1/2016
123 Carl customer sent 2016-4 4/1/2016 3/1/2016
123 inventory Dave draft 2016-1 1/1/2016 4/1/2016
123 Dave Carl transfer 2016-2 2/1/2016 4/1/2016
123 Carl customer sent 2016-4 4/1/2016 4/1/2016
123 inventory Dave draft 2016-1 1/1/2016 5/1/2016
123 Dave Carl transfer 2016-2 2/1/2016 5/1/2016
123 Carl customer sent 2016-4 4/1/2016 5/1/2016
123 inventory Dave draft 2016-1 1/1/2016 1/1/2017
123 Dave Carl transfer 2016-2 2/1/2016 1/1/2017
123 Carl customer sent 2016-4 4/1/2016 1/1/2017
123 inventory Dave draft 2016-1 1/1/2016 2/1/2017
123 Dave Carl transfer 2016-2 2/1/2016 2/1/2017
123 Carl customer sent 2016-4 4/1/2016 2/1/2017
123 inventory Dave draft 2016-1 1/1/2016 3/1/2017
123 Dave Carl transfer 2016-2 2/1/2016 3/1/2017
123 Carl customer sent 2016-4 4/1/2016 3/1/2017
123 inventory Dave draft 2016-1 1/1/2016 4/1/2017
123 Dave Carl transfer 2016-2 2/1/2016 4/1/2017
123 Carl customer sent 2016-4 4/1/2016 4/1/2017
123 inventory Dave draft 2016-1 1/1/2016 5/1/2017
123 Dave Carl transfer 2016-2 2/1/2016 5/1/2017
123 Carl customer sent 2016-4 4/1/2016 5/1/2017
104 inventory Dave draft 2017-1 1/1/2017 1/1/2016
104 Dave Carl transfer 2017-2 2/1/2017 1/1/2016
104 Carl customer sent 2017-4 4/1/2017 1/1/2016
104 inventory Dave draft 2017-1 1/1/2017 2/1/2016
104 Dave Carl transfer 2017-2 2/1/2017 2/1/2016
104 Carl customer sent 2017-4 4/1/2017 2/1/2016
104 inventory Dave draft 2017-1 1/1/2017 3/1/2016
104 Dave Carl transfer 2017-2 2/1/2017 3/1/2016
104 Carl customer sent 2017-4 4/1/2017 3/1/2016
104 inventory Dave draft 2017-1 1/1/2017 4/1/2016
104 Dave Carl transfer 2017-2 2/1/2017 4/1/2016
104 Carl customer sent 2017-4 4/1/2017 4/1/2016
104 inventory Dave draft 2017-1 1/1/2017 5/1/2016
104 Dave Carl transfer 2017-2 2/1/2017 5/1/2016
104 Carl customer sent 2017-4 4/1/2017 5/1/2016
104 inventory Dave draft 2017-1 1/1/2017 1/1/2017
104 Dave Carl transfer 2017-2 2/1/2017 1/1/2017
104 Carl customer sent 2017-4 4/1/2017 1/1/2017
104 inventory Dave draft 2017-1 1/1/2017 2/1/2017
104 Dave Carl transfer 2017-2 2/1/2017 2/1/2017
104 Carl customer sent 2017-4 4/1/2017 2/1/2017
104 inventory Dave draft 2017-1 1/1/2017 3/1/2017
104 Dave Carl transfer 2017-2 2/1/2017 3/1/2017
104 Carl customer sent 2017-4 4/1/2017 3/1/2017
104 inventory Dave draft 2017-1 1/1/2017 4/1/2017
104 Dave Carl transfer 2017-2 2/1/2017 4/1/2017
104 Carl customer sent 2017-4 4/1/2017 4/1/2017
104 inventory Dave draft 2017-1 1/1/2017 5/1/2017
104 Dave Carl transfer 2017-2 2/1/2017 5/1/2017
104 Carl customer sent 2017-4 4/1/2017 5/1/2017', header=TRUE, stringsAsFactors = FALSE)
check[c('originFrame','currentFrame')] <- lapply(check[c('originFrame','currentFrame')], as.Date, format = '%m/%d/%Y')
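I want to group by currentFrame and material and keep the rows whose originFrame equals currentFrame. If no row in the group matches, I want the row with the largest originFrame that is less than currentFrame, like this: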
material previousUser currentUser status date originFrame currentFrame
123 inventory Dave draft 2016-1 1/1/2016 1/1/2016
123 Dave Carl transfer 2016-2 2/1/2016 2/1/2016
123 Dave Carl transfer 2016-2 2/1/2016 3/1/2016
123 Carl customer sent 2016-4 4/1/2016 4/1/2016
123 Carl customer sent 2016-4 4/1/2016 5/1/2016
123 inventory Dave draft 2016-1 4/1/2016 1/1/2017
123 Dave Carl transfer 2016-2 4/1/2016 2/1/2017
123 Dave Carl transfer 2016-2 4/1/2016 3/1/2017
123 Carl customer sent 2016-4 4/1/2016 4/1/2017
123 Carl customer sent 2016-4 4/1/2016 5/1/2017
104 inventory Dave draft 2016-1 1/1/2017 1/1/2016
104 Dave Carl transfer 2016-2 1/1/2017 2/1/2016
104 Dave Carl transfer 2016-2 1/1/2017 3/1/2016
104 Carl customer sent 2016-4 1/1/2017 4/1/2016
104 Carl customer sent 2016-4 1/1/2017 5/1/2016
104 inventory Dave draft 2016-1 1/1/2017 1/1/2017
104 Dave Carl transfer 2016-2 2/1/2017 2/1/2017
104 Dave Carl transfer 2016-2 2/1/2017 3/1/2017
104 Carl customer sent 2016-4 4/1/2017 4/1/2017
104 Carl customer sent 2016-4 4/1/2017 5/1/2017
This runs, but it does not take the value of currentFrame into account, so it gives me the wrong result:
check <- as.data.frame(
  check %>%
    group_by(currentFrame, material) %>%
    filter(
      ifelse(
        currentFrame %in% originFrame,
        originFrame == currentFrame,
        ifelse(
          max(originFrame) > currentFrame,
          originFrame == max(originFrame),
          originFrame == max(originFrame)
        )
      )
    )
)
But I can't seem to apply the rule that the max must be lower than the value of currentFrame. The following returns the wrong number of observations:
check <- as.data.frame(
  check %>%
    group_by(currentFrame, material) %>%
    filter(
      ifelse(
        currentFrame %in% originFrame,
        originFrame == currentFrame,
        ifelse(
          max(originFrame) > currentFrame,
          originFrame == which.max(originFrame < currentFrame),
          originFrame == max(originFrame)
        )
      )
    )
)
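Edit: I should have mentioned that the actual data frame contains many materials with different dates. The example is now updated.
Edit 2: Okay, sorry, I hope this is clearer now. If anyone has feedback on how I could phrase this question better, I would appreciate it.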
Answer 0 (score: 1)
Your data in a more consumable format:
check <- read.table(text='material previousUser currentUser status date originFrame currentFrame
123 inventory Dave draft 2016-1 1/1/2016 1/1/2016
123 Dave Carl transfer 2016-2 2/1/2016 1/1/2016
123 Carl customer sent 2016-4 4/1/2016 1/1/2016
123 inventory Dave draft 2016-1 1/1/2016 2/1/2016
123 Dave Carl transfer 2016-2 2/1/2016 2/1/2016
123 Carl customer sent 2016-4 4/1/2016 2/1/2016
123 inventory Dave draft 2016-1 1/1/2016 3/1/2016
123 Dave Carl transfer 2016-2 2/1/2016 3/1/2016
123 Carl customer sent 2016-4 4/1/2016 3/1/2016
123 inventory Dave draft 2016-1 1/1/2016 4/1/2016
123 Dave Carl transfer 2016-2 2/1/2016 4/1/2016
123 Carl customer sent 2016-4 4/1/2016 4/1/2016
123 inventory Dave draft 2016-1 1/1/2016 5/1/2016
123 Dave Carl transfer 2016-2 2/1/2016 5/1/2016
123 Carl customer sent 2016-4 4/1/2016 5/1/2016', header=TRUE, stringsAsFactors = FALSE)
check[c('originFrame','currentFrame')] <- lapply(check[c('originFrame','currentFrame')], as.Date, format = '%m/%d/%Y')
One approach, continuing with dplyr:
library(dplyr)
check %>%
mutate(datediff = currentFrame - originFrame) %>%
arrange(currentFrame, datediff) %>%
group_by(currentFrame) %>%
filter(datediff >= 0) %>%
slice(1) %>%
ungroup() %>%
select(-datediff)
# # A tibble: 5 × 7
# material previousUser currentUser status date originFrame currentFrame
# <int> <chr> <chr> <chr> <chr> <date> <date>
# 1 123 inventory Dave draft 2016-1 2016-01-01 2016-01-01
# 2 123 Dave Carl transfer 2016-2 2016-02-01 2016-02-01
# 3 123 Dave Carl transfer 2016-2 2016-02-01 2016-03-01
# 4 123 Carl customer sent 2016-4 2016-04-01 2016-04-01
# 5 123 Carl customer sent 2016-4 2016-04-01 2016-05-01
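The pipeline above groups by currentFrame only, which is enough for the single-material sample. A minimal sketch of the same idea for several materials follows, assuming the grouping key should be the pair (material, currentFrame) as the question describes. The reduced three-column sample and the name `result` are made up for illustration.

```r
library(dplyr)

# Reduced, made-up sample: two materials, one currentFrame each
check <- read.table(text = 'material originFrame currentFrame
123 1/1/2016 3/1/2016
123 2/1/2016 3/1/2016
123 4/1/2016 3/1/2016
104 1/1/2017 3/1/2017
104 2/1/2017 3/1/2017
104 4/1/2017 3/1/2017', header = TRUE, stringsAsFactors = FALSE)
check[c('originFrame', 'currentFrame')] <-
  lapply(check[c('originFrame', 'currentFrame')], as.Date, format = '%m/%d/%Y')

result <- check %>%
  # keep only origins that are not later than the current frame
  filter(originFrame <= currentFrame) %>%
  # latest remaining origin first within each (material, currentFrame) pair
  arrange(material, currentFrame, desc(originFrame)) %>%
  group_by(material, currentFrame) %>%
  slice(1) %>%
  ungroup()
```

Note that a group with no originFrame on or before its currentFrame disappears from the result, because the filter removes all of its rows before the slice.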
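Answer 1 (score: -1)
I figured it out.
What I ended up doing was splitting the data frame into three data frames: one where originFrame == currentFrame, one where originFrame < currentFrame, and one where originFrame > currentFrame. Then I removed from data frame 2 everything that was in data frame 1, and from data frame 3 everything that was in data frames 1 and 2. Then I took the max originFrame from data frame 2 and the min originFrame from data frame 3. After binding them together, I got what I needed.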
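The split-and-bind steps described here can probably be folded into one grouped filter. The sketch below is one possible reading of that rule, not the code actually used: within each (material, currentFrame) group it keeps the latest originFrame on or before currentFrame (which covers the exact match), and falls back to the earliest originFrame when every origin is later. The helper `pick_origin`, the name `result`, and the reduced sample are hypothetical.

```r
library(dplyr)

# Reduced, made-up sample: one material, three currentFrame values
check <- read.table(text = 'material originFrame currentFrame
104 1/1/2017 3/1/2016
104 2/1/2017 3/1/2016
104 4/1/2017 3/1/2016
104 1/1/2017 3/1/2017
104 2/1/2017 3/1/2017
104 4/1/2017 3/1/2017
104 1/1/2017 4/1/2017
104 2/1/2017 4/1/2017
104 4/1/2017 4/1/2017', header = TRUE, stringsAsFactors = FALSE)
check[c('originFrame', 'currentFrame')] <-
  lapply(check[c('originFrame', 'currentFrame')], as.Date, format = '%m/%d/%Y')

# Hypothetical helper: choose the originFrame to keep for one group
pick_origin <- function(origin, current) {
  at_or_before <- origin[origin <= current]
  # latest origin on or before the current frame, else the earliest later one
  if (length(at_or_before) > 0) max(at_or_before) else min(origin)
}

result <- check %>%
  group_by(material, currentFrame) %>%
  # currentFrame is constant within a group, so its first value is enough
  filter(originFrame == pick_origin(originFrame, currentFrame[1])) %>%
  ungroup()
```

If several rows in a group share the chosen originFrame, this keeps all of them; adding `slice(1)` before `ungroup()` would keep only one.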