My code:
df <- df %>% group_by(user_id, groupInsideUID = cumsum(time)) %>%
mutate(Rank = ifelse(row_number() == 1, 'New','Repeat'))
A sample of my data frame looks like this:
id    user_id  groupInsideUID  time             Rank
30    11       0               NA               New
31    11       1               2/1/19 7:35 PM   New
54    5        1               3/1/18 2:35 PM   New
322   5        2               7/3/18 2:50 PM   New
21    5        2               NA               Repeat
13    5        3               8/3/18 2:50 PM   New
2445  2        0               NA               New
111   2        0               NA               Repeat
287   2        1               5/3/18 2:50 PM   New
221   14       0               NA               New
2345  7        0               NA               New
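I want to create a new column Rank (New or Repeat) based on the minimum time within each user_id group. Within each user_id group, every row up to and including the row with the earliest time should be New (including rows whose time is NA), and every row after it should be Repeat (again including NA rows). A group with no time at all is entirely New.
The result should look like this: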
id    user_id  time             Rank
30    11       NA               New
31    11       2/1/19 7:35 PM   New
54    5        3/1/18 2:35 PM   New
322   5        7/3/18 2:50 PM   Repeat
21    5        NA               Repeat
13    5        8/3/18 2:50 PM   Repeat
2445  2        NA               New
111   2        NA               New
287   2        5/3/18 2:50 PM   New
221   14       NA               New
2345  7        NA               New
Thanks for any hints!
Answer 0 (score: 0)
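One option is to convert 'time' to a DateTime class and then, grouped by 'user_id', create 'Rank' from the position of the minimum 'time': rows up to and including that position are "New", later rows are "Repeat", and a group whose 'time' is all NA is "New" throughout.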
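library(lubridate)
library(dplyr)

df %>%
  # parse the day/month/year strings (12-hour clock) into date-times
  mutate(time = dmy_hm(time)) %>%
  group_by(user_id) %>%
  # which.min() skips NA, so handle the all-NA groups first;
  # otherwise rows up to and including the earliest time are "New"
  mutate(Rank = if (all(is.na(time))) "New"
                else case_when(row_number() <= which.min(time) ~ "New",
                               TRUE ~ "Repeat"))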
# A tibble: 11 x 5
# Groups: user_id [5]
# id user_id groupInsideUID time Rank
# <int> <int> <int> <dttm> <chr>
# 1 30 11 0 NA New
# 2 31 11 1 2019-01-02 19:35:00 New
# 3 54 5 1 2018-01-03 14:35:00 New
# 4 322 5 2 2018-03-07 14:50:00 Repeat
# 5 21 5 2 NA Repeat
# 6 13 5 3 2018-03-08 14:50:00 Repeat
# 7 2445 2 0 NA New
# 8 111 2 0 NA New
# 9 287 2 1 2018-03-05 14:50:00 New
#10 221 14 0 NA New
#11 2345 7 0 NA New
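The `if (all(is.na(time)))` guard in the code above is there because `which.min()` skips NA values and returns a zero-length result when every value is NA. A small sketch of that behaviour, using made-up numeric vectors in place of the parsed times (the variable names are illustrative only):

```r
# Toy numeric vectors (not from the question) standing in for parsed times.
some_na <- c(NA, 5, 2, 7)
all_na  <- c(NA_real_, NA_real_)

pos_some <- which.min(some_na)  # NA is skipped; position of the smallest value
pos_all  <- which.min(all_na)   # zero-length integer: there is no minimum

# Comparing row positions against a zero-length value is itself zero-length,
# so it cannot label the rows of an all-NA group.
cmp_all <- seq_along(all_na) <= pos_all

print(pos_some)
print(length(pos_all))
print(length(cmp_all))
```

Without the guard, groups such as user_id 14 and 7 would have no position to compare against, which is why they are set to "New" directly.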
# data (dput of the example data frame used above)
df <- structure(list(
  id = c(30L, 31L, 54L, 322L, 21L, 13L, 2445L, 111L, 287L, 221L, 2345L),
  user_id = c(11L, 11L, 5L, 5L, 5L, 5L, 2L, 2L, 2L, 14L, 7L),
  groupInsideUID = c(0L, 1L, 1L, 2L, 2L, 3L, 0L, 0L, 1L, 0L, 0L),
  time = c(NA, "2/1/19 7:35 PM", "3/1/18 2:35 PM", "7/3/18 2:50 PM", NA,
           "8/3/18 2:50 PM", NA, NA, "5/3/18 2:50 PM", NA, NA),
  Rank = c("New", "New", "New", "New", "Repeat", "New", "New", "Repeat",
           "New", "New", "New")),
  class = "data.frame", row.names = c(NA, -11L))
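For comparison, the same rule can probably be written in base R without dplyr or lubridate. This is only a sketch under a few assumptions: `flag_new_repeat()` and `dat` are hypothetical names introduced here, the strings are taken to be day/month/year with a 12-hour clock, and `%p` is locale dependent, so the parsing step may need adjusting outside an English or C locale.

```r
# Hypothetical helper (name is illustrative, not from the answer above):
# label the rows of one user_id group from its parsed times.
flag_new_repeat <- function(time) {
  # A group with no usable time is "New" throughout.
  if (all(is.na(time))) return(rep("New", length(time)))
  # Rows up to and including the earliest time are "New", later rows "Repeat".
  ifelse(seq_along(time) <= which.min(time), "New", "Repeat")
}

# Same example data as in the question, rebuilt so the sketch is self-contained.
dat <- data.frame(
  id = c(30L, 31L, 54L, 322L, 21L, 13L, 2445L, 111L, 287L, 221L, 2345L),
  user_id = c(11L, 11L, 5L, 5L, 5L, 5L, 2L, 2L, 2L, 14L, 7L),
  time = c(NA, "2/1/19 7:35 PM", "3/1/18 2:35 PM", "7/3/18 2:50 PM", NA,
           "8/3/18 2:50 PM", NA, NA, "5/3/18 2:50 PM", NA, NA),
  stringsAsFactors = FALSE
)

# Assumption: day/month/year with a 12-hour clock; %p depends on the locale.
parsed <- as.POSIXct(dat$time, format = "%d/%m/%y %I:%M %p", tz = "UTC")

# split() applies the helper per user_id; unsplit() restores the row order.
dat$Rank <- unsplit(lapply(split(parsed, dat$user_id), flag_new_repeat),
                    dat$user_id)
print(dat)
```

The split/unsplit pair is used here instead of `ave()` because the helper returns character labels while its input is a date-time vector.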