I have this dataset:
a <- data.frame("session_id" = c(rep(1,10), rep(2,7), rep(3,2)),
"content" = c("A", "B", "C","open", "A", "J", "M", "K","exit", "D",
"open", "U", "T","quit", "I", "M" , "A", "Q", "M" ),
"type" = c("non-edit", "non-edit", "non-edit", "edit", "edit", "edit",
"edit", "edit", "edit", "non-edit", "edit", "edit", "edit",
"edit", "non-edit", "non-edit", "non-edit", "non-edit", "non-edit"))
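I want to set the type column to "non-edit" or "edit" based on the content column. From the row where "open" appears in content through the row where "quit" or "exit" appears, the type should be "edit"; everywhere else it should be "non-edit". The type column in the example above shows the expected result.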
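Answer 0 (score: 3)
We create a new column (new_type) and initialize its values to "non-edit". Then we find the indices where "open" and "quit"/"exit" occur, use mapply to build the sequence of indices between each pair, and replace the corresponding values with "edit".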
a$new_type <- "non-edit"
open_ind <- which(a$content == "open")
close_ind <- which(a$content %in% c("quit", "exit"))
a$new_type[unlist(mapply(":", open_ind, close_ind))] <- "edit"
a
# session_id content type new_type
#1 1 A non-edit non-edit
#2 1 B non-edit non-edit
#3 1 C non-edit non-edit
#4 1 open edit edit
#5 1 A edit edit
#6 1 J edit edit
#7 1 M edit edit
#8 1 K edit edit
#9 1 exit edit edit
#10 1 D non-edit non-edit
#11 2 open edit edit
#12 2 U edit edit
#13 2 T edit edit
#14 2 quit edit edit
#15 2 I non-edit non-edit
#16 2 M non-edit non-edit
#17 2 A non-edit non-edit
#18 3 Q non-edit non-edit
#19 3 M non-edit non-edit
To understand the steps:
open_ind
#[1] 4 11
close_ind
#[1] 9 14
unlist(mapply(":", open_ind, close_ind))
#[1] 4 5 6 7 8 9 11 12 13 14
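The mapply approach relies on every "open" having exactly one "quit"/"exit" after it. As a minimal sketch of the same idea with that assumption checked up front (the helper name mark_edit is hypothetical, and Map is swapped in for mapply so the result is always a list):

```r
# Hypothetical helper (not from the answer): flag rows from each "open"
# through its matching "quit"/"exit" as "edit".
mark_edit <- function(content, closers = c("quit", "exit")) {
  content   <- as.character(content)
  open_ind  <- which(content == "open")
  close_ind <- which(content %in% closers)
  # Assumption: one closer per opener, and each closer comes after its opener.
  stopifnot(length(open_ind) == length(close_ind),
            all(open_ind < close_ind))
  out <- rep("non-edit", length(content))
  # Map() always returns a list, so unlist() yields a plain index vector.
  out[unlist(Map(seq, open_ind, close_ind))] <- "edit"
  out
}

content <- c("A", "open", "B", "exit", "C", "open", "quit")
mark_edit(content)
```

If the markers are not paired, the stopifnot call fails early instead of silently recycling indices.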
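Answer 1 (score: 2)
After grouping by session_id, we create a second grouping variable by taking the cumulative sum of a logical expression, and then use it to assign the values "edit" and "non-edit".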
library(dplyr)
a %>%
group_by(session_id) %>%
# lag() needs a logical default here; .add = TRUE replaces the deprecated add = TRUE (dplyr >= 1.0.0)
group_by(grp = cumsum((content == "open") |
lag(content %in% c("exit", "quit"),
default = FALSE)), .add = TRUE) %>%
mutate(type1 = case_when(any(content %in% c("open", "exit", "quit")) ~ "edit",
TRUE ~ "non-edit")) %>%
ungroup %>%
select(-grp)
# A tibble: 19 x 4
# session_id content type type1
# <dbl> <fct> <fct> <chr>
# 1 1 A non-edit non-edit
# 2 1 B non-edit non-edit
# 3 1 C non-edit non-edit
# 4 1 open edit edit
# 5 1 A edit edit
# 6 1 J edit edit
# 7 1 M edit edit
# 8 1 K edit edit
# 9 1 exit edit edit
#10 1 D non-edit non-edit
#11 2 open edit edit
#12 2 U edit edit
#13 2 T edit edit
#14 2 quit edit edit
#15 2 I non-edit non-edit
#16 2 M non-edit non-edit
#17 2 A non-edit non-edit
#18 3 Q non-edit non-edit
#19 3 M non-edit non-edit
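To see what the cumulative-sum grouping does, the counter can be computed on its own. The following is a base R sketch of the idea, not the dplyr code above: ave stands in for group_by, the small vectors are made-up sample data, and the names prev_closed, starts, grp and run_id are illustrative.

```r
session_id <- c(1, 1, 1, 1, 1, 2, 2, 2)
content    <- c("A", "open", "B", "exit", "C", "open", "quit", "D")

closes <- as.integer(content %in% c("exit", "quit"))
# Within each session, shift the closer flag down one row.
prev_closed <- ave(closes, session_id, FUN = function(x) c(0L, head(x, -1)))
# A new run starts at "open" or on the row right after a closer.
starts <- as.integer(content == "open" | prev_closed == 1L)
grp <- ave(starts, session_id, FUN = cumsum)

# A run is "edit" if it contains any of the transition keys.
has_key <- as.integer(content %in% c("open", "exit", "quit"))
run_id  <- paste(session_id, grp)
type1   <- ifelse(ave(has_key, run_id, FUN = max) == 1L, "edit", "non-edit")

data.frame(session_id, content, grp, type1)
```

Each run between boundaries gets its own grp value within a session, so a single any-style test per run is enough to label all of its rows.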
Answer 2 (score: 1)
Here is a pipeline that does not require grouping.
library(dplyr)
library(tidyr)
b <-
a %>%
# 1. Mark the boundaries of the 'edit' regions.
mutate(type = case_when(content == "open" ~ "edit",
grepl("exit|quit", content) ~ "non-edit",
TRUE ~ NA_character_)) %>%
# 2. Fill the NAs with the last good value. 'open' down to 'exit/quit'
# will be filled with 'edit'.
tidyr::fill(type) %>%
# 3. Replace unfilled NAs, like at the top of the table.
replace_na(list(type = "non-edit")) %>%
# 4. Rename the exit/quit boundary.
mutate(type = ifelse(grepl("exit|quit", content), "edit", type))
b
#> session_id content type
#> 1 1 A non-edit
#> 2 1 B non-edit
#> 3 1 C non-edit
#> 4 1 open edit
#> 5 1 A edit
#> 6 1 J edit
#> 7 1 M edit
#> 8 1 K edit
#> 9 1 exit edit
#> 10 1 D non-edit
#> 11 2 open edit
#> 12 2 U edit
#> 13 2 T edit
#> 14 2 quit edit
#> 15 2 I non-edit
#> 16 2 M non-edit
#> 17 2 A non-edit
#> 18 3 Q non-edit
#> 19 3 M non-edit
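The step that does most of the work in this pipeline is the forward fill. As a rough illustration of what filling downward does to a single vector, here is a base R sketch; the helper name fill_down is hypothetical and is not part of tidyr.

```r
# Hypothetical helper: carry the last non-NA value forward.
fill_down <- function(x) {
  seen <- !is.na(x)
  # cumsum(seen) counts the non-NA values seen so far; a count of 0
  # (nothing seen yet) maps to the leading NA.
  c(NA, x[seen])[cumsum(seen) + 1]
}

marks <- c(NA, "edit", NA, NA, "non-edit", NA)
fill_down(marks)
```

Leading NAs stay NA, which is why the pipeline still needs a replace_na step for rows before the first "open".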
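Answer 3 (score: 0)
Plan: step through the content column looking for "transition keys". If the key is "open", the transition takes effect on that row; if the key is "quit" or "exit", it takes effect on the next row. Consider the following code as an implementation: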
last <- 'exit' #initialize last
keys <- c('open','exit','quit') #transition keys
for (i in seq_len(nrow(a))) {
key <- as.character(a$content[i]) # as.character guards against factor columns
a$type[i] <- if (key %in% keys || last == 'open') 'edit' else 'non-edit'
if (key %in% keys) last <- key # remember the most recent transition key
}
a
session_id content type
1 1 A non-edit
2 1 B non-edit
3 1 C non-edit
4 1 open edit
5 1 A edit
6 1 J edit
7 1 M edit
8 1 K edit
9 1 exit edit
10 1 D non-edit
11 2 open edit
12 2 U edit
13 2 T edit
14 2 quit edit
15 2 I non-edit
16 2 M non-edit
17 2 A non-edit
18 3 Q non-edit
19 3 M non-edit
Answer 4 (score: 0)
Here is one way in base R using cumsum:
a$new_type <- c("non-edit","edit")[
cumsum(a$content=="open") - c(0,head(cumsum(a$content %in% c("exit","quit")),-1)) +1]
# session_id content type new_type
# 1 1 A non-edit non-edit
# 2 1 B non-edit non-edit
# 3 1 C non-edit non-edit
# 4 1 open edit edit
# 5 1 A edit edit
# 6 1 J edit edit
# 7 1 M edit edit
# 8 1 K edit edit
# 9 1 exit edit edit
# 10 1 D non-edit non-edit
# 11 2 open edit edit
# 12 2 U edit edit
# 13 2 T edit edit
# 14 2 quit edit edit
# 15 2 I non-edit non-edit
# 16 2 M non-edit non-edit
# 17 2 A non-edit non-edit
# 18 3 Q non-edit non-edit
# 19 3 M non-edit non-edit
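The one-liner is easier to follow when the intermediate vectors are named. Here is a sketch of the same arithmetic on a small made-up vector; the variable names are illustrative, and it assumes every "open" is eventually followed by exactly one "exit"/"quit", so the difference is always 0 or 1.

```r
content <- c("A", "open", "B", "exit", "C", "open", "quit", "D")

opened <- cumsum(content == "open")               # opens seen up to this row
closed <- cumsum(content %in% c("exit", "quit"))  # closers seen up to this row
# Shift by one row so the closing row itself still counts as inside.
closed_before <- c(0, head(closed, -1))
inside <- opened - closed_before                  # 1 inside a block, 0 outside

new_type <- c("non-edit", "edit")[inside + 1]
data.frame(content, opened, closed_before, inside, new_type)
```

With unbalanced markers, inside can leave the 0/1 range and the lookup returns NA or fails, so the input is worth validating first.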