I need to number the steps in the journey column between "session_start" rows. I can't figure out how to write a loop for this.
library(data.table)
df <- data.table(
page = c("page_1", "page_2", "page_3", "page_1", "page_2", "page_1", "page_2", "page_3"),
journey = c("session_start", NA, NA, "session_start", NA, "session_start", NA, NA)
)
The desired result looks like this:
df <- data.table(
page = c("page_1", "page_2", "page_3", "page_1", "page_2", "page_1", "page_2", "page_3"),
journey = c("session_start", "step_1", "step_2", "session_start", "step_1", "session_start",
"step_1", "step_2")
)
Answer 0 (score: 2)
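This does what you want. Make sure journey is a character column, not a factor (stringsAsFactors = FALSE, which is already the default for data.table()); otherwise new labels such as step_1 can't be assigned to the journey column correctly.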
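step_index <- 1  # initialise so the loop also works if the first row is NA
for (i in seq_len(nrow(df))) {
  if (is.na(df$journey[i])) {
    # NA row: label it with the current step number within the session
    df$journey[i] <- paste0('step_', step_index)
    step_index <- step_index + 1
  } else {
    # "session_start" row: restart the counter
    step_index <- 1
  }
}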
Answer 1 (score: 2)
You could try deriving the label from the page column:
df$journey <- ifelse(df$page == "page_1","session_start", gsub(".*_","step_",df$page))
Which gives:
> df
page journey
1: page_1 session_start
2: page_2 step_2
3: page_3 step_3
4: page_1 session_start
5: page_2 step_2
6: page_1 session_start
7: page_2 step_2
8: page_3 step_3
Answer 2 (score: 1)
Try this solution using ave():
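# TRUE on "session_start" rows, NA elsewhere; turn the NAs into 0 so cumsum() works
i <- df$journey == "session_start"
i[is.na(i)] <- 0L
# session id per row (1,1,1,2,2,3,3,3 for the example data)
f <- cumsum(i)
df$journey <- ave(as.character(df$journey), f, FUN = function(s){
  # s is c("session_start", NA, ...): label the NA rows step_1, step_2, ...
  s[is.na(s)] <- paste0("step_", seq_along(s)[-length(s)])
  s
})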
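The ave() call above assumes every group starts with exactly one "session_start" row. If the data begins with NA rows (they end up in group 0, which has no "session_start"), seq_along(s)[-length(s)] yields one label too few for that group. A minimal base-R sketch that instead numbers the NA rows by their own count; number_steps is a hypothetical helper name, not part of the original answer:

```r
# Hypothetical helper (not from the original answer): numbers the NA rows in
# each session by their own count, so it does not rely on each group having
# exactly one leading "session_start" row.
number_steps <- function(journey, start = "session_start") {
  # %in% maps NA to FALSE, so no separate is.na() clean-up is needed
  session_id <- cumsum(journey %in% start)
  ave(as.character(journey), session_id, FUN = function(s) {
    s[is.na(s)] <- paste0("step_", seq_len(sum(is.na(s))))
    s
  })
}

journey <- c("session_start", NA, NA, "session_start", NA, "session_start", NA, NA)
number_steps(journey)
```

Rows before the first "session_start" are numbered from step_1 as well, instead of raising an error or a recycling warning.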
There may be a better way using the data.table package, but I'm not very proficient with it.
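For comparison, a sketch of the same idea in data.table syntax (not taken from any of the answers above): the session counter is computed inline in `by`, and within each session the NA rows are numbered with cumsum(is.na(journey)), updating the column in place with `:=`.

```r
library(data.table)

df <- data.table(
  page = c("page_1", "page_2", "page_3", "page_1", "page_2", "page_1", "page_2", "page_3"),
  journey = c("session_start", NA, NA, "session_start", NA, "session_start", NA, NA)
)

# Group rows into sessions: the counter increases at every "session_start"
# (%in% maps NA to FALSE). Inside each session, the k-th NA row becomes step_k.
df[, journey := ifelse(is.na(journey),
                       paste0("step_", cumsum(is.na(journey))),
                       journey),
   by = .(session = cumsum(journey %in% "session_start"))]

df
```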