I have a data frame (df1t) like this:
userid | interaction | goal
0001 | Access1 | 0
0001 | Access2 | 1
0001 | Access3 | 0
0002 | Access1 | 1
0003 | Access2 | 0
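For anyone who wants to reproduce this, the table above could be built roughly as follows. This is only a sketch: storing userid as character (so the leading zeros survive) and goal as numeric are assumptions, since the column types are not stated here.

```r
# Assumption: userid is kept as character so "0001" does not become 1.
df1t <- data.frame(
  userid = c("0001", "0001", "0001", "0002", "0003"),
  interaction = c("Access1", "Access2", "Access3", "Access1", "Access2"),
  goal = c(0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)
```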
To group the data I am using:
library(dplyr)
usrlvl <- df1t %>%
  group_by(userid) %>%
  summarise(path = paste(interaction, collapse = " > "),
            goal = sum(goal)
  )
The result looks like this:
userid | path | goal
0001 | Access1 > Access2 > Access3 | 1
0002 | Access1 | 1
0003 | Access2 | 0
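But this result is not quite what I need. The path should stop at the goal and ignore any interactions that come after it. The result should be: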
userid | path | goal
0001 | Access1 > Access2 | 1
0002 | Access1 | 1
0003 | Access2 | 0
Has anyone run into a problem like this?
Answer 0 (score: 1)
Using dplyr:
usrlvl <- df1t %>%
  group_by(userid) %>%
  # Drop the goal == 0 rows that come after the user's goal row
  filter(!(goal == 0 & cumsum(goal) == 1)) %>%
  summarise(path = paste(interaction, collapse = " > "),
            goal = sum(goal))
# A tibble: 3 x 3
  userid path               goal
   <dbl> <chr>             <dbl>
1      1 Access1 > Access2     1
2      2 Access1               1
3      3 Access2               0
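One caveat: the filter above only drops rows while the running total of goal is exactly 1, so it implicitly assumes each user reaches the goal at most once. If a user could hit the goal several times, a variant along these lines might be safer. It is a sketch, not the answerer's code, and df_multi is made-up data used only to show the difference.

```r
library(dplyr)

# Hypothetical data: user "0001" reaches the goal twice.
df_multi <- data.frame(
  userid = c("0001", "0001", "0001", "0001", "0001", "0002"),
  interaction = c("Access1", "Access2", "Access3", "Access4", "Access5", "Access1"),
  goal = c(0, 1, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)

usrlvl_multi <- df_multi %>%
  group_by(userid) %>%
  # Keep a row only if no earlier row in the same group had goal == 1,
  # i.e. everything up to and including the first goal.
  filter(cumsum(dplyr::lag(goal, default = 0)) == 0) %>%
  summarise(path = paste(interaction, collapse = " > "),
            goal = sum(goal))
```

With this data the original filter would keep Access4 and Access5 for user 0001, whereas the lagged running total cuts the path at Access2.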
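Answer 1 (score: 1)
You can use which.max to stop at the goal. It returns the index of the first maximum, so the path ends at the first row where goal is 1: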
usrlvl <- df1t %>%
  group_by(userid) %>%
  summarise(path = paste(interaction[1:which.max(goal)], collapse = " > "),
            goal = sum(goal)
  )
# A tibble: 3 × 3
#   userid path               goal
#    <int> <chr>             <int>
# 1      1 Access1 > Access2     1
# 2      2 Access1               1
# 3      3 Access2               0
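One thing to watch with which.max: when every goal in a group is 0, the first maximum is the first element, so a user with several interactions and no goal would have the path cut down to the first interaction only. The sample data does not show this because user 0003 has a single row. A possible guard, sketched with made-up data (df_nogoal is hypothetical):

```r
library(dplyr)

# Hypothetical data: user "0003" has two interactions and never reaches the goal.
df_nogoal <- data.frame(
  userid = c("0001", "0001", "0001", "0003", "0003"),
  interaction = c("Access1", "Access2", "Access3", "Access2", "Access4"),
  goal = c(0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

usrlvl_nogoal <- df_nogoal %>%
  group_by(userid) %>%
  summarise(
    # Stop at the first goal if there is one; otherwise keep every interaction.
    path = paste(
      interaction[seq_len(if (any(goal == 1)) which.max(goal) else length(goal))],
      collapse = " > "
    ),
    goal = sum(goal)
  )
```

Here user 0003 keeps the full path "Access2 > Access4", while user 0001 still stops at Access2.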
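Answer 2 (score: 1)
One option is to write a function that captures the goal state and writes out the path up to that state. This makes the code clearer, especially if you need to do this often (or with different kinds of criteria).
First, define the function: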
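untilGoal <- function(x, goal) {
  # x: values to paste; goal: logical vector marking where the goal is met
  if (any(goal)) {
    # Keep everything up to and including the first TRUE
    paste(x[seq_len(which(goal)[1])], collapse = " > ")
  } else {
    # No goal reached: keep the whole path
    paste(x, collapse = " > ")
  }
}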
It expects two vectors: one to paste and one logical (which is what allows the flexibility). Then use the function in the summarise call:
df1t %>%
  group_by(userid) %>%
  summarise(path = untilGoal(interaction, goal == 1)
            , goal = sum(goal))
This gives:
userid path goal
1 1 Access1 > Access2 1
2 2 Access1 1
3 3 Access2 0
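To illustrate the flexibility mentioned above, here is a small sketch that calls the function on plain vectors with different stopping criteria. The function is repeated in compact form so the snippet runs on its own; steps is a made-up vector.

```r
# Same logic as untilGoal() above, repeated so this snippet is self-contained.
untilGoal <- function(x, goal) {
  if (any(goal)) {
    paste(x[seq_len(which(goal)[1])], collapse = " > ")
  } else {
    paste(x, collapse = " > ")
  }
}

steps <- c("Access1", "Access2", "Access3")  # hypothetical interactions

# Stop at the first row where a 0/1 goal column equals 1.
p_goal <- untilGoal(steps, c(0, 1, 0) == 1)       # "Access1 > Access2"

# Stop at a particular interaction instead of a goal flag.
p_step <- untilGoal(steps, steps == "Access1")    # "Access1"

# Criterion never met: the whole path is kept.
p_none <- untilGoal(steps, rep(FALSE, 3))         # "Access1 > Access2 > Access3"
```

Because the second argument is just a logical vector, any condition that can be computed per row can serve as the stopping rule.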