I have a database that looks like this:
userId SessionId Screen Platform Version
01 1 first IOS 1.0.1
01 1 main IOS 1.0.1
01 2 first IOS 1.0.1
01 3 first IOS 1.0.1
01 3 main IOS 1.0.1
01 3 detail IOS 1.0.1
02 1 first Android 1.0.2
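Basically, what I want to do is determine whether a "path" (the sequence of different screens) leads to better retention or not. I would like to collapse the screens of each SessionId into a single column. The ideal database would look like this: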
userId SessionId Path Retention
01 1 first;main 3
01 2 first 3
01 3 first;main;detail 3
02 1 first 1
Here the variable Retention is equal to the maximum SessionId for each userId.
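For reproducibility, the sample table can be rebuilt as an R data frame. This is only a sketch: the name `d` is the one the answers below use, while the column types (character userId, integer SessionId) are assumptions, since the question does not state them.

```r
# Sample data from the question. Column types are assumed, not given.
d <- data.frame(
  userId    = c("01", "01", "01", "01", "01", "01", "02"),
  SessionId = c(1L, 1L, 2L, 3L, 3L, 3L, 1L),
  Screen    = c("first", "main", "first", "first", "main", "detail", "first"),
  Platform  = c("IOS", "IOS", "IOS", "IOS", "IOS", "IOS", "Android"),
  Version   = c("1.0.1", "1.0.1", "1.0.1", "1.0.1", "1.0.1", "1.0.1", "1.0.2"),
  stringsAsFactors = FALSE
)

# Inspect the structure to confirm the assumed types.
str(d)
```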
Answer 0: (score: 1)
A possible solution in base R:
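# Collapse the screens of each session into one comma-separated string
d2 <- aggregate(Screen ~ userId + SessionId, d, toString)
# Retention = highest SessionId per user; assign the result so d2 keeps the column
d2 <- transform(d2, retention = ave(SessionId, userId, FUN = max))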
which gives:
> d2
userId SessionId Screen retention
1 01 1 first, main 3
2 02 1 first 1
3 01 2 first 3
4 01 3 first, main, detail 3
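The output above separates screens with ", " because toString() is used, whereas the question asks for a ";"-separated Path column. A minimal base R sketch of that variant follows, assuming the sample data from the question with a character userId and an integer SessionId. The names `paths`, `Path` and `Retention` are chosen here to match the desired output.

```r
# Sample data from the question (column types are an assumption).
d <- data.frame(
  userId    = c("01", "01", "01", "01", "01", "01", "02"),
  SessionId = c(1L, 1L, 2L, 3L, 3L, 3L, 1L),
  Screen    = c("first", "main", "first", "first", "main", "detail", "first"),
  stringsAsFactors = FALSE
)

# Collapse the screens of each session into one ";"-separated string.
paths <- aggregate(Screen ~ userId + SessionId, data = d,
                   FUN = function(x) paste(x, collapse = ";"))
names(paths)[names(paths) == "Screen"] <- "Path"

# Retention is the highest SessionId observed for each user.
paths$Retention <- ave(paths$SessionId, paths$userId, FUN = max)

# Sort by user and session so the rows match the layout in the question.
paths <- paths[order(paths$userId, paths$SessionId), ]
rownames(paths) <- NULL
print(paths)
```

Using max(SessionId) rather than a row count keeps Retention correct even if a user's session numbers have gaps.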
An alternative using dplyr:
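library(dplyr)
d %>%
  # one row per session, screens joined into a single string
  group_by(userId, SessionId) %>%
  summarise(Screen = toString(Screen)) %>%
  # n() counts sessions per user; this equals the maximum SessionId
  # here because sessions are numbered consecutively from 1
  group_by(userId) %>%
  mutate(retention = n())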
which gives:
userId SessionId Screen retention
<chr> <int> <chr> <int>
1 01 1 first, main 3
2 01 2 first 3
3 01 3 first, main, detail 3
4 02 1 first 1
Answer 1: (score: 0)
Here is a data.table solution:
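library(data.table)
dt <- as.data.table(d)
# Retention = highest SessionId per user, added by reference
dt[, Retention := max(SessionId), by = .(userId)]
# one row per session: join the screens with ";" and keep the user's Retention
dt[, .(Screen = paste(Screen, collapse = ";"), Retention = unique(Retention)), by = .(userId, SessionId)]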
userId SessionId Screen Retention
1: 01 1 first;main 3
2: 01 2 first 3
3: 01 3 first;main;detail 3
4: 02 1 first 1
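Once the data has one row per session, the question's stated goal (does a given path go along with better retention?) can be explored by summarising Retention per path. The following is only a rough base R sketch. It starts from the desired table shown in the question, and the names `res` and `by_path`, as well as the choice of the mean as summary statistic, are assumptions for illustration.

```r
# The reshaped table from the question, entered by hand.
res <- data.frame(
  userId    = c("01", "01", "01", "02"),
  SessionId = c(1L, 2L, 3L, 1L),
  Path      = c("first;main", "first", "first;main;detail", "first"),
  Retention = c(3L, 3L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Average retention for every distinct path.
by_path <- aggregate(Retention ~ Path, data = res, FUN = mean)

# Show the paths with the highest average retention first.
by_path <- by_path[order(-by_path$Retention), ]
print(by_path)
```

With only two users this says little by itself. On real data it would make sense to also report how many sessions each path covers before comparing averages.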