Reorganizing a database by merging rows within a variable

Time: 2016-07-15 15:46:23

Tags: r database row reorganize

I have a database that looks like this:

userId          SessionId        Screen         Platform       Version
01              1                first          IOS            1.0.1
01              1                main           IOS            1.0.1
01              2                first          IOS            1.0.1
01              3                first          IOS            1.0.1
01              3                main           IOS            1.0.1
01              3                detail         IOS            1.0.1
02              1                first          Android        1.0.2

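Basically, what I want to do is determine whether a "path" (the sequence of screens visited) leads to better retention or not. To do that, I want to collapse each SessionId into a single row, with all of its screens in one column. The ideal database would look like this: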

userId       SessionId       Path                 Retention
01           1               first;main           3
01           2               first                3
01           3               first;main;detail    3
02           1               first                1

Here the variable Retention equals the maximum SessionId of that user.
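Because Retention is defined as the maximum SessionId rather than the number of sessions, the two only coincide when a user's session IDs run 1, 2, ..., n without gaps. A minimal base R sketch on hypothetical toy data (user "03" with sessions 1, 2 and 5 is invented purely for illustration) shows the difference:

```r
# Hypothetical toy data: user "03" has sessions 1, 2 and 5 (3 and 4 are absent).
sessions <- data.frame(userId    = c("01", "01", "01", "03", "03", "03"),
                       SessionId = c(1L, 2L, 3L, 1L, 2L, 5L),
                       stringsAsFactors = FALSE)

# Retention as defined in the question: the largest SessionId of each user.
ret_max <- tapply(sessions$SessionId, sessions$userId, max)

# Number of distinct sessions per user, for comparison.
ret_count <- tapply(sessions$SessionId, sessions$userId,
                    function(x) length(unique(x)))

print(ret_max)    # 01 -> 3, 03 -> 5
print(ret_count)  # 01 -> 3, 03 -> 3
```

For this toy data, counting sessions would report 3 for user "03" instead of 5, so computing max(SessionId) directly is what matches the definition above.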

2 answers:

Answer 0 (score: 1)

A possible solution in base R:

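# one row per (userId, SessionId), with that session's screens joined together
d2 <- aggregate(Screen ~ userId + SessionId, d, toString)
# Retention = largest SessionId of each user (assigned back so d2 keeps the column)
d2 <- transform(d2, retention = ave(SessionId, userId, FUN = max))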

which gives:

> d2
  userId SessionId              Screen retention
1     01         1         first, main         3
2     02         1               first         1
3     01         2               first         3
4     01         3 first, main, detail         3

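To match the desired output in the question exactly (screens joined by ";", a column named Path, and rows ordered by user and then session), the same aggregate idea can be adapted as below. This is a minimal sketch: the sample data is retyped from the question with only the three columns needed, and userId is kept as character so that "01" keeps its leading zero.

```r
# Sample data from the question (only the columns needed for the reshape).
d <- data.frame(
  userId    = c("01", "01", "01", "01", "01", "01", "02"),
  SessionId = c(1L, 1L, 2L, 3L, 3L, 3L, 1L),
  Screen    = c("first", "main", "first", "first", "main", "detail", "first"),
  stringsAsFactors = FALSE
)

# Collapse the screens of each session into one ";"-separated path.
res <- aggregate(Screen ~ userId + SessionId, data = d,
                 FUN = function(x) paste(x, collapse = ";"))
names(res)[names(res) == "Screen"] <- "Path"

# Retention = largest SessionId of the user, repeated on each of their rows.
res$Retention <- ave(res$SessionId, res$userId, FUN = max)

# aggregate() returns rows grouped by SessionId first; reorder by user, then session.
res <- res[order(res$userId, res$SessionId), ]
rownames(res) <- NULL
print(res)
```

Using paste(..., collapse = ";") instead of toString is only a formatting choice; the grouping logic is unchanged.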
An alternative using dplyr:

library(dplyr)
d %>% 
  group_by(userId, SessionId) %>% 
  summarise(Screen = toString(Screen)) %>%   # one row per session
  group_by(userId) %>% 
  mutate(retention = max(SessionId))         # Retention = largest SessionId per user

which gives:

  userId SessionId              Screen retention
   <chr>     <int>               <chr>     <int>
1     01         1         first, main         3
2     01         2               first         3
3     01         3 first, main, detail         3
4     02         1               first         1

Answer 1 (score: 0)

Here is a data.table solution:

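library(data.table)
dt <- as.data.table(d)
# Retention = largest SessionId per user, added to dt by reference
dt[, Retention := max(SessionId), by = .(userId)]
# one row per session: screens joined with ";", Retention carried over
dt[, .(Screen = paste(Screen, collapse = ";"), Retention = unique(Retention)), by = .(userId, SessionId)]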

userId SessionId            Screen Retention
1:     01         1        first;main         3
2:     01         2             first         3
3:     01         3 first;main;detail         3
4:     02         1             first         1