Counting how many users drop off after a given screen

Time: 2016-06-22 13:26:07

Tags: r

I have a database that looks like this:

userId              Screen         Platform       Version
01                  first          IOS            1.0.1
01                  main           IOS            1.0.1
02                  first          Android        1.0.2
03                  first          IOS            1.0.2
03                  main           IOS            1.0.2
03                  detail         IOS            1.0.2

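Basically, I want to know how many people "drop off" after the first screen. My idea is to create a new column that gives the number of different screens each user visited, grouped by userId. The ideal database would look like this: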

userId              DifferentScreen        Platform      Version
01                  2                     IOS            1.0.1
02                  1                     Android        1.0.2
03                  3                     IOS            1.0.2

I tried:

setDT(database)[order(userId), .(DifferentScreen = uniqueN(Screen), Version = Version[1L], Platform = Platform[1L], by = userId)]

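But it doesn't work. The problem I found is that it does not group by the user column, because the number of rows stays the same. I used uniqueN because I haven't found a way to do this with .N alone.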

1 answer:

Answer 0: (score: 0)

You're basically there. There is just one small problem with a misplaced parenthesis. Try:

dt[order(userId), .(DifferentScreen = uniqueN(Screen), Version = Version[1L], Platform = Platform[1L]), by = userId]

   userId DifferentScreen Version Platform
1:      1               2   1.0.1      IOS
2:      2               1   1.0.2  Android
3:      3               3   1.0.2      IOS

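You need to close the parenthesis before by = userId. That way data.table reads by = ... as the grouping argument rather than as a new variable named by. Currently your output is not grouped by anything, because data.table thinks you want to create a column named by inside .().

You can see this in the result of your old code: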

dt[order(userId), .(DifferentScreen = uniqueN(Screen), Version = Version[1L], Platform = Platform[1L], by = userId)]

#See how this creates a variable "by"?
   DifferentScreen Version Platform by
1:               3   1.0.1      IOS  1
2:               3   1.0.1      IOS  1
3:               3   1.0.1      IOS  2
4:               3   1.0.1      IOS  3
5:               3   1.0.1      IOS  3
6:               3   1.0.1      IOS  3
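
On the .N point from the question: .N is a special symbol in data.table, not a function, so it is written without parentheses. It gives the number of rows in each group, while uniqueN(Screen) gives the number of distinct screens. A minimal sketch of the difference, assuming the sample data plus one hypothetical repeat visit that is not in the original data:

```r
library(data.table)

# Sample data from the question, rebuilt by hand for this sketch
dt <- data.table(
  userId   = c(1L, 1L, 2L, 3L, 3L, 3L),
  Screen   = c("first", "main", "first", "first", "main", "detail"),
  Platform = c("IOS", "IOS", "Android", "IOS", "IOS", "IOS"),
  Version  = c("1.0.1", "1.0.1", "1.0.2", "1.0.2", "1.0.2", "1.0.2")
)

# Hypothetical extra row: user 1 opens the "first" screen a second time
repeat_visit <- data.table(userId = 1L, Screen = "first",
                           Platform = "IOS", Version = "1.0.1")
dt2 <- rbind(dt, repeat_visit)

# .N counts rows per user; uniqueN(Screen) counts distinct screens per user
counts <- dt2[, .(Rows = .N, DifferentScreen = uniqueN(Screen)), by = userId]
print(counts)
```

If a user can hit the same screen more than once, the two numbers differ, so uniqueN is probably the safer choice for a "different screens" column.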

Data

library(data.table)

# Sample data matching the table in the question; text columns are factors
dt <- data.table(
  userId   = c(1L, 1L, 2L, 3L, 3L, 3L),
  Screen   = factor(c("first", "main", "first", "first", "main", "detail")),
  Platform = factor(c("IOS", "IOS", "Android", "IOS", "IOS", "IOS")),
  Version  = factor(c("1.0.1", "1.0.1", "1.0.2", "1.0.2", "1.0.2", "1.0.2"))
)
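
To get from the per-user table to the number the question actually asks for, one possible follow-up is to count the users who saw a single distinct screen. This is only a sketch: it assumes that a user with one screen stopped at the first screen, and the names screens, dropped and share are made up for illustration.

```r
library(data.table)

# Sample data from the question, rebuilt by hand for this sketch
dt <- data.table(
  userId   = c(1L, 1L, 2L, 3L, 3L, 3L),
  Screen   = c("first", "main", "first", "first", "main", "detail"),
  Platform = c("IOS", "IOS", "Android", "IOS", "IOS", "IOS"),
  Version  = c("1.0.1", "1.0.1", "1.0.2", "1.0.2", "1.0.2", "1.0.2")
)

# One row per user with the number of distinct screens visited
screens <- dt[, .(DifferentScreen = uniqueN(Screen)), by = userId]

# Users who never got past a single screen (assumed to be the first one)
dropped <- screens[DifferentScreen == 1L, .N]

# Share of all users that dropped off
share <- dropped / nrow(screens)

print(dropped)
print(share)
```

With the sample data only user 2 stays on one screen, so dropped is 1 out of 3 users.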