I have a database that looks like this:
userId  Screen  Platform  Version
01      first   IOS       1.0.1
01      main    IOS       1.0.1
02      first   Android   1.0.2
03      first   IOS       1.0.2
03      main    IOS       1.0.2
03      detail  IOS       1.0.2
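Basically, I want to know how many people "drop off" after the first screen. My idea is to create a new column that gives, for each userId, the number of different screens the user visited. The ideal database would look like this: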
userId  DifferentScreen  Platform  Version
01      2                IOS       1.0.1
02      1                Android   1.0.2
03      3                IOS       1.0.2
I tried:
setDT(database)[order(userId), .(DifferentScreen = uniqueN(Screen), Version = Version[1L], Platform = Platform[1L], by = userId)]
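But it doesn't work. The problem I found is that it doesn't group by the user column, because the number of rows stays the same. I used uniqueN because I haven't found a command that just does .N().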
Answer 0 (score: 0)
You're basically there. There is just one small issue: a closing parenthesis is in the wrong place. Try:
dt[order(userId), .(DifferentScreen = uniqueN(Screen), Version = Version[1L], Platform = Platform[1L]), by = userId]
   userId DifferentScreen Version Platform
1:      1               2   1.0.1      IOS
2:      2               1   1.0.2  Android
3:      3               3   1.0.2      IOS
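The question mentions not finding a command that "just does .N()". In data.table, .N is a special symbol holding the number of rows in the current group; it is not a function, so it is written without parentheses. The minimal sketch below assumes the sample data rebuilt inline with character columns instead of factors, and the column name Rows is hypothetical. It contrasts .N (rows per user) with uniqueN(Screen) (distinct screens per user); the two agree on this sample only because no user visits the same screen twice.

```r
library(data.table)

# Sample data from the question, rebuilt inline (character columns, not factors).
dt <- data.table(
  userId   = c(1L, 1L, 2L, 3L, 3L, 3L),
  Screen   = c("first", "main", "first", "first", "main", "detail"),
  Platform = c("IOS", "IOS", "Android", "IOS", "IOS", "IOS"),
  Version  = c("1.0.1", "1.0.1", "1.0.2", "1.0.2", "1.0.2", "1.0.2")
)

# .N = number of rows in each group (no parentheses);
# uniqueN(Screen) = number of distinct screens in each group.
res <- dt[, .(Rows = .N, DifferentScreen = uniqueN(Screen)), by = userId]
print(res)
```

If a user could open the same screen more than once, .N would overcount screens, which is why uniqueN(Screen) is the better fit for the question's DifferentScreen column.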
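You need to close the parenthesis of .( ) before by = userId. That way data.table reads by = ... as the grouping argument rather than as a new variable named "by". At the moment your output isn't grouped by anything: data.table thinks you want to create a variable named "by".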
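To make the difference concrete, here is a minimal sketch on a trimmed-down copy of the sample data (only userId and Screen; the object names wrong and right are illustrative). Placing by = userId inside .( ) makes it an ordinary output column, so every input row survives and uniqueN(Screen) is computed over the whole table; placing it after the closing parenthesis makes it the by argument of [ and collapses the result to one row per user.

```r
library(data.table)

dt <- data.table(
  userId = c(1L, 1L, 2L, 3L, 3L, 3L),
  Screen = c("first", "main", "first", "first", "main", "detail")
)

# Misplaced parenthesis: `by = userId` is inside .(), so it becomes a
# column called "by" and no grouping happens (one row per input row).
wrong <- dt[, .(DifferentScreen = uniqueN(Screen), by = userId)]

# Correct: `by` is an argument of `[`, so rows are grouped by userId.
right <- dt[, .(DifferentScreen = uniqueN(Screen)), by = userId]

print(names(wrong))  # "DifferentScreen" "by"
print(nrow(wrong))   # 6
print(names(right))  # "userId" "DifferentScreen"
print(nrow(right))   # 3
```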
You can see this in the result of your old code:
dt[order(userId), .(DifferentScreen = uniqueN(Screen), Version = Version[1L], Platform = Platform[1L], by = userId)]
# See how this creates a variable called "by"?
   DifferentScreen Version Platform by
1:               3   1.0.1      IOS  1
2:               3   1.0.1      IOS  1
3:               3   1.0.1      IOS  2
4:               3   1.0.1      IOS  3
5:               3   1.0.1      IOS  3
6:               3   1.0.1      IOS  3
Data:
library(data.table)

dt <- structure(list(userId = c(1L, 1L, 2L, 3L, 3L, 3L),
    Screen = structure(c(2L, 3L, 2L, 2L, 3L, 1L),
        .Label = c("detail", "first", "main"), class = "factor"),
    Platform = structure(c(2L, 2L, 1L, 2L, 2L, 2L),
        .Label = c("Android", "IOS"), class = "factor"),
    Version = structure(c(1L, 1L, 2L, 2L, 2L, 2L),
        .Label = c("1.0.1", "1.0.2"), class = "factor")),
    .Names = c("userId", "Screen", "Platform", "Version"),
    class = c("data.table", "data.frame"), row.names = c(NA, -6L))
# dput() also printed ".internal.selfref = <pointer: ...>", which cannot be
# parsed back into R; it is dropped here and setDT() restores the data.table.
setDT(dt)
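The original goal was to count how many people drop off after the first screen. A minimal sketch, assuming "dropped off" means a user saw exactly one distinct screen (this also counts a user whose only screen was not "first"), builds the per-user table as in the answer and then counts those users; perUser, dropped, share and byPlatform are illustrative names, and the data is rebuilt inline with character columns.

```r
library(data.table)

dt <- data.table(
  userId   = c(1L, 1L, 2L, 3L, 3L, 3L),
  Screen   = c("first", "main", "first", "first", "main", "detail"),
  Platform = c("IOS", "IOS", "Android", "IOS", "IOS", "IOS")
)

# One row per user with the number of distinct screens visited.
perUser <- dt[, .(DifferentScreen = uniqueN(Screen),
                  Platform = Platform[1L]), by = userId]

# Assumption: a user with exactly one distinct screen "dropped off".
dropped <- perUser[DifferentScreen == 1L, .N]
share   <- dropped / nrow(perUser)

# Optional breakdown of users and drop-offs by platform.
byPlatform <- perUser[, .(Users = .N, Dropped = sum(DifferentScreen == 1L)),
                      by = Platform]
print(byPlatform)
```

On this sample only user 2 drops off, so dropped is 1 out of 3 users. If "drop-off" should instead mean "only ever saw the screen named first", the filter can be changed to test the screen value rather than the distinct count.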