I want to add a column to a data.table whose values are calculated from other columns, and I haven't found a way to do it yet.
library(data.table)
DT <- data.table("userId" = c("user1", "user1", "user1", "user1", "user2", "user2", "user2", "user2", "user2"),
                 "topicId" = "topic1",
                 "attempt" = c(1, 2, 3, 4, 1, 2, 3, 4, 5),
                 "score" = c(50, 0, 50, 20, 20, 100, 100, 100, 100))
userId topicId attempt score
1: user1 topic1 1 50
2: user1 topic1 2 0
3: user1 topic1 3 50
4: user1 topic1 4 20
5: user2 topic1 1 20
6: user2 topic1 2 100
7: user2 topic1 3 100
8: user2 topic1 4 100
9: user2 topic1 5 100
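Ultimately, I'd like to add the following two columns, showing the score difference from the first attempt and from the previous attempt within each user/topic group: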
userId topicId attempt score scoreDiffFromFirst scoreDiffFromPrev
1: user1 topic1 1 50 NA NA
2: user1 topic1 2 0 -50 -50
3: user1 topic1 3 50 0 50
4: user1 topic1 4 20 -30 -30
5: user2 topic1 1 20 NA NA
6: user2 topic1 2 100 80 80
7: user2 topic1 3 100 80 0
8: user2 topic1 4 100 80 0
9: user2 topic1 5 100 80 0
I tried the following:
for(i in 2:max(DT$attempt)) {
DT[attempt == i, scoreDiffFromFirst := score - DT[attempt == 1 & userId == userId & topicId == topicId, score]]
DT[attempt == i, scoreDiffFromPrev := score - DT[attempt == i - 1 & userId == userId & topicId == topicId, score]]
}
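This works until it hits an attempt number that isn't present in every user/topic group. The last row below is therefore wrong, because user1 has no fifth attempt: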
userId topicId attempt score scoreDiffFromFirst scoreDiffFromPrev
1: user1 topic1 1 50 NA NA
2: user1 topic1 2 0 -50 -50
3: user1 topic1 3 50 0 50
4: user1 topic1 4 20 -30 -30
5: user2 topic1 1 20 NA NA
6: user2 topic1 2 100 80 80
7: user2 topic1 3 100 80 0
8: user2 topic1 4 100 80 0
9: user2 topic1 5 100 50 80   (incorrect)
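The failure may be easier to see in isolation. Inside DT[...], column names are resolved in the table first, so `userId == userId` compares the column with itself and is TRUE on every row; the condition never restricts the lookup to the user of the row being updated. A minimal sketch with the question's data:

```r
library(data.table)

# Same data as in the question
DT <- data.table(userId  = c(rep("user1", 4), rep("user2", 5)),
                 topicId = "topic1",
                 attempt = c(1, 2, 3, 4, 1, 2, 3, 4, 5),
                 score   = c(50, 0, 50, 20, 20, 100, 100, 100, 100))

# Both sides of `userId == userId` refer to the same column,
# so this condition keeps every row of the table.
DT[userId == userId, .N]

# The lookup used in the loop therefore returns the attempt-1 score of
# every user (one value per user), not just the current user's score.
firsts <- DT[attempt == 1 & userId == userId & topicId == topicId, score]
firsts
```

Because the lookup returns one score per user, the subtraction only lines up while every user has the same attempts in the same order; at attempt 5 there is one current score but two "first" scores, which is where the loop goes wrong.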
I'm happy to learn from any suggestion, but ideally I'd like some variation of the above, because the speed benefits are huge.
Thanks in advance.
Answer 0 (score: 4)
You can compute the differences per group with by=:
DT[order(attempt), c("scoreDiffFromFirst", "scoreDiffFromPrev") :=
.(replace(score, 1, NA) - first(score), score - shift(score))
, by=.(userId,topicId)]
# or
DT[order(attempt), `:=`(
scoreDiffFromFirst = replace(score, 1, NA) - first(score),
scoreDiffFromPrev = score - shift(score)
), by=.(userId,topicId)]
which gives
userId topicId attempt score scoreDiffFromFirst scoreDiffFromPrev
1: user1 topic1 1 50 NA NA
2: user1 topic1 2 0 -50 -50
3: user1 topic1 3 50 0 50
4: user1 topic1 4 20 -30 -30
5: user2 topic1 1 20 NA NA
6: user2 topic1 2 100 80 80
7: user2 topic1 3 100 80 0
8: user2 topic1 4 100 80 0
9: user2 topic1 5 100 80 0
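To check that the grouped version copes with the situation that broke the loop, here is a small test on hypothetical data (made up for illustration, not from the question): two topics, groups of different sizes, and rows stored out of attempt order. The expression is the first variant above, unchanged.

```r
library(data.table)

# Hypothetical data: unequal group sizes, a second topic, rows not sorted by attempt
DT <- data.table(userId  = c("user1", "user2", "user1", "user1", "user2"),
                 topicId = c("topic2", "topic2", "topic2", "topic1", "topic2"),
                 attempt = c(2, 1, 1, 1, 2),
                 score   = c(70, 40, 30, 10, 90))

# Visit rows in attempt order, then compute within each user/topic group;
# := writes the results back to the matching original rows.
DT[order(attempt), c("scoreDiffFromFirst", "scoreDiffFromPrev") :=
     .(replace(score, 1, NA) - first(score), score - shift(score)),
   by = .(userId, topicId)]

DT
```

Since the differences are computed only inside each userId/topicId group, a group with fewer attempts simply has fewer rows; no score is borrowed from another group, and a single-attempt group just gets NA in both columns.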
For the intuition behind the syntax, see the material mentioned in the startup message shown when you load the package:
library(data.table)
data.table 1.10.4.3
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
  Release notes, videos and slides: http://r-datatable.com
Answer 1 (score: 2)
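dplyr makes this easy. first and lag give you the score at a relative position, while group_by groups the rows by userId and topicId so the calculations are applied per group rather than across the whole table.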
library(dplyr)
DT <- DT %>%
  group_by(userId, topicId) %>%
  mutate(scoreDiffFromFirst = score - first(score),
         scoreDiffFromPrev = score - lag(score))
DT
# A tibble: 9 x 6
# Groups: userId, topicId [2]
userId topicId attempt score scoreDiffFromFirst scoreDiffFromPrev
<fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 user1 topic1 1. 50. 0. NA
2 user1 topic1 2. 0. -50. -50.
3 user1 topic1 3. 50. 0. 50.
4 user1 topic1 4. 20. -30. -30.
5 user2 topic1 1. 20. 0. NA
6 user2 topic1 2. 100. 80. 80.
7 user2 topic1 3. 100. 80. 0.
8 user2 topic1 4. 100. 80. 0.
9 user2 topic1 5. 100. 80. 0.
The only difference is that scoreDiffFromFirst is 0 rather than NA on the first-attempt rows. If that's not what you want, you can simply overwrite them:
DT[DT$attempt == 1, "scoreDiffFromFirst"] <- NA
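If you'd rather get NA straight from the pipeline instead of patching the result afterwards, one option (a sketch, not part of the original answer) is to set it inside mutate() with if_else() and row_number(). It also sorts by attempt within each group first, because first() and lag() work on row order. The data frame is called `scores` here only to keep it separate from the question's `DT`.

```r
library(dplyr)

scores <- data.frame(userId  = c(rep("user1", 4), rep("user2", 5)),
                     topicId = "topic1",
                     attempt = c(1, 2, 3, 4, 1, 2, 3, 4, 5),
                     score   = c(50, 0, 50, 20, 20, 100, 100, 100, 100))

res <- scores %>%
  group_by(userId, topicId) %>%
  # Sort within each group so "first" and "previous" follow attempt order
  arrange(attempt, .by_group = TRUE) %>%
  # NA on each group's first row, difference from the first score elsewhere
  mutate(scoreDiffFromFirst = if_else(row_number() == 1, NA_real_, score - first(score)),
         scoreDiffFromPrev  = score - lag(score)) %>%
  ungroup()

res
```

if_else() is used rather than base ifelse() because it checks that both branches have the same type, which is why the NA is written as NA_real_.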
Answer 2 (score: 0)
You can also use an ifelse approach:
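# Group by user and topic so differences never cross topics;
# assumes rows are already sorted by attempt within each group
DT[, `:=`(
  scoreDiffFromFirst = ifelse(rowid(topicId) == 1, NA, score - first(score)),
  scoreDiffFromPrev = c(NA, diff(score))), by = .(userId, topicId)]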