I am using data.table to attempt the following clever imputation:
for (s in c('G1', 'G2')) {
t[t[ , .I[seq(which.max(get(s)), .N)], by = GROUP]$V1, get(s) := 1]
}
For some reason, it fails with the error:
Error in get(s) : object 'G1' not found
However, the explicit form works fine:
t[t[ , .I[seq(which.max(G1), .N)], by = GROUP]$V1, G1 := 1]
Of course, my real dataset has many columns whose names are not known in advance. What am I doing wrong here?
Here is a sample dataset:
G1 G2 GROUP
0.081975988 0.281210522 A
0.726230621 0.91873287 A
0.938997082 0.146669516 A
0.10564305 0.219593442 A
0.112977071 0.451366779 A
0.157260728 0.570366021 A
0.586841571 0.742955139 B
0.418178989 0.584326765 B
0.290443749 0.435277405 B
0.682695255 0.138739152 B
0.992847073 0.198544311 B
0.401170904 0.347155973 B
0.591182359 0.219964292 C
0.003935376 0.231136145 C
0.666710774 0.479126371 C
0.791187106 0.153873696 C
0.921437692 0.31429481 C
0.88193519 0.801150898 C
Answer 0 (score: 4)
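The problem is in the assignment part. On the left-hand side of := you need s used as a symbol whose value is the column name, whereas get(s) returns the column object itself, which is essentially a vector. What you need is (s) := 1. Inside the grouped j expression, get(s) is still the right tool, because there you do want the column's values. See also Select / assign to data.table variables which names are stored in a character vector.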
for (s in c('G1', 'G2')) {
t[t[, .I[seq(which.max(get(s)), .N)], by = GROUP]$V1, (s) := 1][]
}
t
# G1 G2 GROUP
# 1: 0.081975988 0.2812105 A
# 2: 0.726230621 1.0000000 A
# 3: 1.000000000 1.0000000 A
# 4: 1.000000000 1.0000000 A
# 5: 1.000000000 1.0000000 A
# 6: 1.000000000 1.0000000 A
# 7: 0.586841571 1.0000000 B
# 8: 0.418178989 1.0000000 B
# 9: 0.290443749 1.0000000 B
#10: 0.682695255 1.0000000 B
#11: 1.000000000 1.0000000 B
#12: 1.000000000 1.0000000 B
#13: 0.591182359 0.2199643 C
#14: 0.003935376 0.2311361 C
#15: 0.666710774 0.4791264 C
#16: 0.791187106 0.1538737 C
#17: 1.000000000 0.3142948 C
#18: 1.000000000 1.0000000 C
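The same pattern can be sketched on a smaller table to make the two roles of s easier to see. This is only a minimal illustration: the table dt, its values, and the names cols and idx are made up for the example and are not the data from the question.

```r
library(data.table)

# Small hypothetical table (not the dataset from the question).
dt <- data.table(
  GROUP = c("A", "A", "A", "B", "B", "B"),
  G1    = c(0.1, 0.9, 0.3, 0.8, 0.2, 0.4),
  G2    = c(0.2, 0.3, 0.7, 0.1, 0.6, 0.5)
)

cols <- c("G1", "G2")  # column names available only as strings

for (s in cols) {
  # get(s) fetches the column's values inside each group; .I maps the
  # within-group positions back to row numbers of the whole table.
  idx <- dt[, .I[seq(which.max(get(s)), .N)], by = GROUP]$V1

  # (s) is evaluated to the string it holds, so the existing column
  # with that name is updated by reference.
  dt[idx, (s) := 1]
}

print(dt)
```

Splitting the row lookup into idx is purely for readability; it does the same work as the one-liner above. If the parenthesised form feels too subtle, set(dt, i = idx, j = s, value = 1) takes the column name as a plain string and should behave the same way here.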