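I have two data frames (db1 and db2), and I want to get the positions in db2 of the rows that match certain parameters in db1. This can be done with a for loop, as follows: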
db1 <- data.frame(id=rep(1:4,each=4),
class=sample(1:10, 16, replace=TRUE),
var=rnorm(16)
)
db2 <- expand.grid(id=1:4, class=1:10)
db2$x <- rnorm(nrow(db2))
for(i in 1:nrow(db1)) print(which(db2$id==db1$id[i] & db2$class==db1$class[i]))
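However, the loop is very inefficient, so I would like to vectorize it. Is it possible to pass vectors to the which() function, so that it searches db2 for every value in db1?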
Answer 0 (score: 4)
library(data.table)
db1 <- data.table(db1)
db2 <- data.table(db2)
# You can index by additional columns as necessary
setkeyv(db1, c("id","class"))
setkeyv(db2, c("id","class"))
# Show only records in db2 that match id and class with db1
db2[db1,]
id class x var
[1,] 1 1 -0.50266835 0.82391749
[2,] 1 9 -1.21245991 -1.43163848
[3,] 1 9 -1.21245991 -0.68622189
[4,] 1 10 -0.28659235 -0.98107793
[5,] 2 4 2.18779836 1.25841256
[6,] 2 6 1.32407301 0.42287395
[7,] 2 7 -0.53808409 -0.12069089
[8,] 2 10 -0.67679146 -0.73930821
[9,] 3 7 0.03133591 0.31142901
[10,] 3 8 0.78927215 1.86952233
[11,] 3 9 -0.04674115 -0.45102021
[12,] 3 10 -0.83388764 -0.04354332
[13,] 4 8 1.17608109 -0.07343352
[14,] 4 8 1.17608109 -0.00053299
[15,] 4 9 0.59344187 -0.21407897
[16,] 4 10 -2.06237055 0.78420146
# To just return an index of matching rows
db2[db1, which=TRUE]
[1] 1 9 9 10 14 16 17 20 27 28 29 30 38 38 39 40
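# To get only unique row indices (deduplicate db1 on the key columns;
# recent data.table versions compare all columns unless `by` is given)
db2[unique(db1, by=c("id","class")), which=TRUE]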
[1] 1 9 10 14 16 17 20 27 28 29 30 38 39 40
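The same kind of index lookup can also be sketched in base R, without a keyed join. This is a minimal sketch, not part of the original answer: the small data frames are made-up stand-ins for the question's data, the helper name `make_key` is hypothetical, and it assumes each (id, class) pair occurs at most once in db2, because match() only reports the first hit.

```r
# Made-up stand-ins for the question's data frames (assumed values).
db1 <- data.frame(id = c(1, 1, 2, 4), class = c(3, 3, 10, 7))
db2 <- expand.grid(id = 1:4, class = 1:10)

# Hypothetical helper: combine the matching columns into one key per row.
make_key <- function(df) paste(df$id, df$class, sep = "_")

# For each row of db1, the position of the first matching row in db2
# (NA when there is no match).
idx <- match(make_key(db1), make_key(db2))
idx

# Unique row positions only.
unique(idx)
```

Note that these positions refer to db2 in its original row order, whereas the data.table indices above refer to db2 after setkeyv() has sorted it by id and class.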
Answer 1 (score: 0)
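If db1 and db2 have the same number of rows, this prints every row of db2 whose id and class equal those of db1. Note that the comparison is element-wise, so row i of db2 is only compared with row i of db1: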
print(db2[db2$id == db1$id & db2$class == db1$class,])
The same query, sorted by db2$id:
matched <- db2[db2$id == db1$id & db2$class == db1$class,]
print(matched[order(matched$id, decreasing = TRUE),])
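The element-wise comparison above only lines up when both data frames have the same number of rows in the same order, which is not the case for the question's data (16 rows versus 40). One possible base R alternative is merge(), which joins on the key columns regardless of row order. The following is a minimal sketch under assumptions: the data values are made up for illustration, and the `row` column is a hypothetical addition used only to carry db2's original row positions through the join.

```r
# Made-up stand-ins for the question's data frames (assumed values).
db1 <- data.frame(id = c(1, 1, 2, 4), class = c(3, 3, 10, 7),
                  var = c(0.1, 0.2, 0.3, 0.4))
db2 <- expand.grid(id = 1:4, class = 1:10)
db2$x <- seq_len(nrow(db2))

# Hypothetical extra column: remember each row's original position in db2.
db2$row <- seq_len(nrow(db2))

# Inner join on both key columns; one output row per matching db1 row.
matched <- merge(db1, db2, by = c("id", "class"))

# Same result, sorted by id in decreasing order.
matched[order(matched$id, decreasing = TRUE), ]
```

Here matched$row holds the positions in db2, so it plays the role of the which() output in the question's loop.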