Question

I have a distance matrix:

> mat
          hydrogen   helium  lithium beryllium    boron
hydrogen  0.000000 2.065564 3.940308  2.647510 2.671674
helium    2.065564 0.000000 2.365661  1.697749 1.319400
lithium   3.940308 2.365661 0.000000  3.188148 2.411567
beryllium 2.647510 1.697749 3.188148  0.000000 2.499369
boron     2.671674 1.319400 2.411567  2.499369 0.000000

and a data frame:

> results

El1    El2      Score
Helium Hydrogen    92
Boron  Helium      61
Boron  Lithium     88

For each row, I want to look up the distance between the elements named in results$El1 and results$El2, to get the following:

> results

El1    El2      Score Dist
Helium Hydrogen    92 2.065564
Boron  Helium      61 1.319400
Boron  Lithium     88 2.411567

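I did this with a for loop, but it feels really clunky. Is there a more elegant way to look up and extract the values in fewer lines of code?

Here is my current code: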

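names <- row.names(mat)
num.results <- dim(results)[1]
# match() is case-sensitive: the matrix names are lower-case, results is capitalised
El1 <- match(tolower(results$El1), names)
El2 <- match(tolower(results$El2), names)
el.dist <- matrix(0, num.results, 1)
# one matrix lookup per row of results
for (i1 in c(1:num.results)) {
  el.dist[i1, 1] <- mat[El1[i1], El2[i1]]
}
results$Dist <- el.dist[, 1]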

Answer 1

cols <- match(tolower(results$El1), colnames(mat))
rows <- match(tolower(results$El2), colnames(mat))
results$Dist <- mat[cbind(rows, cols)]
results
     El1      El2 Score     Dist
1 Helium Hydrogen    92 2.065564
2  Boron   Helium    61 1.319400
3  Boron  Lithium    88 2.411567

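You will recognize most of this code. The part to focus on is mat[cbind(rows, cols)]. A matrix can be subset by another matrix that has as many columns as the first one has dimensions. From the ?`[` help page:

When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.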
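
To see the matrix indexing rule in isolation, here is a minimal sketch on a made-up 3x3 matrix. The names m and idx and the values are mine, purely for illustration. The last line relies on the same help page also allowing a character index matrix when the array has dimnames, which would let you skip match() altogether.

```r
# Toy 3x3 matrix (hypothetical data), filled column by column with 1..9.
m <- matrix(1:9, nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Each row of the index matrix is one (row, column) pair.
idx <- cbind(c(1, 3), c(2, 1))
by_position <- m[idx]   # picks m[1, 2] and m[3, 1], i.e. 4 and 3

# A character index matrix is matched against the dimnames instead.
by_name <- m[cbind(c("a", "c"), c("b", "a"))]

print(by_position)
print(by_name)
```

Both calls pick the same two cells. Applied to the question, that would look like mat[cbind(tolower(results$El2), tolower(results$El1))].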

Answer 2

Another approach:

results$Dist <- mapply(function(x, y) mat[tolower(x), tolower(y)],
                       results$El1, results$El2)

This assumes that El1 and El2 in results are character rather than factor.

Result

> results
     El1      El2 Score     Dist
1 Helium Hydrogen    92 2.065564
2  Boron   Helium    61 1.319400
3  Boron  Lithium    88 2.411567
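
For anyone who wants to run both answers side by side, here is a self-contained sketch. It rebuilds mat and results from the values printed in the question. The names elements, by_matrix and by_mapply are mine, and the as.character() and unname() calls are additions I am assuming are wanted: the first guards against factor columns, the second drops the names that mapply() attaches.

```r
# Rebuild the question's data from the printed values.
elements <- c("hydrogen", "helium", "lithium", "beryllium", "boron")
mat <- matrix(c(
  0.000000, 2.065564, 3.940308, 2.647510, 2.671674,
  2.065564, 0.000000, 2.365661, 1.697749, 1.319400,
  3.940308, 2.365661, 0.000000, 3.188148, 2.411567,
  2.647510, 1.697749, 3.188148, 0.000000, 2.499369,
  2.671674, 1.319400, 2.411567, 2.499369, 0.000000
), nrow = 5, byrow = TRUE, dimnames = list(elements, elements))

results <- data.frame(
  El1 = c("Helium", "Boron", "Boron"),
  El2 = c("Hydrogen", "Helium", "Lithium"),
  Score = c(92, 61, 88),
  stringsAsFactors = FALSE
)

# Answer 1: one vectorised lookup through a two-column index matrix.
rows <- match(tolower(results$El2), rownames(mat))
cols <- match(tolower(results$El1), colnames(mat))
by_matrix <- mat[cbind(rows, cols)]

# Answer 2: one lookup per row via mapply().
# as.character() is an added guard in case the columns are factors.
by_mapply <- unname(mapply(
  function(x, y) mat[tolower(x), tolower(y)],
  as.character(results$El1), as.character(results$El2)
))

# Both approaches should return the same distances.
stopifnot(isTRUE(all.equal(by_matrix, by_mapply)))
results$Dist <- by_matrix
print(results)
```

The matrix-index version does a single subsetting call for all rows, while mapply() calls the function once per row, so the former is likely the better fit for large data frames.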

Efficiently accessing pairwise distances
