adist: different Levenshtein alignments depending on how the strings are passed in

Date: 2015-06-02 07:38:50

Tags: r similarity

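When I use the adist function in R to compute Levenshtein alignments between pairs of strings, I get different results depending on whether I call the function once per pair or pass several pairs at once as vectors. Why is that?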

Example: the transformations 'knijpen' - 'kneifen', 'grijpen' - 'greifen' and 'lopen' - 'laufen':

attr(adist("knijpen", "kneifen", counts = TRUE), "trafos")
#      [,1]      
# [1,] "MMIMSDMM"

attr(adist("grijpen", "greifen", counts = TRUE), "trafos")
#      [,1]      
# [1,] "MMIMSDMM"

attr(adist("lopen", "laufen", counts = TRUE), "trafos")
#      [,1]    
# [1,] "MSSIMM"
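The three single-pair calls above can be wrapped in a small loop that still accepts vectors but calls adist() once per pair. This is a minimal sketch: `pairwise_trafos` is a hypothetical helper, not part of base R, and unlike adist(x, y) it only returns the element-wise pairs (x[i] with y[i]), not the full cross matrix.

```r
# Hypothetical helper (not part of base R): call adist() separately for
# each (x[i], y[i]) pair and return that pair's "trafos" transcript.
pairwise_trafos <- function(x, y) {
  stopifnot(length(x) == length(y))
  vapply(seq_along(x), function(i) {
    # For a single pair, the "trafos" attribute is a 1 x 1 character matrix.
    attr(adist(x[i], y[i], counts = TRUE), "trafos")[1, 1]
  }, character(1))
}

dutch  <- c("knijpen", "grijpen", "lopen")
german <- c("kneifen", "greifen", "laufen")
pairwise_trafos(dutch, german)
```

Because each transcript comes from its own one-pair call, the results should match the individual calls rather than the vectorized matrix.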

These single-pair results agree with my own manual solutions. However, when I pass the strings as vectors, I get slightly different results:

dutch <- c("knijpen", "grijpen", "lopen")
german <- c("kneifen", "greifen", "laufen")
attr(adist(dutch, german, counts = TRUE), "trafos")
#      [,1]       [,2]       [,3]      
# [1,] "MMIMSDMM" "SSIMSDMM" "SSSSDMMM"
# [2,] "SSIMSDMM" "MMIMSDMM" "SSSSDMMM"
# [3,] "SSSIIMMM" "SSSIIMMM" "MSSIMMM" 

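The [3,3] element of this matrix should correspond to attr(adist("lopen", "laufen", counts = TRUE), "trafos") (i.e. "MSSIMM"), but it has an extra M at the end. That transcript cannot be valid: counting M, S and D, "MSSIMMM" consumes six characters of the source string, but "lopen" has only five. Why does this happen?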
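One way to spot such transcripts is to replay them against the two strings. The sketch below is a hypothetical helper (`check_trafo`, not part of base R) and assumes the usual reading of adist's codes: M and S consume one character of each string, D one character of the source only, I one character of the target only, and an M must pair identical characters.

```r
# Hypothetical helper (not part of base R): replay an adist() "trafos"
# transcript and report whether it is a valid alignment of source -> target.
check_trafo <- function(trafo, source, target) {
  ops <- strsplit(trafo, "")[[1]]
  s <- strsplit(source, "")[[1]]
  t <- strsplit(target, "")[[1]]
  i <- 1L  # next unread position in source
  j <- 1L  # next unread position in target
  for (op in ops) {
    if (op == "M" || op == "S") {
      # Match/substitution: needs a character left in both strings.
      if (i > length(s) || j > length(t)) return(FALSE)
      # A match must pair identical characters.
      if (op == "M" && s[i] != t[j]) return(FALSE)
      i <- i + 1L
      j <- j + 1L
    } else if (op == "D") {
      # Deletion: consumes a source character only.
      if (i > length(s)) return(FALSE)
      i <- i + 1L
    } else if (op == "I") {
      # Insertion: consumes a target character only.
      if (j > length(t)) return(FALSE)
      j <- j + 1L
    } else {
      return(FALSE)  # unknown operation code
    }
  }
  # Valid only if both strings were consumed completely.
  i == length(s) + 1L && j == length(t) + 1L
}

check_trafo("MSSIMM", "lopen", "laufen")   # single-pair result
check_trafo("MSSIMMM", "lopen", "laufen")  # [3,3] from the vectorized call
```

Replaying every cell of the vectorized matrix this way, the four cells pairing same-length strings ([1,1], [1,2], [2,1], [2,2]) pass, while [1,3], [2,3], [3,1], [3,2] and [3,3] all fail: each of them has exactly one more trailing M than the two strings allow.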
