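When I use the adist function in R to compute the Levenshtein alignment between pairs of strings, I get different results depending on whether I run the function once per pair or pass several pairs at once as vectors. Why is that?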
Example: the transformations for the string pairs "knijpen"/"kneifen", "grijpen"/"greifen" and "lopen"/"laufen":
attr(adist("knijpen", "kneifen", counts = TRUE), "trafos")
# [,1]
# [1,] "MMIMSDMM"
attr(adist("grijpen", "greifen", counts = TRUE), "trafos")
# [,1]
# [1,] "MMIMSDMM"
attr(adist("lopen", "laufen", counts = TRUE), "trafos")
# [,1]
# [1,] "MSSIMM"
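To read these transcripts, it can help to expand them into an explicit alignment. The sketch below uses a hypothetical helper, align_trafos(), that is not part of R. It assumes the usual reading of the codes in adist()'s "trafos" attribute: M (match) and S (substitution) consume one character of both strings, I (insertion) consumes one character of the second string only, and D (deletion) consumes one character of the first string only.

```r
# Hypothetical helper (not part of R): expand an adist() "trafos" string
# into two aligned rows, with gaps shown as "-".
# Assumes M/S consume one character of x and y, I one of y, D one of x.
align_trafos <- function(x, y, trafos) {
  xs <- strsplit(x, "")[[1]]
  ys <- strsplit(y, "")[[1]]
  ops <- strsplit(trafos, "")[[1]]
  top <- character(0)
  bottom <- character(0)
  i <- 1L  # next unused position in x
  j <- 1L  # next unused position in y
  for (op in ops) {
    if (op %in% c("M", "S")) {
      top <- c(top, xs[i]); bottom <- c(bottom, ys[j])
      i <- i + 1L; j <- j + 1L
    } else if (op == "I") {
      top <- c(top, "-"); bottom <- c(bottom, ys[j])
      j <- j + 1L
    } else if (op == "D") {
      top <- c(top, xs[i]); bottom <- c(bottom, "-")
      i <- i + 1L
    } else {
      stop("unexpected operation code: ", op)
    }
  }
  c(paste(top, collapse = ""), paste(bottom, collapse = ""))
}

align_trafos("lopen", "laufen", "MSSIMM")
# [1] "lop-en" "laufen"
align_trafos("knijpen", "kneifen", "MMIMSDMM")
# [1] "kn-ijpen" "kneif-en"
```

Read this way, "MSSIMM" substitutes o→a and p→u and inserts f, which matches the manual solution.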
These agree with my own manual solutions. However, when I pass the strings as vectors, I get slightly different results:
dutch <- c("knijpen", "grijpen", "lopen")
german <- c("kneifen", "greifen", "laufen")
attr(adist(dutch, german, counts = TRUE), "trafos")
# [,1] [,2] [,3]
# [1,] "MMIMSDMM" "SSIMSDMM" "SSSSDMMM"
# [2,] "SSIMSDMM" "MMIMSDMM" "SSSSDMMM"
# [3,] "SSSIIMMM" "SSSIIMMM" "MSSIMMM"
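To put the two ways of calling adist() side by side, the sketch below runs it once per pair with mapply() and once on the whole vectors, then compares the pairwise results with the diagonal of the matrix. No expected output is shown, since the results may depend on the R version in use.

```r
dutch <- c("knijpen", "grijpen", "lopen")
german <- c("kneifen", "greifen", "laufen")

# One adist() call per pair; each call returns a 1 x 1 "trafos" matrix
pairwise <- mapply(function(x, y) attr(adist(x, y, counts = TRUE), "trafos")[1, 1],
                   dutch, german, USE.NAMES = FALSE)

# One vectorised call; the diagonal holds the same (dutch[i], german[i]) pairs
vectorised <- diag(attr(adist(dutch, german, counts = TRUE), "trafos"))

data.frame(dutch, german, pairwise, vectorised, same = pairwise == vectorised)
```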
The [3,3] element of this matrix should correspond to attr(adist("lopen", "laufen", counts = TRUE), "trafos") (i.e. "MSSIMM"), but it contains an extra M ("MSSIMMM"). Why is that?
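One way to see that "MSSIMMM" cannot be a valid transcript for "lopen" to "laufen" is a length check: under the same reading of the codes as above (M, S and D each use one character of the first string; M, S and I each use one character of the second), a transcript must use exactly nchar(x) and nchar(y) characters. The function trafos_consistent() below is a hypothetical helper written for this check, and the matrix is copied by hand from the output above.

```r
# Hypothetical helper: does a transcript use exactly as many characters
# of x and y as the two strings contain?
trafos_consistent <- function(x, y, trafos) {
  ops <- strsplit(trafos, "")[[1]]
  used_x <- sum(ops %in% c("M", "S", "D"))  # operations that consume x
  used_y <- sum(ops %in% c("M", "S", "I"))  # operations that consume y
  used_x == nchar(x) && used_y == nchar(y)
}

trafos_consistent("lopen", "laufen", "MSSIMM")
# [1] TRUE
trafos_consistent("lopen", "laufen", "MSSIMMM")
# [1] FALSE

# The vectorised result from the question, entered by hand
reported <- matrix(c("MMIMSDMM", "SSIMSDMM", "SSSSDMMM",
                     "SSIMSDMM", "MMIMSDMM", "SSSSDMMM",
                     "SSSIIMMM", "SSSIIMMM", "MSSIMMM"),
                   nrow = 3, byrow = TRUE)
dutch <- c("knijpen", "grijpen", "lopen")
german <- c("kneifen", "greifen", "laufen")

# ok[i, j] checks reported[i, j] against dutch[i] and german[j]
ok <- outer(seq_along(dutch), seq_along(german),
            Vectorize(function(i, j) trafos_consistent(dutch[i], german[j], reported[i, j])))
ok
```

On the hand-copied matrix, the four entries that involve only knijpen/grijpen and kneifen/greifen pass, while every entry in the third row and third column fails: each of those transcripts uses one character more of both strings than they contain.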