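I have two datasets, Orig and Match, and I want to run different matching methods on the name column. The iteration only works for the first line of Match. For the second line, the name column is printed, but the loop does not go on to iterate over the Orig dataset again. It seems two nested for loops are not the right approach. :(
Thanks for your help.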
// assuming the stringmetric library (com.rockymadden.stringmetric) is on the classpath
import com.rockymadden.stringmetric.phonetic.{MetaphoneMetric, SoundexMetric}
import com.rockymadden.stringmetric.similarity.{JaroWinklerMetric, LevenshteinMetric}

object poc {
  // similarity methods
  def lv_distance(s1: String, s2: String) = {
    LevenshteinMetric.compare(s1, s2)
  }
  def jv_distance(s1: String, s2: String) = {
    JaroWinklerMetric.compare(s1, s2)
  }
  // phonetic methods
  def mp_distance(s1: String, s2: String) = {
    MetaphoneMetric.compare(s1, s2)
  }
  def sx_distance(s1: String, s2: String) = {
    SoundexMetric.compare(s1, s2)
  }
  // output definition
  def printDistance(s1: String, s2: String) = println("%s -> %s, Levenshtein: %s, JaroWinkler: %s, Soundex: %s, Metaphone: %s"
    .format(s1, s2, lv_distance(s1, s2).get, jv_distance(s1, s2).get, sx_distance(s1, s2).get, mp_distance(s1, s2).get))

  def main(args: Array[String]): Unit = {
    val fileNameOrig = io.Source.fromFile(args(0), "iso-8859-1")
    val fileNameMatch = io.Source.fromFile(args(1), "iso-8859-1")
    // outer loop: one line of Match at a time
    for (lineMatch <- fileNameMatch.getLines()) {
      val colsMatch = lineMatch.split(",").map(_.trim)
      println(1, s"${colsMatch(0)}")
      // inner loop: compare against every line of Orig
      for (lineOrig <- fileNameOrig.getLines()) {
        val colsOrig = lineOrig.split(",").map(_.trim)
        println(2, s"${colsOrig(6)}")
        printDistance(s"${colsOrig(6)}", s"${colsMatch(0)}")
      }
    }
  }
}
Example output (with the helper prints):
(1,Jan Rock)
(2,Jem Rog)
Jem Rog -> Jan Rock, Levenshtein: 4, JaroWinkler: 0.7214285714285713, Soundex: true, Metaphone: false
(2,Jan Rock)
Jan Rock -> Jan Rock, Levenshtein: 0, JaroWinkler: 1.0, Soundex: true, Metaphone: true
(2,Jen Rack)
Jen Rack -> Jan Rock, Levenshtein: 2, JaroWinkler: 0.8500000000000001, Soundex: true, Metaphone: true
(2,Susan Rock)
Susan Rock -> Jan Rock, Levenshtein: 3, JaroWinkler: 0.8583333333333334, Soundex: false, Metaphone: false
(1,Susan Rock)
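Answer 0 (score: 0)
Your inner loop consumes the entire file during the first outer iteration, so when you read the next line from the outer file, the inner iterator is already empty.
Try reading the whole file into memory before the loop: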
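val outer = fileNameMatch.getLines()
val inner = fileNameOrig.getLines().toList // read Orig once so it can be traversed for every outer line
for {
  lineMatch <- outer
  lineOrig <- inner
} {
  val colsMatch = lineMatch.split(",").map(_.trim)
  val colsOrig = lineOrig.split(",").map(_.trim)
  printDistance(colsOrig(6), colsMatch(0))
}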
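A minimal, self-contained sketch of the effect described in this answer. It uses `Source.fromString` with a few made-up names in place of the real Orig/Match files; the object and method names are hypothetical.

```scala
import scala.io.Source

// Sketch: why nesting two getLines() iterators only covers the first outer line.
// In-memory data via Source.fromString stands in for the real Orig/Match files.
object IteratorExhaustionDemo {
  val matchCsv = "Jan Rock\nSusan Rock"
  val origCsv  = "Jem Rog\nJan Rock\nJen Rack"

  // Buggy version: the inner iterator is created once and drained during the first outer pass.
  def pairsWithIterator(): List[(String, String)] = {
    val outer = Source.fromString(matchCsv).getLines()
    val inner = Source.fromString(origCsv).getLines()
    val buf = scala.collection.mutable.ListBuffer[(String, String)]()
    for (m <- outer; o <- inner) buf += ((m, o))
    buf.toList
  }

  // Fixed version: materialize the inner side once so it can be traversed for every outer line.
  def pairsWithList(): List[(String, String)] = {
    val outer = Source.fromString(matchCsv).getLines()
    val inner = Source.fromString(origCsv).getLines().toList
    (for (m <- outer; o <- inner) yield (m, o)).toList
  }

  def main(args: Array[String]): Unit = {
    println(pairsWithIterator().size) // 3: only "Jan Rock" gets compared
    println(pairsWithList().size)     // 6: every Match line against every Orig line
  }
}
```

An `Iterator` is single-pass, while a `List` can be traversed any number of times, which is why only the materialized inner side works in a nested loop.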
Answer 1 (score: 0)
1) Extract the name column into a list (getLines, split, map, then append s"${colsMatch(0)}" etc. to the list):
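var matchNames = List[String]() // `match` is a reserved word in Scala, so it cannot be a variable name
matchNames ::= colsMatch(0)     // prepend the name column (cheap on a List)
println(matchNames.reverse)     // reverse to restore the original line order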
2) Cross join (Cartesian product) of the lists:
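val f1 = fileNameOrig.getLines().toList
val f2 = fileNameMatch.getLines().toList
// an implicit class must be defined inside an object or class, not at the top level
implicit class Crossable[X](xs: Iterable[X]) {
  // pairs every element of xs with every element of ys
  def cross[Y](ys: Iterable[Y]) = for { x <- xs; y <- ys } yield (x, y)
}
println(f1 cross f2)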
3) Iterate over the rows in the resulting list.
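The three steps above can be put together roughly as follows. This is a sketch under stated assumptions: in-memory CSV text replaces the files, the name sits in column 6 of Orig and column 0 of Match (as in the question), and a hand-written Levenshtein distance is swapped in for `LevenshteinMetric` so the example runs without the stringmetric dependency. `CrossJoinSketch` and its methods are hypothetical names.

```scala
import scala.io.Source

// Sketch of the three steps from Answer 1, using in-memory CSV text instead of files.
// Assumption: name is column 6 in Orig and column 0 in Match, as in the question.
// The levenshtein function below is a stand-in, not the stringmetric LevenshteinMetric.
object CrossJoinSketch {
  // Step 1: pull one trimmed column out of every CSV line into a List.
  def column(csv: String, idx: Int): List[String] =
    Source.fromString(csv).getLines().map(line => line.split(",")(idx).trim).toList

  // Step 2: Cartesian product of two lists (every x paired with every y).
  def cross[A, B](xs: List[A], ys: List[B]): List[(A, B)] =
    for { x <- xs; y <- ys } yield (x, y)

  // Classic dynamic-programming edit distance (insert, delete, substitute each cost 1),
  // keeping only one row of the table.
  def levenshtein(s: String, t: String): Int = {
    val row = (0 to t.length).toArray
    for (i <- 1 to s.length) {
      var diag = row(0) // D(i-1, j-1) for the current j
      row(0) = i
      for (j <- 1 to t.length) {
        val above = row(j) // D(i-1, j) before it is overwritten
        val cost = if (s(i - 1) == t(j - 1)) 0 else 1
        row(j) = math.min(math.min(above + 1, row(j - 1) + 1), diag + cost)
        diag = above
      }
    }
    row(t.length)
  }

  // Step 3: iterate over the (orig, match) pairs and score each one.
  def scored(origCsv: String, matchCsv: String): List[(String, String, Int)] =
    cross(column(origCsv, 6), column(matchCsv, 0)).map { case (o, m) => (o, m, levenshtein(o, m)) }

  def main(args: Array[String]): Unit = {
    val origCsv  = "1,a,b,c,d,e,Jem Rog\n2,a,b,c,d,e,Jen Rack" // hypothetical rows, name in column 6
    val matchCsv = "Jan Rock\nSusan Rock"
    for ((o, m, d) <- scored(origCsv, matchCsv))
      println(s"$o -> $m, Levenshtein: $d")
  }
}
```

Because both sides are materialized as lists before the cross join, every Match name is compared against every Orig name, at the cost of holding both columns in memory.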