Scala: iterating over the lines of two files

Time: 2016-07-02 10:48:06

Tags: scala iteration

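I have two datasets, Orig and Match, and I want to run several different matching metrics on their name columns. The iteration only works for the first Match line: for that line the name column of every Orig line is printed (the lines marked 2), but for the next Match line the Orig dataset is not iterated again. It seems two nested for loops are not the right approach. :(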

Thanks for your help.

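// Assumes the stringmetric library (com.rockymadden.stringmetric) is on the classpath.
import com.rockymadden.stringmetric.phonetic.{MetaphoneMetric, SoundexMetric}
import com.rockymadden.stringmetric.similarity.{JaroWinklerMetric, LevenshteinMetric}

object poc {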

  // similarity methods
  def lv_distance(s1: String, s2: String) = {
    LevenshteinMetric.compare(s1, s2)
  }

  def jv_distance(s1: String, s2: String) = {
    JaroWinklerMetric.compare(s1, s2)
  }

  // phonetic methods
  def mp_distance(s1: String, s2: String) = {
    MetaphoneMetric.compare(s1, s2)
  }

  def sx_distance(s1: String, s2: String) = {
    SoundexMetric.compare(s1, s2)
  }

  // output definition
  def printDistance(s1: String, s2: String) = println("%s -> %s, Levenshtein: %s, JaroWinkler: %s, Soundex: %s, Metaphone: %s"
    .format(s1, s2, lv_distance(s1, s2).get, jv_distance(s1, s2).get, sx_distance(s1, s2).get, mp_distance(s1, s2).get))

  def main(args: Array[String]): Unit = {
    val fileNameOrig = io.Source.fromFile(args(0), "iso-8859-1")
    val fileNameMatch = io.Source.fromFile(args(1), "iso-8859-1")

    for (lineMatch <- fileNameMatch.getLines()) {
      val colsMatch = lineMatch.split(",").map(_.trim)
      println(1, s"${colsMatch(0)}")
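      // Problem: getLines() returns a one-shot Iterator, so this inner loop
      // exhausts fileNameOrig during the first pass of the outer loop.
      for (lineOrig <- fileNameOrig.getLines()) {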
        val colsOrig = lineOrig.split(",").map(_.trim)
        println(2, s"${colsOrig(6)}")
        printDistance(s"${colsOrig(6)}", s"${colsMatch(0)}")
      }
    }
  }
}

Example output, including the debug prints:

(1,Jan Rock)
(2,Jem Rog)
Jem Rog -> Jan Rock, Levenshtein: 4, JaroWinkler: 0.7214285714285713, Soundex: true, Metaphone: false
(2,Jan Rock)
Jan Rock -> Jan Rock, Levenshtein: 0, JaroWinkler: 1.0, Soundex: true, Metaphone: true
(2,Jen Rack)
Jen Rack -> Jan Rock, Levenshtein: 2, JaroWinkler: 0.8500000000000001, Soundex: true, Metaphone: true
(2,Susan Rock)
Susan Rock -> Jan Rock, Levenshtein: 3, JaroWinkler: 0.8583333333333334, Soundex: false, Metaphone: false
(1,Susan Rock)

2 answers:

Answer 0 (score: 0)

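Your inner loop consumes the entire file during the first iteration of the outer loop, so by the time you read the next line from the outer file, the inner iterator is already empty.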
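A minimal sketch of the effect, with in-memory text standing in for the two files (the object and method names are hypothetical). getLines() returns an Iterator, which can be traversed only once; converting it to a List first lets the inner loop run again for every outer line.

```scala
import scala.io.Source

object IteratorExhaustion {
  // Nested loop where the inner side is a one-shot Iterator (the bug).
  def pairsWithIterator(outer: Seq[String], innerText: String): Int = {
    val inner = Source.fromString(innerText).getLines()
    var count = 0
    for (o <- outer; i <- inner) count += 1
    count
  }

  // Same loop, but the inner side is read into a List first (the fix).
  def pairsWithList(outer: Seq[String], innerText: String): Int = {
    val inner = Source.fromString(innerText).getLines().toList
    var count = 0
    for (o <- outer; i <- inner) count += 1
    count
  }

  def main(args: Array[String]): Unit = {
    val matchNames = Seq("Jan Rock", "Susan Rock")
    val origText = "Jem Rog\nJan Rock\nJen Rack\nSusan Rock"
    println(pairsWithIterator(matchNames, origText)) // 4: only the first outer line sees inner lines
    println(pairsWithList(matchNames, origText))     // 8: both outer lines see all 4 inner lines
  }
}
```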

Try reading the whole inner file into memory before the loop:

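  val outer = fileNameMatch.getLines()
  val inner = fileNameOrig.getLines().toList // read once; a List can be traversed repeatedly
  for {
    lineMatch <- outer
    lineOrig <- inner
  } {
    val colsMatch = lineMatch.split(",").map(_.trim)
    val colsOrig = lineOrig.split(",").map(_.trim)
    printDistance(colsOrig(6), colsMatch(0))
  }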
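A fuller sketch of this answer's approach, under a few assumptions: the name columns sit at index 6 (Orig) and 0 (Match) as in the question, both Sources are closed in a finally block, and printDistance is replaced by a plain println so the sketch compiles without the stringmetric dependency. NestedMatch and namePairs are hypothetical names.

```scala
import scala.io.Source

object NestedMatch {
  // Build every (origName, matchName) pair. The Orig names are read into a
  // List once, so they can be re-traversed for each Match line.
  // Like the question's code, this throws if an Orig line has fewer than 7 columns.
  def namePairs(orig: Source, matching: Source): List[(String, String)] = {
    val origNames = orig.getLines().map(_.split(",").map(_.trim)).map(cols => cols(6)).toList
    val pairs = for {
      lineMatch <- matching.getLines()
      matchName = lineMatch.split(",").map(_.trim).head
      origName <- origNames
    } yield (origName, matchName)
    pairs.toList
  }

  def main(args: Array[String]): Unit = {
    val orig = Source.fromFile(args(0), "iso-8859-1")
    val matching = Source.fromFile(args(1), "iso-8859-1")
    try {
      for ((origName, matchName) <- namePairs(orig, matching))
        println(s"$origName -> $matchName") // call printDistance here in the real program
    } finally {
      orig.close()
      matching.close()
    }
  }
}
```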

Answer 1 (score: 0)

1) Extract the name column into a list (getLines, split, map, then append s"${colsMatch(0)}" and so on to the list):

var matchNames = List[String]()
matchNames ::= colsMatch(0) // prepend; note that `match` is a reserved word in Scala
println(matchNames.reverse)

2) Cross join (Cartesian product) of the two lists:

val f1 = fileNameOrig.getLines().toList
val f2 = fileNameMatch.getLines().toList

implicit class Crossable[X](xs: Traversable[X]) {
  def cross[Y](ys: Traversable[Y]) = for { x <- xs; y <- ys } yield (x, y)
}

println(f1 cross f2)

3) Iterate over the rows of the resulting list.
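A minimal, dependency-free sketch of the three steps above. It assumes the question's column layout (name in column 6 of Orig, column 0 of Match), uses Source.fromString in place of the real files, swaps Traversable for Iterable (Traversable is deprecated since Scala 2.13), and substitutes a hand-written Levenshtein function for the stringmetric calls. ThreeSteps, column, report and levenshtein are hypothetical names.

```scala
import scala.io.Source

object ThreeSteps {
  // Step 1: read one comma-separated column into an immutable List in file
  // order (no var / ::= / reverse); lines too short for that column are skipped.
  def column(src: Source, index: Int): List[String] =
    src.getLines()
      .map(_.split(",").map(_.trim))
      .collect { case cols if cols.length > index => cols(index) }
      .toList

  // Step 2: Cartesian product of two collections.
  implicit class Crossable[X](xs: Iterable[X]) {
    def cross[Y](ys: Iterable[Y]): Iterable[(X, Y)] =
      for { x <- xs; y <- ys } yield (x, y)
  }

  // Two-row dynamic-programming Levenshtein distance, used only as a
  // stand-in for stringmetric's LevenshteinMetric.
  def levenshtein(a: String, b: String): Int = {
    var prev = Array.tabulate(b.length + 1)(j => j) // distance from "" to each prefix of b
    for (i <- 1 to a.length) {
      val cur = new Array[Int](b.length + 1)
      cur(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        cur(j) = math.min(math.min(cur(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = cur
    }
    prev(b.length)
  }

  // Step 3: walk the pairs and format one report line per pair.
  def report(pairs: Iterable[(String, String)]): List[String] =
    pairs.map { case (o, m) => s"$o -> $m, Levenshtein: ${levenshtein(o, m)}" }.toList

  def main(args: Array[String]): Unit = {
    val origNames = column(Source.fromString("a,b,c,d,e,f,Jem Rog\na,b,c,d,e,f,Jan Rock"), 6)
    val matchNames = column(Source.fromString("Jan Rock"), 0)
    report(origNames cross matchNames).foreach(println)
  }
}
```

For the pair Jem Rog / Jan Rock this gives a distance of 4, which agrees with the Levenshtein value in the question's output. Reading both name columns into Lists up front trades memory for the ability to traverse them as often as needed.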