Join an element with the next one in a list if a condition is met

Time: 2014-05-25 13:11:59

Tags: scala

I'm using Stanford NLP to split text into sentences, but it ignores contractions.

So here is an example of a sentence I get:

List(I, 'd, like, to, fix, this, sentence, because, it, 's, broken)

My goal is to join the contracted words so that the result looks like this:

List(I'd, like, to, fix, this, sentence, because, it's, broken)

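Is there an elegant way to do this in Scala? Basically I'm looking for an expression that traverses the list, checks each element against the next one, joins the two if the condition is met, and returns the resulting list as in my example.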

3 answers:

Answer 0: (score: 2)

scala> val l = List("I", "'d", "like", "to fix", "this", "sentence", "because", "it", "'s", "broken")
l: List[String] = List(I, 'd, like, to fix, this, sentence, because, it, 's, broken)

scala> l.reduceRight({(s1,s2) => if (s2.startsWith("'")) s1+s2 else s1+" "+s2})
        .split(" ").toList
res2: List[String] = List(I'd, like, to, fix, this, sentence, because, it's, broken)

Note that this throws an exception if the list is empty (because of reduceRight). If that can happen, you may want to use foldRight or reduceRightOption instead.
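As a rough sketch of the empty-list note, the same expression can go through reduceRightOption, which returns None instead of throwing. The name joinContractionsSafe is made up for this illustration, and the caveat of the original still applies: any token that itself contains a space gets split.

```scala
// Hypothetical wrapper around the reduceRight expression above.
// reduceRightOption yields None for an empty list, which is mapped to Nil.
def joinContractionsSafe(tokens: List[String]): List[String] =
  tokens
    .reduceRightOption { (s1: String, s2: String) =>
      // s2 is everything already joined to the right of s1
      if (s2.startsWith("'")) s1 + s2 else s1 + " " + s2
    }
    .map(_.split(" ").toList)
    .getOrElse(Nil)
```

Calling joinContractionsSafe(Nil) simply gives back an empty list.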

Answer 1: (score: 1)

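val broken = List("I", "'d", "like", "to", "fix", "this", "sentence", "because", "it", "'s", "broken")
broken.foldLeft(List.empty[String]) { (list, str) =>
  // Glue a token that starts with an apostrophe onto the previous token.
  // The nonEmpty guard avoids calling init/last on an empty accumulator
  // when the very first token starts with an apostrophe.
  if (str.startsWith("'") && list.nonEmpty) {
    list.init :+ (list.last + str)
  } else {
    list :+ str
  }
}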

(I'm assuming the "to fix" element in your question is meant to be two elements and the comma was left out by mistake.)
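One thing to keep in mind: on an immutable List, init, last and :+ each walk the whole list, so the fold above is quadratic in the number of tokens. A minimal sketch of a linear variant, assuming it is fine to build the result in reverse and flip it once at the end (joinContractionsLinear is a hypothetical name):

```scala
// Hypothetical name; same idea as the foldLeft above, but the accumulator
// is kept in reverse order so that every step only touches its head.
def joinContractionsLinear(tokens: List[String]): List[String] = {
  val reversed = tokens.foldLeft(List.empty[String]) {
    // The head of the reversed accumulator is the previous token.
    case (prev :: rest, tok) if tok.startsWith("'") => (prev + tok) :: rest
    // Nothing to attach to, or an ordinary token: just prepend it.
    case (acc, tok) => tok :: acc
  }
  reversed.reverse
}
```

A leading token that starts with an apostrophe is kept as is, since there is nothing before it to attach to.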

Answer 2: (score: 1)

One way to extend the accepted answer so that it also handles cases such as ca, n't:
implicit class StanfordNLPConcat(val words: List[String]) extends AnyVal {
  def SNLPConcat() = {
    val sep = "#"
    words.reduce{ (a,v) => if (v.contains("'")) a+v else a+sep+v }.split(sep).toList
  }
}

val words = List("I", "'d", "like", "to", "fix", "this", "sentence", "because", "it", "'s", "broken")

so that

words.SNLPConcat()
res:  List[String] = List(I'd, like, to, fix, this, sentence, because, it's, broken)

In addition,

List("It", "ca", "n't", "be", "wrong").SNLPConcat()
res: List[String] = List(It, can't, be, wrong)
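Joining on a separator and splitting again breaks if a token already contains the separator (here #). A possible sketch that avoids the round trip through a string by folding from the right with an explicit predicate. joinWhen and isClitic are hypothetical names, and the rule inside isClitic is only an assumption about which tokens should attach to the previous one:

```scala
// Hypothetical generalisation: `attaches` decides whether a token should be
// glued onto the token before it.
def joinWhen(tokens: List[String])(attaches: String => Boolean): List[String] =
  tokens.foldRight(List.empty[String]) {
    // The head of the accumulator is the token that follows `tok`.
    case (tok, next :: rest) if attaches(next) => (tok + next) :: rest
    case (tok, acc) => tok :: acc
  }

// Assumed rule for this sketch: a clitic starts with an apostrophe or is exactly "n't".
val isClitic: String => Boolean = t => t.startsWith("'") || t == "n't"

val joined = joinWhen(List("It", "ca", "n't", "be", "wrong"))(isClitic)
// joined == List("It", "can't", "be", "wrong")
```

Because nothing is ever split, tokens that contain spaces or # come through untouched, and an empty list just yields an empty list.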