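Multiple yields in a sequence comprehension?

Asked: 2010-07-18 16:10:51

Tags: scala yield list-comprehension

I'm learning Scala and tried to write a sequence comprehension that extracts unigrams, bigrams and trigrams from a sequence. For example, [1,2,3,4] should be transformed into (not Scala syntax)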

[1; _,1; _,_,1; 2; 1,2; _,1,2; 3; 2,3; 1,2,3; 4; 3,4; 2,3,4]

In Scala 2.8, I tried the following:

def trigrams(tokens : Seq[T]) = {
  var t1 : Option[T] = None
  var t2 : Option[T] = None
  for (t3 <- tokens) {
    yield t3
    yield (t2,t3)
    yield (t1,t2,Some(t3))
    t1 = t2
    t2 = t3
  }
}

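But this doesn't compile because, apparently, only one yield is allowed in a for-comprehension (and no block statements either). Is there another elegant way to get the same behaviour with only one pass over the data?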

5 answers:

Answer 0: (score: 7)

You can't have multiple yields in a for loop because a for loop is syntactic sugar for the map (or flatMap) operations:

for (i <- collection) yield( func(i) )

translates into

collection map {i => func(i)}

Without a yield at all,

for (i <- collection) func(i)

translates into

collection foreach {i => func(i)}
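The two translations above can be checked directly. The following is a minimal sketch rather than part of the original answer: `func` is a hypothetical stand-in, and the object name `DesugarDemo` is made up for illustration. It also covers the two-generator case, where the outer generator becomes a flatMap call.

```scala
object DesugarDemo {
  val collection = List(1, 2, 3)

  // Hypothetical stand-in for the func used in the answer.
  def func(i: Int): Int = i * 10

  // With yield, the comprehension is a map call.
  val viaFor: List[Int] = for (i <- collection) yield func(i)
  val viaMap: List[Int] = collection.map(i => func(i))

  // With two generators, the outer one becomes flatMap and the inner one map.
  val nestedFor: List[(Int, Int)] =
    for (a <- collection; b <- collection) yield (a, b)
  val nestedFlatMap: List[(Int, Int)] =
    collection.flatMap(a => collection.map(b => (a, b)))

  def main(args: Array[String]): Unit = {
    // Without yield, the comprehension is a foreach call and returns Unit.
    for (i <- collection) println(func(i))
    collection.foreach(i => println(func(i)))

    println(viaFor == viaMap)           // true
    println(nestedFor == nestedFlatMap) // true
  }
}
```

In every case the body is one function passed to one method, which is why there is no place for a second yield.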

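So the entire body of the for loop is turned into a single closure, and the presence of the yield keyword determines whether the function called on the collection is map or foreach (or flatMap). Because of this translation, the following are forbidden:

  1. Using imperative statements next to a yield to determine what will be yielded.
  2. Using multiple yields.

(Not to mention that your proposed version would return a List[Any], since the tuples and the 1-grams all have different types. You probably want a List[List[Int]] instead.)

Try the following instead (it puts the n-grams in the order in which they appear):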

    val basis = List(1,2,3,4)
    val slidingIterators = 1 to 4 map (basis sliding _)
    
    for {onegram <- basis
         ngram <- slidingIterators if ngram.hasNext}
         yield (ngram.next)
    

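The same traversal can also be written imperatively with a ListBuffer. Note that this loop also produces the n-grams in order of appearance, not grouped by length:

    import scala.collection.mutable.ListBuffer

    val basis = List(1,2,3,4)
    val slidingIterators = 1 to 4 map (basis sliding _)
    val first = slidingIterators.head
    val buf = new ListBuffer[List[Int]]

    // Each pass takes one window from every iterator that still has one.
    while (first.hasNext)
       for (i <- slidingIterators)
          if (i.hasNext)
             buf += i.next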
    
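If the n-grams should instead be grouped by length, one option is to flatMap over the window sizes. This is a sketch under assumptions: the names `LengthOrder` and `ngramsByLength` and the `maxN` parameter are made up for illustration, and the approach makes one pass per window size rather than a single pass over the data.

```scala
object LengthOrder {
  // All n-grams of length 1..maxN, shortest first.
  def ngramsByLength[T](tokens: List[T], maxN: Int): List[List[T]] =
    (1 to maxN).toList.flatMap { n =>
      // sliding(n) returns one short window when the input has fewer
      // than n elements, so keep only windows of the full size.
      tokens.sliding(n).filter(_.size == n).toList
    }

  def main(args: Array[String]): Unit =
    ngramsByLength(List(1, 2, 3, 4), 3).foreach(println)
}
```

For List(1, 2, 3, 4) and maxN = 3 this yields the four unigrams, then the three bigrams, then the two trigrams.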

Answer 1: (score: 2)

scala> val basis = List(1, 2, 3, 4)
basis: List[Int] = List(1, 2, 3, 4)

scala> val nGrams = (basis sliding 1).toList ::: (basis sliding 2).toList ::: (basis sliding 3).toList
nGrams: List[List[Int]] = ...

scala> nGrams foreach (println _)
List(1)
List(2)
List(3)
List(4)
List(1, 2)
List(2, 3)
List(3, 4)
List(1, 2, 3)
List(2, 3, 4)

Answer 2: (score: 1)

I guess I should have given it more thought.

def trigrams[T](tokens : Seq[T]) : Seq[(Option[T],Option[T],T)] = {
  var t1 : Option[T] = None
  var t2 : Option[T] = None
  for (t3 <- tokens)
    yield {
      val tri = (t1,t2,t3)
      t1 = t2
      t2 = Some(t3)
      tri
    }
}

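Then extract the unigrams and bigrams from the trigrams. But can anyone explain to me why "multiple yields" are not allowed, and whether there is another way to achieve their effect?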
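One way to get the effect of several yields per element is a second generator that iterates over the values to emit, so the comprehension desugars to flatMap. The sketch below is an illustration and not the asker's code: `MultiYield` and `ngrams` are hypothetical names, and it pads the input with two None values instead of using mutable variables.

```scala
object MultiYield {
  // For each token, emits its unigram, bigram and trigram in that order.
  // Missing left context is represented by None.
  def ngrams[T](tokens: Seq[T]): Seq[List[Option[T]]] = {
    val padded: Seq[Option[T]] = Seq[Option[T]](None, None) ++ tokens.map(Some(_))
    for {
      // Each full window is (t1, t2, t3); the size guard drops the
      // short window that sliding returns for an empty input.
      window <- padded.sliding(3).toSeq if window.size == 3
      // The second generator plays the role of "multiple yields".
      n <- 1 to 3
    } yield window.takeRight(n).toList
  }

  def main(args: Array[String]): Unit =
    ngrams(List(1, 2, 3, 4)).foreach(println)
}
```

For [1,2,3,4] the first three results are List(Some(1)), List(None, Some(1)) and List(None, None, Some(1)), which matches the order requested in the question.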

Answer 3: (score: 1)

val basis = List(1, 2, 3, 4)
val nGrams = basis.map(x => (x)) ::: (for (a <- basis; b <- basis) yield (a, b)) ::: (for (a <- basis; b <- basis; c <- basis) yield (a, b, c))
nGrams: List[Any] = ...
nGrams foreach (println(_))
1
2
3
4
(1,1)
(1,2)
(1,3)
(1,4)
(2,1)
(2,2)
(2,3)
(2,4)
(3,1)
(3,2)
(3,3)
(3,4)
(4,1)
(4,2)
(4,3)
(4,4)
(1,1,1)
(1,1,2)
(1,1,3)
(1,1,4)
(1,2,1)
(1,2,2)
(1,2,3)
(1,2,4)
(1,3,1)
(1,3,2)
(1,3,3)
(1,3,4)
(1,4,1)
(1,4,2)
(1,4,3)
(1,4,4)
(2,1,1)
(2,1,2)
(2,1,3)
(2,1,4)
(2,2,1)
(2,2,2)
(2,2,3)
(2,2,4)
(2,3,1)
(2,3,2)
(2,3,3)
(2,3,4)
(2,4,1)
(2,4,2)
(2,4,3)
(2,4,4)
(3,1,1)
(3,1,2)
(3,1,3)
(3,1,4)
(3,2,1)
(3,2,2)
(3,2,3)
(3,2,4)
(3,3,1)
(3,3,2)
(3,3,3)
(3,3,4)
(3,4,1)
(3,4,2)
(3,4,3)
(3,4,4)
(4,1,1)
(4,1,2)
(4,1,3)
(4,1,4)
(4,2,1)
(4,2,2)
(4,2,3)
(4,2,4)
(4,3,1)
(4,3,2)
(4,3,3)
(4,3,4)
(4,4,1)
(4,4,2)
(4,4,3)
(4,4,4)

Answer 4: (score: 1)

You can try a functional version without assignments:

def trigrams[T](tokens : Seq[T]) = {
  val s1 = tokens.map { Some(_) }
  val s2 = None +: s1
  val s3 = None +: s2
  s1 zip s2 zip s3 map {
    case ((t1, t2), t3) => (List(t1), List(t1, t2), List(t1, t2, t3))
  }
}
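As a usage sketch, the tuples returned by this function can be flattened into a single sequence of n-grams. The code restates the function with dot notation and an explicit return type. The names `ZipTrigrams` and `flat` are made up for illustration. Note that within each n-gram the current token comes first, followed by its predecessors.

```scala
object ZipTrigrams {
  // The zip-based function from the answer, written with dot notation.
  def trigrams[T](tokens: Seq[T]): Seq[(List[Option[T]], List[Option[T]], List[Option[T]])] = {
    val s1: Seq[Option[T]] = tokens.map(Some(_))
    val s2 = None +: s1 // shifted by one: the previous token
    val s3 = None +: s2 // shifted by two: the token before that
    s1.zip(s2).zip(s3).map {
      case ((t1, t2), t3) => (List(t1), List(t1, t2), List(t1, t2, t3))
    }
  }

  // Flattens each (unigram, bigram, trigram) tuple into consecutive elements.
  def flat[T](tokens: Seq[T]): Seq[List[Option[T]]] =
    trigrams(tokens).flatMap { case (uni, bi, tri) => List(uni, bi, tri) }

  def main(args: Array[String]): Unit =
    flat(List(1, 2, 3, 4)).foreach(println)
}
```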