"value toSeq is not a member of Product with Serializable with scala.util.Either"?

Time: 2018-04-01 16:29:40

Tags: scala apache-spark either

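I have an RDD of text files that I want to parse. I do this by mapping a function over the RDD; the function returns Either[String, Book], where Book is the structured type produced by parsing and String is the text that could not be parsed. The result is an RDD[Either[String, Book]]. I would like to have an RDD[String] and an RDD[Book] instead, because the former should be logged and discarded while the latter goes on to further processing.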

My splitter is:

implicit class EitherRDDOps[L, R](rdd: RDD[Either[L, R]]) {
    def split(): (RDD[L], RDD[R]) = {
        // toSeq on a right-biased Either yields a one-element Seq for Right and an empty Seq for Left,
        // so swap.toSeq keeps only the Left values
        val left: RDD[L] = rdd.flatMap(_.swap.toSeq)
        val right: RDD[R] = rdd.flatMap(_.toSeq)
        (left, right)
    }
}

The splitter is invoked as input.map(parseBook).cache.split, where input is an RDD[String] and parseBook is a String => Either[String, Book].

I get the following compile errors:

value toSeq is not a member of Product with Serializable with scala.util.Either
       val left: RDD[L] = rdd.flatMap(_.swap.toSeq)
                                     ^

value toSeq is not a member of Either[L,R]
       val right: RDD[R] = rdd.flatMap(_.toSeq)
                                 ^

type mismatch;
  found   : org.apache.spark.rdd.RDD[Nothing]
  required: org.apache.spark.rdd.RDD[L]
 Note: Nothing <: L, but class RDD is invariant in type T.
 You may wish to define T as +T instead. (SLS 4.5)
       (left, right)
        ^

  found   : org.apache.spark.rdd.RDD[Nothing]
  required: org.apache.spark.rdd.RDD[R]
 Note: Nothing <: R, but class RDD is invariant in type T.
 You may wish to define T as +T instead. (SLS 4.5)
       (left, right)
              ^

The documentation clearly lists a toSeq method on Either. Any ideas? Should I approach this problem differently?

1 Answer:

Answer 0: (score: 3)

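It looks like you are using a slightly older version of Scala, probably 2.11.x or similar. Either was reworked in Scala 2.12, where it became right-biased and gained methods such as toSeq directly on Either; older versions do not have it (link to 2.11.8 documentation).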
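To illustrate the version difference without Spark, here is a minimal sketch on plain collections. The object name EitherProjectionDemo and the sample values are made up for illustration. On 2.11 the left and right projections already expose toSeq, so going through them should compile on both old and new versions (on 2.13 and later, .right produces a deprecation warning).

```scala
object EitherProjectionDemo {
  // Hypothetical sample data standing in for parse results.
  val parsed: Seq[Either[String, Int]] =
    Seq(Right(1), Left("bad line"), Right(2))

  // LeftProjection.toSeq: one element for Left, empty for Right.
  val lefts: Seq[String] = parsed.flatMap(_.left.toSeq)

  // RightProjection.toSeq: one element for Right, empty for Left.
  // (.right is deprecated from Scala 2.13 on, where Either is right-biased.)
  val rights: Seq[Int] = parsed.flatMap(_.right.toSeq)
}
```

The same two flatMap calls can be applied to an RDD, provided a ClassTag is available for the element types.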

Try this instead:

val left = rdd.filter(_.isLeft).map(_.left.get)
val right = rdd.filter(_.isRight).map(_.right.get)
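Putting this back into the splitter from the question, one possible sketch is shown below. It is not the answerer's code: it swaps filter plus get for RDD.collect with a partial function, which avoids the unsafe get calls and does not depend on the Scala version. The wrapper object EitherRDDSyntax is a made-up name. The ClassTag context bounds are an addition to the original signature, because RDD.collect (like map and flatMap) requires a ClassTag for its result element type.

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object; import EitherRDDSyntax._ to enable split().
object EitherRDDSyntax {
  // ClassTag bounds are required by RDD.collect for the result element type.
  implicit class EitherRDDOps[L: ClassTag, R: ClassTag](rdd: RDD[Either[L, R]]) {
    def split(): (RDD[L], RDD[R]) = {
      // Keep only the Left values, unwrapped.
      val left: RDD[L] = rdd.collect { case Left(l) => l }
      // Keep only the Right values, unwrapped.
      val right: RDD[R] = rdd.collect { case Right(r) => r }
      (left, right)
    }
  }
}
```

With import EitherRDDSyntax._ in scope, the call from the question, input.map(parseBook).cache.split, should work unchanged. Caching before the split still matters, since the parsed RDD is traversed once per side.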