Question

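I have an RDD of text files that I want to parse. I do this by mapping a function over the RDD; the function returns Either[String, Book], where Book is the structured type produced by parsing and String is the text that could not be parsed. The result is an RDD[Either[String, Book]]. I would like to end up with an RDD[String] and an RDD[Book], because the former should be logged and discarded while the latter should be processed further.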

My splitter is:

implicit class EitherRDDOps[L, R](rdd: RDD[Either[L, R]]) {
    def split(): (RDD[L], RDD[R]) = {
        // toSeq on a right-biased Either yields a one-element Seq for Right and an empty Seq for Left
        val left: RDD[L] = rdd.flatMap(_.swap.toSeq)
        val right: RDD[R] = rdd.flatMap(_.toSeq)
        (left, right)
    }
}

The splitter is called as input.map(parseBook).cache.split, where input is an RDD[String] and parseBook is a (String) => Either[String, Book].

I get the following compile errors:

value toSeq is not a member of Product with Serializable with scala.util.Either
       val left: RDD[L] = rdd.flatMap(_.swap.toSeq)
                                     ^

value toSeq is not a member of Either[L,R]
       val right: RDD[R] = rdd.flatMap(_.toSeq)
                                 ^

type mismatch;
  found   : org.apache.spark.rdd.RDD[Nothing]
  required: org.apache.spark.rdd.RDD[L]
 Note: Nothing <: L, but class RDD is invariant in type T.
 You may wish to define T as +T instead. (SLS 4.5)
       (left, right)
        ^

  found   : org.apache.spark.rdd.RDD[Nothing]
  required: org.apache.spark.rdd.RDD[R]
 Note: Nothing <: R, but class RDD is invariant in type T.
 You may wish to define T as +T instead. (SLS 4.5)
       (left, right)
              ^

But the documentation clearly lists toSeq as a method on Either. Any ideas? Should I be approaching this problem differently?

Answer 1

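It looks like you are using a slightly older version of Scala, probably 2.11.x or similar. Either was reworked in Scala 2.12, where it became right-biased and gained methods such as toSeq; older versions do not have toSeq directly on Either: link to 2.11.8 documentation.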

Try this instead:

val left = rdd.filter(_.isLeft).map(_.left.get)
val right = rdd.filter(_.isRight).map(_.right.get)
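
For what it's worth, here is a minimal sketch of how the whole splitter might look when written so that it does not depend on Either.toSeq at all. It uses pattern matching through RDD.collect with a partial function instead of filter plus get. The wrapper object name EitherRDDSyntax is made up for illustration, and the ClassTag context bounds are my addition: Spark's map, flatMap and collect need a ClassTag for the element type of the resulting RDD, so a fully generic split will probably not compile without them.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical wrapper object, only here to give the implicit class a home.
object EitherRDDSyntax {

  // The ClassTag bounds are an assumption added for this sketch: Spark needs
  // them to build RDD[L] and RDD[R] from the erased type parameters.
  implicit class EitherRDDOps[L: ClassTag, R: ClassTag](rdd: RDD[Either[L, R]]) {

    def split(): (RDD[L], RDD[R]) = {
      // Keep only the Left values; Right elements do not match and are dropped.
      val left: RDD[L] = rdd.collect { case Left(l) => l }
      // Keep only the Right values; Left elements do not match and are dropped.
      val right: RDD[R] = rdd.collect { case Right(r) => r }
      (left, right)
    }
  }
}
```

With import EitherRDDSyntax._ in scope, the call from the question would look like val (failures, books) = input.map(parseBook).cache().split(). Both result RDDs traverse the parent RDD independently, so caching before the split, as the question already does, avoids parsing the input twice.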
