"Not found" error in Spark Scala

Date: 2018-03-01 12:34:16

Tags: scala hadoop apache-spark

I need to partition the output by key. I tried to use MultipleTextOutputFormat.

I found https://stackoverflow.com/a/26051042/6561443

But when I try to do the same thing in spark-shell, I get an error.

scala> import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.io.NullWritable

scala> import org.apache.spark._
import org.apache.spark._

scala> import org.apache.spark.SparkContext._
import org.apache.spark.SparkContext._

scala> import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

scala> class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[Any, Any] {
         override def generateActualKey(key: Any, value: Any): Any =
           NullWritable.get()
         override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
           key.asInstanceOf[String]
       }

<console>:11: error: not found: type MultipleTextOutputFormat
       class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[Any, Any] {
                                                 ^
<console>:13: error: not found: value NullWritable
           NullWritable.get()

If I submit this application with spark-submit, I get

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1

Am I missing something here? Does it not work in spark-shell?

1 answer:

Answer 0 (score: 0)

I ran into the same problem. Try writing

class RDDMultipleTextOutputFormat extends org.apache.hadoop.mapred.lib.MultipleTextOutputFormat[Any, Any] {

instead, i.e., with the fully qualified class name. That worked for me; I'm not sure why.
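For reference, here is a minimal sketch of the full approach with the fully qualified extension, following the pattern in the linked Stack Overflow answer. The SparkContext `sc` (provided by spark-shell), the sample data, and the output path `/tmp/output` are assumptions for illustration:

```scala
import org.apache.hadoop.io.NullWritable

// Fully qualifying MultipleTextOutputFormat here avoids the
// spark-shell "not found: type" error described in the question.
class RDDMultipleTextOutputFormat
    extends org.apache.hadoop.mapred.lib.MultipleTextOutputFormat[Any, Any] {

  // Write only the value into the file, suppressing the key.
  override def generateActualKey(key: Any, value: Any): Any =
    NullWritable.get()

  // Route each record to a file named after its key.
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.asInstanceOf[String]
}

// Hypothetical usage: (key, line) pairs; each key ends up in its own file
// under /tmp/output (e.g. /tmp/output/a, /tmp/output/b).
val data = sc.parallelize(Seq(("a", "line1"), ("b", "line2"), ("a", "line3")))
data.saveAsHadoopFile("/tmp/output", classOf[Any], classOf[Any],
  classOf[RDDMultipleTextOutputFormat])
```

Note that `saveAsHadoopFile` uses the old `org.apache.hadoop.mapred` API, which is what `MultipleTextOutputFormat` belongs to.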