How do I read an LZO-compressed text/sequence file in Spark with Scala?

Time: 2015-07-15 15:18:09

Tags: scala hadoop apache-spark

I'm trying:

import com.hadoop.mapreduce.LzoTextInputFormat
import org.apache.hadoop.io.Text
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
  val someFile = "somefile"
  val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]")
  conf.set("io.compression.codecs","com.hadoop.compression.lzo.LzopCodec")
  conf.set("io.compression.codec.lzo.class", "com.hadoop.compression.lzo.LzoCodec")
  val sc = new SparkContext(conf)
  val result = sc.hadoopFile[Text, Text, LzoTextInputFormat](someFile)
  println(result.first())
 }

But I get:

Error:(22, 23) type arguments [org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,com.hadoop.mapreduce.LzoTextInputFormat] conform to the bounds of none of the overloaded alternatives of
 value hadoopFile: [K, V, F <: org.apache.hadoop.mapred.InputFormat[K,V]](path: String)(implicit km: scala.reflect.ClassTag[K], implicit vm: scala.reflect.ClassTag[V], implicit fm: scala.reflect.ClassTag[F])org.apache.spark.rdd.RDD[(K, V)] <and> [K, V, F <: org.apache.hadoop.mapred.InputFormat[K,V]](path: String, minPartitions: Int)(implicit km: scala.reflect.ClassTag[K], implicit vm: scala.reflect.ClassTag[V], implicit fm: scala.reflect.ClassTag[F])org.apache.spark.rdd.RDD[(K, V)]
      val result = sc.hadoopFile[Text, Text, LzoTextInputFormat](someFile)
     ^
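
The compile error comes from Hadoop's API split: `sc.hadoopFile` is bounded by the old `org.apache.hadoop.mapred.InputFormat`, while `com.hadoop.mapreduce.LzoTextInputFormat` implements the new `org.apache.hadoop.mapreduce.InputFormat`, so new-API formats have to go through `sc.newAPIHadoopFile`. A minimal sketch, assuming hadoop-lzo's `LzoTextInputFormat` (whose key/value types are `LongWritable`/`Text`, not `Text`/`Text`):

import com.hadoop.mapreduce.LzoTextInputFormat
import org.apache.hadoop.io.{LongWritable, Text}

// LzoTextInputFormat implements the new-API InputFormat, so it must be
// passed to newAPIHadoopFile rather than hadoopFile.
val result = sc.newAPIHadoopFile(
  someFile,
  classOf[LzoTextInputFormat],
  classOf[LongWritable],
  classOf[Text])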

I also tried the following:

  import org.apache.hadoop.io.LongWritable
  import org.apache.hadoop.mapred.TextInputFormat // old-API format, matches hadoopFile's bound

  val result = sc.hadoopFile[LongWritable, Text, TextInputFormat](someFile)
  println(result.first())

Then I get:

15/07/15 18:36:23 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/07/15 18:36:23 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/07/15 18:36:23 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.NotSerializableException: org.apache.hadoop.io.LongWritable
Serialization stack:
    - object not serializable (class: org.apache.hadoop.io.LongWritable, value: 0)
    - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
    - object (class scala.Tuple2, (0,SEQ!org.apache.hadoop.io.NullWritableorg.apache.hadoop.io.Text#com.hadoop.compression.lzo.LzoCodec    �    5�))
    - element of array (index: 0)
    - array (class [Lscala.Tuple2;, size 1)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:38)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)

Help! :/
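
One more observation: the value dumped in the serialization stack begins with the bytes SEQ followed by org.apache.hadoop.io.NullWritable, org.apache.hadoop.io.Text and com.hadoop.compression.lzo.LzoCodec, which looks like the header of a Hadoop SequenceFile rather than plain LZO-compressed text. If that is what the file actually is, `sc.sequenceFile` may be the right entry point; a hedged sketch assuming the key/value classes that header suggests:

import org.apache.hadoop.io.{NullWritable, Text}

// The "SEQ" magic in the stack trace suggests an LZO-compressed SequenceFile
// with NullWritable keys and Text values; sequenceFile reads the codec from
// the file header, provided the LZO codec classes are on the classpath.
val seq = sc.sequenceFile(someFile, classOf[NullWritable], classOf[Text])
println(seq.map { case (_, v) => v.toString }.first())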

0 Answers:

No answers