I'm trying the following:
import com.hadoop.mapreduce.LzoTextInputFormat
import org.apache.hadoop.io.Text
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
  val someFile = "somefile"
  val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]")
  conf.set("io.compression.codecs", "com.hadoop.compression.lzo.LzopCodec")
  conf.set("io.compression.codec.lzo.class", "com.hadoop.compression.lzo.LzoCodec")
  val sc = new SparkContext(conf)
  val result = sc.hadoopFile[Text, Text, LzoTextInputFormat](someFile)
  println(result.first())
}
but I get:
Error:(22, 23) type arguments [org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,com.hadoop.mapreduce.LzoTextInputFormat] conform to the bounds of none of the overloaded alternatives of
value hadoopFile: [K, V, F <: org.apache.hadoop.mapred.InputFormat[K,V]](path: String)(implicit km: scala.reflect.ClassTag[K], implicit vm: scala.reflect.ClassTag[V], implicit fm: scala.reflect.ClassTag[F])org.apache.spark.rdd.RDD[(K, V)] <and> [K, V, F <: org.apache.hadoop.mapred.InputFormat[K,V]](path: String, minPartitions: Int)(implicit km: scala.reflect.ClassTag[K], implicit vm: scala.reflect.ClassTag[V], implicit fm: scala.reflect.ClassTag[F])org.apache.spark.rdd.RDD[(K, V)]
val result = sc.hadoopFile[Text, Text, LzoTextInputFormat](someFile)
^
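Reading the overload signatures above, both alternatives of `hadoopFile` require `F <: org.apache.hadoop.mapred.InputFormat[K,V]` (the old mapred API), while `com.hadoop.mapreduce.LzoTextInputFormat` extends the new `org.apache.hadoop.mapreduce` API, and its keys are `LongWritable` (byte offsets), not `Text` — so the type arguments can never conform. My guess is that the new-API entry point is needed instead; a sketch of what I think the call would look like (untested, same `someFile` path as above):

```scala
import com.hadoop.mapreduce.LzoTextInputFormat
import org.apache.hadoop.io.{LongWritable, Text}

// newAPIHadoopFile accepts InputFormats from org.apache.hadoop.mapreduce,
// which is the package LzoTextInputFormat actually lives in.
val result = sc.newAPIHadoopFile(
  someFile,
  classOf[LzoTextInputFormat],
  classOf[LongWritable], // keys are byte offsets, not Text
  classOf[Text])
```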
I also tried the following:
val result = sc.hadoopFile[LongWritable, Text, TextInputFormat](someFile)
println(result.first())
and then I get:
15/07/15 18:36:23 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/07/15 18:36:23 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/07/15 18:36:23 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.NotSerializableException: org.apache.hadoop.io.LongWritable
Serialization stack:
- object not serializable (class: org.apache.hadoop.io.LongWritable, value: 0)
- field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
- object (class scala.Tuple2, (0,SEQ!org.apache.hadoop.io.NullWritableorg.apache.hadoop.io.Text#com.hadoop.compression.lzo.LzoCodec � 5�))
- element of array (index: 0)
- array (class [Lscala.Tuple2;, size 1)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:38)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
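As I understand it, this second failure happens because Hadoop `Writable` objects do not implement `java.io.Serializable`, so any action that ships records off the executor (like `first()` collecting to the driver) blows up. A workaround I've seen suggested is to convert the Writables to plain JVM types before the action — a sketch, assuming the same `someFile` as above:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// Map each (LongWritable, Text) pair to (Long, String), which are
// serializable, before collecting anything to the driver.
val result = sc.hadoopFile[LongWritable, Text, TextInputFormat](someFile)
  .map { case (offset, line) => (offset.get, line.toString) }
println(result.first())
```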
Help! :/