Error when using the "newAPIHadoopFile" API

Time: 2016-10-17 16:47:43

Tags: scala hadoop apache-spark

I am writing the following code to load a file into Spark using the newAPIHadoopFile API.

val lines = sc.newAPIHadoopFile("new_actress.list",classOf[TextInputFormat],classOf[Text],classOf[Text])

But I get the following error:

scala> val lines = sc.newAPIHadoopFile("new_actress.list",classOf[TextInputFormat],classOf[Text],classOf[Text])
<console>:34: error: inferred type arguments [org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,org.apache.hadoop.mapred.TextInputFormat] do not conform to method newAPIHadoopFile's type parameter bounds [K,V,F <: org.apache.hadoop.mapreduce.InputFormat[K,V]]
 val lines = sc.newAPIHadoopFile("new_actress.list",classOf[TextInputFormat],classOf[Text],classOf[Text])
                ^
<console>:34: error: type mismatch;
found   : Class[org.apache.hadoop.mapred.TextInputFormat](classOf[org.apache.hadoop.mapred.TextInputFormat])
required: Class[F]
val lines = sc.newAPIHadoopFile("new_actress.list",classOf[TextInputFormat],classOf[Text],classOf[Text])
                                                          ^
<console>:34: error: type mismatch;
found   : Class[org.apache.hadoop.io.Text](classOf[org.apache.hadoop.io.Text])
required: Class[K]
val lines = sc.newAPIHadoopFile("new_actress.list",classOf[TextInputFormat],classOf[Text],classOf[Text])
                                                                                   ^
<console>:34: error: type mismatch;
found   : Class[org.apache.hadoop.io.Text](classOf[org.apache.hadoop.io.Text])
required: Class[V]
val lines = sc.newAPIHadoopFile("new_actress.list",classOf[TextInputFormat],classOf[Text],classOf[Text])
                                                                                                 ^

What am I doing wrong in my code?

1 answer:

Answer 0: (score: 2)

TextInputFormat requires the key/value types <LongWritable,Text>.

Note the extends clause in the declaration of TextInputFormat:
@InterfaceAudience.Public
@InterfaceStability.Stable
public class TextInputFormat
extends FileInputFormat<LongWritable,Text>

This means you cannot set both types of TextInputFormat to Text. If you want to use TextInputFormat, the key type must be LongWritable and the value type Text.

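You can try the following. Note that TextInputFormat must be imported from org.apache.hadoop.mapreduce.lib.input (the new API); the error above shows that the old org.apache.hadoop.mapred.TextInputFormat was in scope: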

import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.hadoop.io.Text
import org.apache.hadoop.io.LongWritable
val lines = sc.newAPIHadoopFile("test.csv", classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
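With TextInputFormat, each record's key is the byte offset of the line and the value is the line itself. The following is a minimal sketch of reading such pairs and converting them to plain Scala types. The temporary sample file, the application name, and the local SparkContext are assumptions made only so that the snippet is self-contained; in spark-shell the existing sc would be reused.

```scala
import java.nio.file.Files
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: a local context for illustration; spark-shell already provides `sc`.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[1]").setAppName("text-input-sketch"))

// Hypothetical sample file standing in for test.csv.
val path = Files.createTempFile("sample", ".csv")
Files.write(path, "a,1\nb,2\n".getBytes("UTF-8"))

// Key = byte offset of the line (LongWritable), value = line contents (Text).
val lines = sc.newAPIHadoopFile(
  path.toUri.toString,
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text])

// Hadoop reuses Writable instances, so copy to plain types before collecting.
val plain = lines
  .map { case (offset, text) => (offset.get, text.toString) }
  .collect()
  .toList

plain.foreach(println)
```

If only the line contents are needed, mapping to text.toString alone and dropping the offset is usually enough.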

But if you still want both types to be Text, you can use KeyValueTextInputFormat, which is declared as:

@InterfaceAudience.Public
@InterfaceStability.Stable
public class KeyValueTextInputFormat
extends FileInputFormat<Text,Text>

You can try:

import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
import org.apache.hadoop.io.Text
val lines = sc.newAPIHadoopFile("test.csv", classOf[KeyValueTextInputFormat], classOf[Text], classOf[Text])
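One thing to keep in mind is that KeyValueTextInputFormat splits each line at the first separator, which is a tab by default, so a comma-separated file would end up with the whole line as the key and an empty value. The sketch below passes a custom Hadoop Configuration to change the separator. The property name is written from memory of the Hadoop 2.x new API and should be checked against the Hadoop version in use; the sample file and local SparkContext are again assumptions for illustration only.

```scala
import java.nio.file.Files
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: a local context for illustration; spark-shell already provides `sc`.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[1]").setAppName("key-value-input-sketch"))

// Hypothetical sample file standing in for test.csv.
val path = Files.createTempFile("sample", ".csv")
Files.write(path, "a,1\nb,2\n".getBytes("UTF-8"))

// Copy the context's Hadoop settings and override the key/value separator.
// Assumption: this is the Hadoop 2.x property name; verify it for your version.
val conf = new Configuration(sc.hadoopConfiguration)
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",")

// Both key and value are Text: the parts before and after the first separator.
val pairs = sc.newAPIHadoopFile(
  path.toUri.toString,
  classOf[KeyValueTextInputFormat],
  classOf[Text],
  classOf[Text],
  conf)

// Copy to Strings before collecting, since Hadoop reuses Writable instances.
val plain = pairs
  .map { case (k, v) => (k.toString, v.toString) }
  .collect()
  .toList

plain.foreach(println)
```

Only the first character of the separator string is used, and only the first occurrence on each line splits the record, so any further commas stay in the value.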