Question

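I am using the mongo-hadoop client (r1.5.2) in Spark to read data from MongoDB and from BSON files, following this link: https://github.com/mongodb/mongo-hadoop/wiki/Spark-Usage. So far I can read from MongoDB without any problems. However, the BSON configuration does not even compile. Please help.

My code in Scala: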

dataConfig.set("mapred.input.dir", "path.bson")

val documents = sc.newAPIHadoopRDD(
  dataConfig,
  classOf[BSONFileInputFormat],
  classOf[Object],
  classOf[BSONObject])

The error:

Error:(56, 24) inferred type arguments [Object,org.bson.BSONObject,com.mongodb.hadoop.mapred.BSONFileInputFormat] do not conform to method newAPIHadoopRDD's type parameter bounds [K,V,F <: org.apache.hadoop.mapreduce.InputFormat[K,V]]
    val documents = sc.newAPIHadoopRDD(
                       ^

Answer 1

I found the solution! The problem seems to be caused by the generics of InputFormat.

newAPIHadoopRDD requires the input format class to satisfy the bound

F <: org.apache.hadoop.mapreduce.InputFormat[K,V]

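Although BSONFileInputFormat extends FileInputFormat[K,V], which in turn extends InputFormat[K,V], it does not fix the K,V type parameters to Object and BSONObject. (In fact, BSONFileInputFormat does not mention the K,V type parameters at all. Can this class really compile?)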

Anyway, the solution is to cast the BSONFileInputFormat class to a subclass of InputFormat that has K and V defined:

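import com.mongodb.hadoop.BSONFileInputFormat  // new-API class, not com.mongodb.hadoop.mapred
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.bson.BSONObject

val documents = sc.newAPIHadoopRDD(
  dataConfig,
  // Narrow the static type so that K and V are fixed to Object and BSONObject
  classOf[BSONFileInputFormat].asSubclass(classOf[FileInputFormat[Object, BSONObject]]),
  classOf[Object],
  classOf[BSONObject])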

Now it works without any problems :)
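
For anyone wondering why the cast helps, here is a minimal sketch that reproduces the same situation with only the standard library. It is not Spark or mongo-hadoop code: `describe` is a hypothetical stand-in with the same shape of type bounds as newAPIHadoopRDD, java.util.Map plays the role of InputFormat[K,V], and the wildcard type on `rawClass` models how Scala sees a Java class that extends a generic parent without type arguments (which is my assumption about how BSONFileInputFormat is declared). Written for Scala 2.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

object AsSubclassSketch {
  // Hypothetical stand-in for newAPIHadoopRDD: F must be a Map[K, V] for the
  // same K and V that are passed as the key and value classes.
  def describe[K, V, F <: JMap[K, V]](
      fClass: Class[F],
      kClass: Class[K],
      vClass: Class[V]): String =
    s"${fClass.getSimpleName}[${kClass.getSimpleName}, ${vClass.getSimpleName}]"

  // A class whose type arguments are unknown to the compiler (assumed model
  // of a Java class that extends a raw generic type).
  val rawClass: Class[_ <: JHashMap[_, _]] = classOf[JHashMap[_, _]]

  // This would be rejected with "inferred type arguments ... do not conform
  // to method describe's type parameter bounds":
  // describe(rawClass, classOf[String], classOf[Integer])

  // asSubclass checks assignability at runtime and returns the same Class
  // object with a narrower static type, so K and V can now be inferred.
  val typedClass: Class[_ <: JHashMap[String, Integer]] =
    rawClass.asSubclass(classOf[JHashMap[String, Integer]])

  val description: String =
    describe(typedClass, classOf[String], classOf[Integer])
}
```

One thing to keep in mind: asSubclass throws a ClassCastException at runtime if the class is not actually assignable to the target. The compiler error in the question names com.mongodb.hadoop.mapred.BSONFileInputFormat, which looks like the old-API (mapred) variant, so the cast to org.apache.hadoop.mapreduce.lib.input.FileInputFormat presumably only works when com.mongodb.hadoop.BSONFileInputFormat is the class that is imported.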
