Spark Custom DataSource

Date: 2017-03-21 21:32:05

Tags: apache-spark

So I am writing a custom data source to read some data from HBase. Everything works fine until ExistingRDD.rowToRowRdd gets called. It then tries to convert my GenericRowWithSchema using the schema, and it fails; I have no idea why... I have seen people hit similar problems in the past. I am on Spark 1.6.3, and my schema is fixed as:

    StructType(Seq(StructField("Date", LongType),
        StructField("Device", StringType),
        StructField("Tag", StringType),
        StructField("TagValue", DoubleType))
    )
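
For reference, under the Spark 1.6 data source API such a relation extends BaseRelation with one of the scan traits; here is a minimal sketch, with a hypothetical class name and the HBase plumbing elided:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
    import org.apache.spark.sql.types._

    // Hypothetical sketch -- the real relation name and the HBase scan are elided.
    class HBaseTagRelation(@transient val sqlContext: SQLContext)
        extends BaseRelation with PrunedFilteredScan {

      override def schema: StructType = StructType(Seq(
        StructField("Date", LongType),
        StructField("Device", StringType),
        StructField("Tag", StringType),
        StructField("TagValue", DoubleType)))

      override def buildScan(requiredColumns: Array[String],
                             filters: Array[Filter]): RDD[Row] =
        ??? // HBase scan goes here -- see the answer below
    }

The job fails while the rows coming out of that scan are converted to Catalyst's internal representation: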

    Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): scala.MatchError: 1451610000 (of class java.lang.Long)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:295)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:294)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
        at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59)
        at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
        at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)

Any ideas?

1 Answer:

Answer 0 (score: 1)

So I figured out the cause. When you override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = { ... }, the rows you return must contain exactly the requiredColumns, in the order they were requested. Otherwise Catalyst pairs each value with the converter built for a different column; here the Long Date value lands in StringConverter, which is the scala.MatchError shown above.
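
A minimal sketch of that fix, filling in buildScan from the sketch in the question (scanHBase is a hypothetical helper standing in for the actual HBase read):

    override def buildScan(requiredColumns: Array[String],
                           filters: Array[Filter]): RDD[Row] = {
      // Full rows with all four columns in schema order (Date, Device,
      // Tag, TagValue); the HBase scan itself is elided.
      val fullRows: RDD[Row] = scanHBase(filters) // hypothetical helper

      // Position of each schema field, looked up by name.
      val fieldIndex = schema.fieldNames.zipWithIndex.toMap

      fullRows.map { row =>
        // Emit ONLY the requiredColumns, in exactly the order Spark
        // requested them. Catalyst builds one converter per requested
        // column, so a misordered or extra field sends a value to the
        // wrong converter -- the scala.MatchError above.
        Row.fromSeq(requiredColumns.map(c => row.get(fieldIndex(c))))
      }
    }

With that projection in place, a query that selects, say, only Device and Tag gets two-field rows in that exact order, so the Long Date value never reaches StringConverter.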