// Read the HBase table as (row key, Result) pairs and re-key each Result by its integer row key.
private static JavaPairRDD<Integer, Result> getCompanyDataRDD(JavaSparkContext sc) throws IOException {
    return sc.newAPIHadoopRDD(companyDAO.getCompnayDataConfiguration(), TableInputFormat.class,
            ImmutableBytesWritable.class, Result.class)
        .mapToPair(new PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, Result>() {
            public Tuple2<Integer, Result> call(Tuple2<ImmutableBytesWritable, Result> t) throws Exception {
                System.out.println("In getCompanyDataRDD " + t._2);
                // The row key bytes hold the company id as a string; parse it to an Integer.
                String cknid = Bytes.toString(t._1.get());
                System.out.println("processing cknid is: " + cknid);
                Integer cknidInt = Integer.parseInt(cknid);
                return new Tuple2<Integer, Result>(cknidInt, t._2);
            }
        });
}
I am scanning the table and fetching rows inside mapToPair, and the job fails because the result is not serializable: org.apache.hadoop.hbase.client.Result.
Answer 0 (score: 2)
I ran into the same not-serializable exception with Result and solved it.
Please try this:
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.registerKryoClasses(Array(classOf[org.apache.hadoop.hbase.client.Result]))
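The snippet above is Scala; since the method in the question uses the Java API, the equivalent driver setup would look roughly like this (a minimal sketch, assuming you build the SparkConf yourself; the app name is just a placeholder):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf()
        .setAppName("company-data-job")  // placeholder app name
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
// Register the HBase Result class with Kryo so tasks can serialize it across executors.
conf.registerKryoClasses(new Class<?>[] { org.apache.hadoop.hbase.client.Result.class });
JavaSparkContext sc = new JavaSparkContext(conf);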
Also try persisting the RDD with MEMORY_AND_DISK_SER.
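From the Java API, the storage level is available as a constant on org.apache.spark.api.java.StorageLevels; a short sketch continuing the hypothetical driver above:

import org.apache.spark.api.java.StorageLevels;

// Cache the pairs in serialized form, spilling to disk when memory runs short.
JavaPairRDD<Integer, Result> companyData = getCompanyDataRDD(sc);
companyData.persist(StorageLevels.MEMORY_AND_DISK_SER);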
Please let me know if this works for you.