Question

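I am trying to convert a JavaRDD<PageRanksCase> into a DataFrame so that I can save it to a parquet file later, but execution stops at the createDataFrame call, and when I catch the exception its message is null. Here is my class.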

public static class PageRanksCase implements Serializable{

    private String node;
    private Double importance; 


    public void setNode(String node) { this.node = node; }
    public void setImportance(Double importance) { this.importance = importance; }

    public String getNode(String node) { return this.node; }
    public Double getImportance(String node) { return this.importance; }
}

Here is the code where I try to convert the class into a DataFrame.

try{
    JavaRDD<PageRanksCase> finalData = GetTopNNodes(pairedrdd);
    System.out.println("coming here");
    DataFrame finalFrame = Service.sqlCtx().createDataFrame(finalData,PageRanksCase.class);
    System.out.println("coming here too");
    finalFrame.write().parquet(rdfanalyzer.spark.Configuration.storage() + "sib200PageRank.parquet");
}
catch(Exception e)
{ 
  System.out.println(e.getMessage());  // gives null here.
}

It prints "coming here" but never prints "coming here too", and no error appears. Below is the method I use to convert the JavaPairRDD into a JavaRDD.

public static JavaRDD<PageRanksCase> GetTopNNodes(JavaPairRDD<String,Double> pairedrdd){

    return pairedrdd.map(new Function<Tuple2<String,Double>, PageRanksCase>() {

        @Override
        public PageRanksCase call(Tuple2<String, Double> line) throws Exception {
            PageRanksCase pgrank  = new PageRanksCase();
            pgrank.setImportance(line._2);
            pgrank.setNode(line._1());
            return pgrank;
        }
    });
}

Does anyone have an idea what is going wrong here?

Answer 1

In the end I reimplemented the code following the example given on the Spark website.
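For anyone hitting the same thing, one likely culprit (an assumption on my part, not something confirmed in the question) is that the getters in PageRanksCase take a parameter. createDataFrame(rdd, beanClass) relies on JavaBean introspection, and a getter with an argument is not treated as a property read method. The sketch below uses only java.beans from the JDK, no Spark, to show the difference. BeanGetterCheck, BrokenPageRank, FixedPageRank and readableProperties are made-up names for illustration.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeanGetterCheck {

    // Same shape as the class in the question: both getters take a parameter.
    public static class BrokenPageRank implements Serializable {
        private String node;
        private Double importance;

        public void setNode(String node) { this.node = node; }
        public void setImportance(Double importance) { this.importance = importance; }

        public String getNode(String node) { return this.node; }
        public Double getImportance(String node) { return this.importance; }
    }

    // Hypothetical corrected version: getters take no arguments.
    public static class FixedPageRank implements Serializable {
        private String node;
        private Double importance;

        public void setNode(String node) { this.node = node; }
        public void setImportance(Double importance) { this.importance = importance; }

        public String getNode() { return this.node; }
        public Double getImportance() { return this.importance; }
    }

    // Returns the sorted names of bean properties that have a read method.
    public static List<String> readableProperties(Class<?> beanClass) {
        try {
            // Stop at Object so the synthetic "class" property is excluded.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null) {
                    names.add(pd.getName());
                }
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("broken: " + readableProperties(BrokenPageRank.class));
        System.out.println("fixed:  " + readableProperties(FixedPageRank.class));
    }
}
```

The broken class reports no readable properties, while the fixed one reports both. A null from e.getMessage() is also what you get from an exception created without a message, such as a NullPointerException, so printing e.toString() or calling e.printStackTrace() in the catch block should show what was actually thrown.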

Spark JavaRDD to DataFrame conversion code stops without an error
