Calling SparkSession inside a worker (Spark-SQL, Java)

Date: 2016-08-23 13:30:21

Tags: java apache-spark apache-spark-sql spark-graphx

I am using GraphX together with Spark SQL, and I am trying to create a DataFrame (Dataset) in each graph node. To create a DataFrame I need a SparkSession (spark.createDataFrame(rows, schema)). Whenever I try this, I get an error. Here is my code:

SparkSession spark = SparkSession.builder()
            .master("spark://home:7077")
            .appName("testgraph")
            .getOrCreate();

JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());

// Read the tree file: one node per line
JavaRDD<String> tree_file = sc.textFile(args[1]);

JavaPairRDD<String[], Long> node_pair = tree_file.map(l -> l.split(" ")).zipWithIndex();

// Create the vertices; the DataFrame is built inside map(), i.e. on the executors
RDD<Tuple2<Object, Dataset<Row>>> verteces = node_pair.map(t -> {

    List<StructField> fields = new ArrayList<>();
    List<Row> rows = new ArrayList<>();
    String[] vars = Arrays.copyOfRange(t._1(), 2, t._1().length);

    for (int i = 0; i < vars.length; i++) {
        fields.add(DataTypes.createStructField(vars[i], DataTypes.BooleanType, true));
    }
    StructType schema = DataTypes.createStructType(fields);

    // This is the call that fails when it runs inside the worker-side closure
    Dataset<Row> ds = spark.createDataFrame(rows, schema);
    return new Tuple2<>((Object)(t._2() + 1), ds);

}).rdd();
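
For comparison, the same call works without any problem when it runs on the driver, outside of map(). A minimal standalone sketch of what I mean (the column name "x" and the single row are just placeholders):

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.*;
import org.apache.spark.sql.types.*;

SparkSession spark = SparkSession.builder()
        .master("spark://home:7077")
        .appName("testgraph")
        .getOrCreate();

// Driver side: the session is fully initialized here
List<StructField> fields = new ArrayList<>();
fields.add(DataTypes.createStructField("x", DataTypes.BooleanType, true));
StructType schema = DataTypes.createStructType(fields);

List<Row> rows = new ArrayList<>();
rows.add(RowFactory.create(true));

Dataset<Row> ds = spark.createDataFrame(rows, schema); // no exception here
ds.show();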

This is the error I get:

16/08/23 15:25:36 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3, 192.168.1.5): java.lang.NullPointerException
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:112)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:110)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:328)
at Main.lambda$main$e7daa47c$1(Main.java:62)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1028)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:148)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I also tried getting the session inside map():
SparkSession ss = SparkSession.builder()
                .master("spark://home:7077")
                .appName("testgraph")
                .getOrCreate();

That also fails, with this error:

16/08/23 15:00:29 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 7, 192.168.1.5): java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:644)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
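
The only variant I can get to run at all is collecting the node data back to the driver and creating the DataFrames there. Just a sketch (it assumes the tree is small enough to collect, which defeats the purpose of building the DataFrames inside the graph vertices):

// Workaround sketch: collect first, then call createDataFrame on the driver
List<Tuple2<String[], Long>> nodes = node_pair.collect();
List<Tuple2<Object, Dataset<Row>>> localVertices = new ArrayList<>();

for (Tuple2<String[], Long> t : nodes) {
    String[] vars = Arrays.copyOfRange(t._1(), 2, t._1().length);
    List<StructField> fields = new ArrayList<>();
    for (String v : vars) {
        fields.add(DataTypes.createStructField(v, DataTypes.BooleanType, true));
    }
    StructType schema = DataTypes.createStructType(fields);
    // Runs on the driver, so the session state is available
    Dataset<Row> ds = spark.createDataFrame(new ArrayList<Row>(), schema);
    localVertices.add(new Tuple2<>((Object)(t._2() + 1), ds));
}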

I hope someone can help me. I cannot find a solution. Thanks!

0 Answers:

No answers yet.