I am using GraphX and Spark SQL, and I am trying to create a DataFrame (Dataset) inside each graph node. To create a DataFrame I need a SparkSession (spark.createDataFrame(rows, schema)). Whenever I try, I get an error. Here is my code:
SparkSession spark = SparkSession.builder()
        .master("spark://home:7077")
        .appName("testgraph")
        .getOrCreate();
JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());

// read the tree file
JavaRDD<String> tree_file = sc.textFile(args[1]);
JavaPairRDD<String[], Long> node_pair = tree_file.map(l -> l.split(" ")).zipWithIndex();

// create the vertices
RDD<Tuple2<Object, Tuple2<Dataset<Row>, Clauses>>> vertices = node_pair.map(t -> {
    List<StructField> fields = new ArrayList<StructField>();
    List<Row> rows = new ArrayList<>();
    String[] vars = Arrays.copyOfRange(t._1(), 2, t._1().length);
    for (int i = 0; i < vars.length; i++) {
        fields.add(DataTypes.createStructField(vars[i], DataTypes.BooleanType, true));
    }
    StructType schema = DataTypes.createStructType(fields);
    Dataset<Row> ds = spark.createDataFrame(rows, schema);
    return new Tuple2<>((Object) (t._2 + 1), ds);
}).rdd();
This is the error I get:
16/08/23 15:25:36 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3, 192.168.1.5): java.lang.NullPointerException
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:112)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:110)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:328)
at Main.lambda$main$e7daa47c$1(Main.java:62)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1028)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:148)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
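Reading that trace, the NPE is thrown while lazily computing SparkSession.sessionState, so my guess is that the session captured in the map() closure is serialized to the executor and arrives without its transient internal state. A minimal, Spark-free sketch of that transient-field behavior (the Session class here is made up purely for illustration):

```java
import java.io.*;

// Demo: transient fields come back null after a serialize/deserialize
// round-trip, which appears to be what happens to SparkSession's
// internal state when it is shipped inside a task closure.
public class TransientDemo {
    static class Session implements Serializable {
        // transient: skipped during serialization, and field initializers
        // do not rerun on deserialization, so this comes back null
        transient StringBuilder state = new StringBuilder("ready");

        String use() {
            return state.toString(); // NPE after a round-trip
        }
    }

    // serialize to bytes, then deserialize, like shipping a closure to an executor
    static Object roundTrip(Object o) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(o);
        oos.close();
        return new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray())).readObject();
    }

    public static void main(String[] args) throws Exception {
        Session local = new Session();
        System.out.println(local.use()); // works on the "driver" side

        Session shipped = (Session) roundTrip(local);
        try {
            shipped.use();
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as on the executor");
        }
    }
}
```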
I also tried getting the session inside map():

SparkSession ss = SparkSession.builder()
        .master("spark://home:7077")
        .appName("testgraph")
        .getOrCreate();
That also fails, with this error:
16/08/23 15:00:29 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 7, 192.168.1.5): java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:644)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I hope someone can help me; I have not been able to find a solution. Thanks!