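Please help me out.
I have 10 Parquet files (table data) in S3.
I read each one into a Dataset and then register it as a temp table.
One table drives the whole flow, so I do the following (when I trigger the query from ...).
Code base: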
SparkSession spark = SparkSession.builder().appName("Test").getOrCreate();
Dataset<Row> citationDF = spark.read().parquet("s3://...");
...
...
citationDF.createOrReplaceTempView("citation");
...
....
cit_num.javaRDD().foreachPartition(new VoidFunction<Iterator<Row>>()
{
    private static final long serialVersionUID = 1L;

    @Override
    public void call(Iterator<Row> iter)
    {
        while (iter.hasNext())
        {
            Row record = iter.next();
            int citation_num = record.getInt(0);
            String ci_query = "select queries ...."; // (I can execute this query outside of foreach)
            System.out.println("citation num:" + citation_num + " count:" + spark.sql(ci_query).count());
            accum.add(1);
            System.out.println("accumulator count:" + accum);
        }
    }
});
Error:
16/10/24 09:08:12 WARN TaskSetManager: Lost task 1.0 in stage 30.0 (TID 83, ip-10-95-36-172.dev): java.lang.NullPointerException
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:112)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:110)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
at com.elsevier.datasearch.CitationTest$1.call(CitationTest.java:124)
at com.elsevier.datasearch.CitationTest$1.call(CitationTest.java:1)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:218)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:218)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:883)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:883)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
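
A note on reading the trace: the exception is raised inside SparkSession.sessionState, reached from spark.sql(...) inside the foreachPartition closure. A plausible explanation is that the session is serialized along with the closure and its driver-only state (marked transient) comes back as null on the executor. The sketch below reproduces that mechanism with plain Java serialization only. It does not use Spark; DriverContext, Session, and shipToExecutor are hypothetical stand-ins invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFieldDemo {

    /** Hypothetical stand-in for a driver-only object (assumed analogous to SparkContext). */
    static class DriverContext {
        String conf() { return "conf"; }
    }

    /** Hypothetical stand-in for a session: serializable, but its context is transient. */
    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        private final transient DriverContext context;

        Session(DriverContext context) { this.context = context; }

        // Dereferences the context, so this throws NullPointerException when it is null.
        String sql(String query) { return context.conf() + ":" + query; }
    }

    /** Serializes and deserializes the session, roughly what happens to objects captured by a task closure. */
    static Session shipToExecutor(Session session) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(session);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Session) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if calling sql() on the given session throws NullPointerException. */
    static boolean sqlThrowsNpe(Session session) {
        try {
            session.sql("select 1");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Session driverSide = new Session(new DriverContext());
        Session executorSide = shipToExecutor(driverSide);
        System.out.println("driver copy throws NPE: " + sqlThrowsNpe(driverSide));
        System.out.println("shipped copy throws NPE: " + sqlThrowsNpe(executorSide));
    }
}
```

Here the driver-side copy answers the query, while the deserialized copy fails because its transient field was not carried across. If the same thing is happening in the code above, the usual way around it is to keep spark.sql(...) on the driver, for example by bringing the driving rows back with cit_num.collectAsList() and looping there, or by expressing the per-row lookup as a join against the citation view.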