Exception when reading data from a JSON RDD

Asked: 2018-09-02 14:27:29

Tags: json apache-spark

I get an exception when running the following code snippet. Can someone tell me what is wrong with this code?

JavaDStream<String> newDstream = newlines.window(Durations.seconds(300), Durations.seconds(120));

newDstream.print(); // This prints perfectly
newDstream.foreachRDD(new VoidFunction<JavaRDD<String>>() {
    private static final long serialVersionUID = 1L;
    @Override
    public void call(JavaRDD<String> rdd) throws Exception {
        // Logger.debug("RDD received: {}", rdd.collect());
        Dataset<Row> df = sqlcontext.read().option("multiline", true).json(rdd);
        df.printSchema();
        df.show();
    }
});

Below is the output of df.printSchema():

root
 |-- priceData: struct (nullable = true)
 |    |-- close: string (nullable = true)
 |    |-- high: string (nullable = true)
 |    |-- low: string (nullable = true)
 |    |-- open: string (nullable = true)
 |    |-- volume: string (nullable = true)
 |-- symbol: string (nullable = true)
 |-- timestamp: string (nullable = true)

And here is the exception:

java.lang.ClassCastException: org.apache.spark.util.SerializableConfiguration cannot be cast to [B
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:81)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
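One thing worth noting: the Spark Streaming programming guide recommends obtaining the SparkSession lazily inside `foreachRDD` (from the RDD's own `SparkConf`) rather than capturing a driver-side `sqlcontext` in the closure, since the captured object can fail to serialize cleanly to executors. A minimal sketch of that pattern, assuming the same `newDstream` as above (and noting that the `multiline` option only applies to file sources, so it is dropped here), might look like:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

newDstream.foreachRDD(new VoidFunction<JavaRDD<String>>() {
    private static final long serialVersionUID = 1L;

    @Override
    public void call(JavaRDD<String> rdd) throws Exception {
        // Get or create a singleton SparkSession from the RDD's own
        // configuration instead of serializing a driver-side SQLContext.
        SparkSession spark = SparkSession.builder()
                .config(rdd.context().getConf())
                .getOrCreate();

        // Wrap the JavaRDD<String> in a Dataset<String> and parse it as
        // JSON (json(JavaRDD) is deprecated in Spark 2.x in favor of this).
        Dataset<Row> df = spark.read()
                .json(spark.createDataset(rdd.rdd(), Encoders.STRING()));

        df.printSchema();
        df.show();
    }
});
```

This is only a sketch, not a confirmed fix for the `ClassCastException` above; whether it resolves the error would need to be verified against the actual streaming job and Spark version.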

0 Answers:

No answers yet