I have created a JSONArray and created an RDD from it. When I try to map it with sqlContext.jsonRDD(rdd), I get the following error:
Error: application failed with exception
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 5, esu3v148.federated.fds): java.lang.ClassCastException: org.json.simple.JSONObject cannot be cast to java.lang.String
at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:307)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:885)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:884)
at org.apache.spark.SparkContext$$anonfun$32.apply(SparkContext.scala:1534)
at org.apache.spark.SparkContext$$anonfun$32.apply(SparkContext.scala:1534)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This is how the JSONArray is created and used in Spark:
JSONArray jsonResultArray = new JSONArray();
SparkConf sparkConf = new SparkConf().setAppName("HBaseTest");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
JavaStreamingContext ssc = new JavaStreamingContext(sc, Durations.seconds(60));
SQLContext sqlContext = new SQLContext(sc);
if (!jsonResultArray.isEmpty()) {
    @SuppressWarnings("unchecked")
    //JavaRDD<String> rdd = sc.parallelize(jsonResultArray);
    DataFrame input = sqlContext.jsonRDD(sc.parallelize(jsonResultArray));
}
Please help me figure out how to solve this. Thanks.
Answer:
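sqlContext.jsonRDD expects an argument of type JavaRDD<java.lang.String>. A JSONArray is a list of org.json.simple.JSONObject, so sc.parallelize(jsonResultArray) creates a JavaRDD<JSONObject>, which is why the exception is thrown when that RDD is passed to jsonRDD. Normally this would be a compile-time error, but the compiler is misled by the fact that org.json.simple.JSONArray extends the generic List type without an explicit type argument (a raw type), so the mismatch is only detected at runtime.

If you really must use JSONArray, you have to map its elements to strings before or after creating the RDD. For example, mapping after creating the RDD:

import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.json.simple.JSONObject;

// Unchecked cast: JSONArray is a raw List, so its element type must be asserted here.
final JavaRDD<JSONObject> jsonObjectRDD = sc.parallelize((List<JSONObject>) jsonResultArray);
// Serialize each JSONObject to its JSON text, because jsonRDD expects JavaRDD<String>.
final JavaRDD<String> jsonStringRDD = jsonObjectRDD.map(new Function<JSONObject, String>() {
    @Override
    public String call(JSONObject v) throws Exception {
        return v.toJSONString();
    }
});
DataFrame input = sqlContext.jsonRDD(jsonStringRDD);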
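The compile-time versus runtime point above can be reproduced without Spark or json-simple. The sketch below is illustrative only: RawJsonArray and FakeJsonObject are hypothetical stand-ins (RawJsonArray extends the raw ArrayList type, as org.json.simple.JSONArray does), and uncheckedView and toJsonStrings are hypothetical helper names. Presumably the cast that fails inside JsonRDD.parseJson behaves like the get(0) call here: nothing is checked when the raw list is passed in, and the ClassCastException only appears when an element is finally read as a String. toJsonStrings corresponds to the "map to strings before creating the RDD" option; its result would then be handed to sc.parallelize.

```java
import java.util.ArrayList;
import java.util.List;

public class RawListPitfall {

    // Hypothetical stand-in for org.json.simple.JSONArray: it extends the *raw*
    // ArrayList type, so the compiler cannot check what the elements are.
    @SuppressWarnings("rawtypes")
    public static class RawJsonArray extends ArrayList {
    }

    // Hypothetical stand-in for org.json.simple.JSONObject; it only holds its JSON text.
    public static final class FakeJsonObject {
        private final String json;

        public FakeJsonObject(String json) {
            this.json = json;
        }

        public String toJSONString() {
            return json;
        }
    }

    // Compiles with only an "unchecked" warning: raw list -> List<String>.
    // No element is inspected here, so nothing fails yet.
    @SuppressWarnings("unchecked")
    public static List<String> uncheckedView(RawJsonArray array) {
        return array;
    }

    // Safe conversion: serialize each element explicitly instead of pretending
    // the list already contains Strings.
    public static List<String> toJsonStrings(List<?> array) {
        List<String> out = new ArrayList<>(array.size());
        for (Object element : array) {
            out.add(((FakeJsonObject) element).toJSONString());
        }
        return out;
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        RawJsonArray array = new RawJsonArray();
        array.add(new FakeJsonObject("{\"id\":1}")); // raw add: no type check
        array.add(new FakeJsonObject("{\"id\":2}"));

        List<String> view = uncheckedView(array); // no error at this point
        try {
            String first = view.get(0); // implicit cast to String fails here
            System.out.println("unexpected: " + first);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the Spark job");
        }

        System.out.println(toJsonStrings(array));
    }
}
```

Running main prints the ClassCastException message first and then the two JSON strings produced by the explicit conversion.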