Spark SQL错误读取JSON文件:java.lang.ClassNotFoundException:scala.collection.GenTraversableOnce $ class

时间:2017-03-30 07:21:12

标签: json apache-spark apache-spark-sql

我正在尝试使用Java中的Spark SQL读取JSON文件。 这是我的代码

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
...
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
SQLContext sqlContext = new SQLContext(jsc);
DataFrame df = sqlContext.jsonFile("~/test.json");
df.printSchema();
df.registerTempTable("test");
...

我制作了简单的JSON" test.json",以简化:

{
   "name": "myname"
}

当我尝试运行代码时,会出现错误消息:

efg
17/03/30 10:02:26 INFO BlockManagerMasterEndpoint: Registering block manager 10.6.86.82:36824 with 1948.2 MB RAM, BlockManagerId(driver, 10.6.86.82, 36824)
17/03/30 10:02:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.6.86.82, 36824)
17/03/30 10:02:26 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at org.apache.spark.sql.sources.CaseInsensitiveMap.<init>(ddl.scala:344)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:572)
at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:553)
at sugi.kau.sparkonjava.SparkSQL.main(SparkSQL.java:32)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 6 more
17/03/30 10:02:26 INFO SparkContext: Invoking stop() from shutdown hook
...

感谢

1 个答案:

答案 0 :(得分:1)

在函数jsonFile(String path)的docs spark中: 加载JSON文件(每行一个对象),将结果作为DataFrame返回。 (注意,jsonFile被read()。json())

取代

所以你应该每行有一个对象,你的源文件应该是这样的:

  {"name": "myname"}
  {"name": "myname2"}
  .....