java.lang.ClassNotFoundException: Failed to find data source: json

Asked: 2016-06-28 05:55:51

Tags: json scala apache-spark apache-spark-sql spark-dataframe

I am trying to read a sample JSON file into a SQLContext with the following code, but it fails with a data source error.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val path = "C:\\samplepath\\sample.json"
val jsondata = sqlContext.read.json(path)
  

java.lang.ClassNotFoundException: Failed to find data source: json. Please find packages at http://spark-packages.org
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:244)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: json.DefaultSource
    at scala.tools.nsc.interpreter.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:83)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
    at scala.util.Try.orElse(Try.scala:82)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
    ... 50 more
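The `Caused by: java.lang.ClassNotFoundException: json.DefaultSource` line shows what the lookup actually tried: Spark 1.6 resolves a short format name by also attempting the name with `.DefaultSource` appended, and here neither candidate class was found on the classpath. A simplified, stand-alone sketch of that candidate-name strategy (this is an illustration, not Spark's actual implementation):

```scala
// Simplified sketch of how a short data source name like "json" can be
// expanded into candidate class names to try loading (illustrative only;
// Spark's real ResolvedDataSource.lookupDataSource also consults a
// built-in mapping of short names to internal classes).
object DataSourceLookupSketch {
  def candidateClasses(provider: String): Seq[String] =
    Seq(provider, provider + ".DefaultSource")
}

// For "json" the fallback candidate is "json.DefaultSource", which is
// exactly the class name in the "Caused by" line of the stack trace.
val candidates = DataSourceLookupSketch.candidateClasses("json")
println(candidates.mkString(", "))  // json, json.DefaultSource
```

When the built-in short-name mapping is unavailable (for example, with a broken or incomplete Spark build), only these raw candidates remain, and both fail to load.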

I tried searching for a Spark package that might be missing, but could not find any useful fix.

I tried similar code with PySpark, and it failed with a similar ClassNotFoundException for the json data source.

On a further attempt, converting an existing RDD via jsonRDD worked and I was able to get results. Is there something I am missing? I am using Spark-1.6.1 with Scala-2.10.5. Any help is appreciated. Thanks.

val stringRDD = sc.parallelize(Seq(""" 
  { "isActive": false,
    "balance": "$1,431.73",
    "picture": "http://placehold.it/32x32",
    "age": 35,
    "eyeColor": "blue"
  }""",
   """{
    "isActive": true,
    "balance": "$2,515.60",
    "picture": "http://placehold.it/32x32",
    "age": 34,
    "eyeColor": "blue"
  }""", 
  """{
    "isActive": false,
    "balance": "$3,765.29",
    "picture": "http://placehold.it/32x32",
    "age": 26,
    "eyeColor": "blue"
  }""")
)
sqlContext.jsonRDD(stringRDD).registerTempTable("testjson")
sqlContext.sql("SELECT age from testjson").collect
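One difference between the two code paths is worth noting, separate from the ClassNotFoundException itself: in Spark 1.6, the file-based `sqlContext.read.json(path)` expects each line of the file to be a complete JSON object, while `jsonRDD` treats each RDD element as one whole document, which is why the pretty-printed multi-line strings above parse fine. A small standard-library-only sketch (no Spark required) of collapsing a pretty-printed record into the one-object-per-line form, using field values taken from the sample above:

```scala
// Spark 1.6's file-based JSON reader expects one complete JSON object
// per line, so pretty-printed records must be flattened before being
// written to a file. This uses only the Scala standard library.
val prettyRecord =
  """{
    |  "isActive": false,
    |  "age": 35
    |}""".stripMargin

// Trim each line and join with a single space to get a one-line record.
val singleLine = prettyRecord.split("\n").map(_.trim).mkString(" ")
println(singleLine)  // { "isActive": false, "age": 35 }
```

Writing each record on its own line this way produces a file that the line-delimited reader can consume directly.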

1 Answer:

Answer 0: (score: 0)

I had built the jar from source, so I believe the problem was that some resources were missing from my build. I downloaded the latest prebuilt jar from the Spark website & it worked as expected.