Duplicate keys in JSON: how to consider only one key

Asked: 2017-10-17 01:41:29

Tags: json apache-spark hive pyspark

When loading the file with PySpark, I hit an exception:

pyspark.sql.utils.AnalysisException: u'Duplicate column "department_name" found, cannot save to JSON format;'

The code is:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('pyspark')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# This read fails because one record contains a duplicate key
result = sqlContext.read.json("path/department_dup_key.json")
result.registerTempTable("djson")
result_set = sqlContext.sql("select * from djson").collect()

The content of the file "department_dup_key.json" is:

{"department_id":7,"department_name":"golf"}
{"department_id":8,"department_name":"apparel"}
{"department_id":9,"department_name":"fitness"}
{"department_id":10,"department_name":"testing","department_name":"Hellloooo"}

Can I ignore the second "department_name" while reading the data frame?
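One possible workaround (a sketch, not tested against the asker's Spark version) is to pre-parse each line with Python's standard json module, which lets you control duplicate-key handling through the object_pairs_hook parameter. The keep_first helper below is a hypothetical name; it keeps only the first value seen for each key, so the second "department_name" is dropped:

```python
import json

def keep_first(pairs):
    # object_pairs_hook receives the key/value pairs in document order;
    # keep only the first occurrence of each key, discarding later duplicates.
    out = {}
    for key, value in pairs:
        if key not in out:
            out[key] = value
    return out

line = '{"department_id":10,"department_name":"testing","department_name":"Hellloooo"}'
record = json.loads(line, object_pairs_hook=keep_first)
# record["department_name"] is "testing"; the duplicate "Hellloooo" was ignored
```

Since the input is line-delimited JSON, the same hook could then be applied per line, e.g. sc.textFile(path).map(lambda l: json.loads(l, object_pairs_hook=keep_first)), and the resulting RDD of dicts passed to sqlContext.createDataFrame, instead of using sqlContext.read.json directly.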

0 Answers:

There are no answers