While loading the file from Hive, I get this exception:
pyspark.sql.utils.AnalysisException: u'Duplicate column(s) : "department_name" found, cannot save to JSON format;
The code is:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('pyspark')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
result = sqlContext.read.json("path/department_dup_key.json")
result.registerTempTable("djson")
result_set = sqlContext.sql("select * from djson").collect()
" department_dup_key.json"的文件内容是:
{"department_id":7,"department_name":"golf"}
{"department_id":8,"department_name":"apparel"}
{"department_id":9,"department_name":"fitness"}
{"department_id":10,"department_name":"testing","department_name":"Hellloooo"}
Can I ignore the second "department_name" when reading the data into a DataFrame?
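For context, this plain-Python sketch shows the behavior I'm after, without Spark: the standard `json` module keeps the *last* occurrence of a duplicate key by default, but an `object_pairs_hook` can keep the *first* instead. The `keep_first` function is just an illustrative name I chose; I'm wondering if Spark's JSON reader has an equivalent option, or if I'd need to pre-process each line like this before handing it to Spark.

```python
import json

def keep_first(pairs):
    # pairs is the list of (key, value) tuples in document order;
    # setdefault only stores a value the first time a key appears,
    # so the second "department_name" is ignored.
    d = {}
    for k, v in pairs:
        d.setdefault(k, v)
    return d

line = '{"department_id":10,"department_name":"testing","department_name":"Hellloooo"}'
record = json.loads(line, object_pairs_hook=keep_first)
print(record)  # {'department_id': 10, 'department_name': 'testing'}
```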