How to solve this Spark-Scala SQL error message

Time: 2017-08-28 04:44:47

Tags: mongodb scala apache-spark apache-spark-sql connector

To remove duplicate rows, I tried this SQL:

val characters = MongoSpark.load[sparkSQL.Character](sparkSession)
characters.createOrReplaceTempView("characters")
val testsql = sparkSession.sql("SELECT * FROM characters GROUP BY title")
testsql.show()

But this SQL produces the error message below. If you know what the problem is, please answer.

Thank you.

Parsing command: SELECT * FROM characters GROUP BY title
Exception in thread "main" org.apache.spark.sql.AnalysisException:
expression 'characters.`url`' is neither present in the group by, nor is it an aggregate function.
Add to group by or wrap in first() if you don't care which value you get.;;

Then I tried the following, but I don't know whether it is the correct solution...

Please answer this question. Thank you!

val characters = MongoSpark.load[sparkSQL.Character](sparkSession)
characters.createOrReplaceTempView("characters")
val testsql = sparkSession.sql("SELECT * FROM characters")
val testgrsql = testsql.groupBy("title")
testgrsql.show()

1 Answer:

Answer 0 (score: 1)

The error message explains everything:

Parsing command: SELECT * FROM characters GROUP BY title

Exception in thread "main" org.apache.spark.sql.AnalysisException:
expression 'characters.`url`' is neither present in the group by, nor is it an aggregate function.
Add to group by or wrap in first() if you don't care which value you get.;;

So the usage can be: if you want the first url value for each title, use first(url):

characters.createOrReplaceTempView("characters")
val testsql = sparkSession.sql("SELECT title, first(url) FROM characters GROUP BY title")
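
If the goal is simply to keep one row per title rather than to compute aggregates, a short alternative sketch (assuming the same MongoSpark load and sparkSQL.Character case class from the question) is the DataFrame method dropDuplicates, which avoids the GROUP BY restriction entirely:

// Assumes the same sparkSession and sparkSQL.Character case class as in the question.
val characters = MongoSpark.load[sparkSQL.Character](sparkSession)
// dropDuplicates("title") keeps one arbitrary row per distinct title,
// roughly equivalent to GROUP BY title with first() applied to the other columns.
val deduped = characters.dropDuplicates("title")
deduped.show()

This works directly on the loaded data, so no temp view or SQL string is needed; as with first() in the GROUP BY query, which row survives for each title is not guaranteed.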