Group by and order by in Spark SQL

Time: 2016-10-20 12:34:10

Tags: apache-spark apache-spark-sql

I am trying to access S3 data from a Spark application. I am using Spark SQL to retrieve the data, but it does not accept the group by clause.

DataFrame summaryQuery = sql.sql("Select score from summary order by updationDate desc");
summaryQuery.groupBy("sessionId").count().show();
summaryQuery.show();

I also tried it directly in the query:

DataFrame summaryQuery = sql.sql("Select score from summary group by sessionId order by updationDate desc");
summaryQuery.show();

But in both cases I get an SQL exception.

Exception in thread "main" org.apache.spark.sql.AnalysisException: expression 'score' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;

How can I query this data?

1 answer:

Answer 0: (score: 3)

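In Spark SQL, a column that is not listed in the group by clause must be wrapped in first(column_name), last(column_name), or another aggregate function. first() and last() return the first and the last value, respectively, from the rows fetched for each group.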

// updationDate is not in the group by either, so the order by aggregates it with max()
DataFrame summaryQuery = sql.sql("Select first(score) from summary group by sessionId order by max(updationDate) desc");
summaryQuery.show();
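
To see why Spark insists on an aggregate here, it may help to model the grouping in plain Java, without Spark. Each sessionId can have several rows, so "the score" of a group is ambiguous until you say which row to keep. first() keeps whichever row Spark happens to see first, which is not guaranteed to be the most recent one. The sketch below assumes that what you actually want is the latest score per session, newest session first. The `Row` class, `GroupBySketch`, `latestPerSession` and the sample data are all made up for illustration and are not part of the Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupBySketch {

    // Hypothetical stand-in for one row of the `summary` table.
    static final class Row {
        final String sessionId;
        final int score;
        final String updationDate; // ISO-8601 date, so string order equals date order

        Row(String sessionId, int score, String updationDate) {
            this.sessionId = sessionId;
            this.score = score;
            this.updationDate = updationDate;
        }
    }

    // Keeps one row per sessionId (the most recently updated one),
    // then orders the surviving rows by updationDate, newest first.
    static List<Row> latestPerSession(List<Row> rows) {
        Map<String, Row> latest = new LinkedHashMap<>();
        for (Row row : rows) {
            // merge() keeps whichever of the two rows has the later updationDate
            latest.merge(row.sessionId, row,
                    (a, b) -> a.updationDate.compareTo(b.updationDate) >= 0 ? a : b);
        }
        List<Row> result = new ArrayList<>(latest.values());
        result.sort(Comparator.comparing((Row r) -> r.updationDate).reversed());
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data: two sessions with two rows each.
        List<Row> summary = Arrays.asList(
                new Row("s1", 10, "2016-10-18"),
                new Row("s2", 40, "2016-10-19"),
                new Row("s1", 30, "2016-10-20"),
                new Row("s2", 20, "2016-10-17"));

        for (Row r : latestPerSession(summary)) {
            System.out.println(r.sessionId + " " + r.score + " " + r.updationDate);
        }
    }
}
```

The point of the sketch is that the choice of row inside each group is an explicit decision (here, the latest updationDate). In SQL that decision is what the aggregate function expresses, and it applies to every column that is not in the group by, including the one used in the order by.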