How to select columns that are not in the GROUP BY clause or in an aggregate function?

Date: 2017-03-09 16:46:59

Tags: mysql apache-spark-sql

I have the following query:

SELECT cola, count(cola) over() as num, colb, colc
FROM public."tabA"
GROUP BY cola

When I run it with Spark SQL, I get:

ERROR: column "tabA.colb" must appear in the GROUP BY clause or be used in an aggregate function

How can I select the count of a column together with the other columns?

1 Answer:

Answer 0 (score: 1)

First, let me explain why you cannot select colb when colb is neither part of the groupBy nor used in an aggregate function.

Imagine what would happen if the dataset had two records with the same cola but different colb values:

val inventory = Seq(
  ("a", "__1__", "c"),
  ("a", "__2__", "c")).toDF("cola", "colb", "colc")

scala> inventory.show
+----+-----+----+
|cola| colb|colc|
+----+-----+----+
|   a|__1__|   c|
|   a|__2__|   c|
+----+-----+----+

What should the value of colb be then?
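If you do want a single row per cola, grouping can still work, but then colb has to go through an aggregate function explicitly, for example collect_list (keep all values) or first (pick one). A sketch, not from the original answer, assuming the same inventory DataFrame as above:

// hypothetical alternative: aggregate colb explicitly instead of windowing
inventory.groupBy("cola")
  .agg(count("cola") as "num", collect_list("colb") as "colbs", first("colc") as "colc")
  .show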


Use window functions

One possible solution is to use the window operator. In fact, you were very close, since you already used the over function:

import org.apache.spark.sql.expressions.Window
val byCola = Window.partitionBy("cola")

scala> inventory.withColumn("count", count("*") over byCola).show
+----+-----+----+-----+
|cola| colb|colc|count|
+----+-----+----+-----+
|   a|__1__|   c|    2|
|   a|__2__|   c|    2|
+----+-----+----+-----+

In SQL, it is as follows:

inventory.createOrReplaceTempView("inventory")

scala> sql("""
     |   SELECT cola, count(cola) over byCola as num, colb, colc
     |   FROM inventory
     |   WINDOW byCola AS (PARTITION BY cola)
     | """).show
+----+---+-----+----+
|cola|num| colb|colc|
+----+---+-----+----+
|   a|  2|__1__|   c|
|   a|  2|__2__|   c|
+----+---+-----+----+
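Note that the named WINDOW clause is optional here; the same window specification can be inlined directly in the OVER clause. An equivalent sketch, assuming the same inventory temp view:

scala> sql("""
     |   SELECT cola, count(cola) OVER (PARTITION BY cola) AS num, colb, colc
     |   FROM inventory
     | """).show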
