I have the following query:

SELECT cola, count(cola) over() as num, colb, colc
FROM public."tabA"
GROUP BY cola

When I execute it with Spark SQL, I get:

ERROR: column "tabA.colb" must appear in the GROUP BY clause or be used in an aggregate function.

How can I select the count of a column together with the other columns?
Answer (score: 1):
First, let me explain why you cannot select colb when it is neither part of the GROUP BY nor used in an aggregate function.

Imagine what happens if your dataset contains two records with the same cola but different colb values:

val inventory = Seq(
("a", "__1__", "c"),
("a", "__2__", "c")).toDF("cola", "colb", "colc")
scala> inventory.show
+----+-----+----+
|cola| colb|colc|
+----+-----+----+
| a|__1__| c|
| a|__2__| c|
+----+-----+----+
After grouping by cola, what should the value of colb be for group "a"? There is no single answer, which is why the query is rejected.
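The ambiguity can be modeled in plain Scala collections, no Spark required (the variable names here are mine, not part of the answer):

```scala
// The same two rows as the inventory DataFrame above, as plain tuples.
val rows = Seq(("a", "__1__", "c"), ("a", "__2__", "c"))

// GROUP BY cola collapses each group to one output row, so every
// selected column must have exactly one value per group. colb does not:
val colbPerGroup = rows
  .groupBy { case (cola, _, _) => cola }
  .map { case (k, rs) => k -> rs.map(_._2).distinct }

println(colbPerGroup)  // group "a" has two candidate colb values
```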
One possible solution is to use window aggregate operators, and in fact you were very close: the over clause is exactly what they are for.

import org.apache.spark.sql.expressions.Window
val byCola = Window.partitionBy("cola")
scala> inventory.withColumn("count", count("*") over byCola).show
+----+-----+----+-----+
|cola| colb|colc|count|
+----+-----+----+-----+
| a|__1__| c| 2|
| a|__2__| c| 2|
+----+-----+----+-----+
In SQL, it looks as follows:
inventory.createOrReplaceTempView("inventory")
scala> sql("""
| SELECT cola, count(cola) over byCola as num, colb, colc
| FROM inventory
| WINDOW byCola AS (PARTITION BY cola)
| """).show
+----+---+-----+----+
|cola|num| colb|colc|
+----+---+-----+----+
| a| 2|__1__| c|
| a| 2|__2__| c|
+----+---+-----+----+
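For intuition, the count-and-attach that the window aggregate performs can be sketched in plain Scala collections, again without Spark (names are mine): compute a count per key, then attach it back to every row of that key.

```scala
// The same two rows as the inventory DataFrame above, as plain tuples.
val rows = Seq(("a", "__1__", "c"), ("a", "__2__", "c"))

// Step 1: aggregate -- count rows per cola.
val counts: Map[String, Int] =
  rows.groupBy(_._1).map { case (k, rs) => k -> rs.size }

// Step 2: attach the count back onto every original row, keeping
// colb and colc intact -- which is what the window aggregate does.
val withCount = rows.map { case (cola, colb, colc) =>
  (cola, counts(cola), colb, colc)
}
// withCount == Seq(("a", 2, "__1__", "c"), ("a", 2, "__2__", "c"))
```

Unlike GROUP BY, this keeps one output row per input row, which is exactly the shape the question asks for.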