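When I count the records in each group, the group whose Name is null reports 0 records. That is wrong, because the input has two rows with a null Name.
Input DataFrame: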
+--------+
|    Name|
+--------+
|  Andrei|
|  Andrei|
|    null|
|    null|
|Grigorii|
+--------+
Code:
Dataset<Row> df = inputDf.groupBy("Name")
.agg(functions.count("Name").as("Name_count"));
Actual DataFrame:
+--------+----------+
|    Name|Name_count|
+--------+----------+
|    null|         0|
|  Andrei|         2|
|Grigorii|         1|
+--------+----------+
Expected DataFrame:
+--------+----------+
|    Name|Name_count|
+--------+----------+
|    null|         2|
|  Andrei|         2|
|Grigorii|         1|
+--------+----------+
Answer:
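This works: count("Name") counts only the non-null values of the Name column, so the null group always gets 0, whereas count("*") counts every row in the group, whatever its values: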
Dataset<Row> df = inputDf.groupBy("Name")
.agg(functions.count("*").as("Name_count"));
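To illustrate why the null group reports 0, here is a plain-Java sketch that models the two aggregation semantics on the sample names. It does not use Spark at all, and the class NullGroupCount and its methods are hypothetical names chosen for this illustration. Grouping keeps null as its own key in both models; countColumn adds 1 only for non-null values (like count("Name")), while countRows adds 1 for every row (like count("*")).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of Spark's counting semantics (no Spark API used).
public class NullGroupCount {

    // Models groupBy("Name").agg(count("Name")): null keys still form a group,
    // but a null value contributes 0 to the count.
    public static Map<String, Long> countColumn(List<String> names) {
        Map<String, Long> counts = new LinkedHashMap<>(); // LinkedHashMap allows a null key
        for (String name : names) {
            long increment = (name == null) ? 0L : 1L;
            counts.put(name, counts.getOrDefault(name, 0L) + increment);
        }
        return counts;
    }

    // Models groupBy("Name").agg(count("*")): every row in the group is counted.
    public static Map<String, Long> countRows(List<String> names) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String name : names) {
            counts.put(name, counts.getOrDefault(name, 0L) + 1L);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Andrei", "Andrei", null, null, "Grigorii");
        System.out.println("count(\"Name\"): " + countColumn(names)); // {Andrei=2, null=0, Grigorii=1}
        System.out.println("count(\"*\"):    " + countRows(names));   // {Andrei=2, null=2, Grigorii=1}
    }
}
```

The two printed maps correspond to the actual and expected DataFrames in the question. In Spark itself, inputDf.groupBy("Name").count() is another way to count rows per group, including the null key; it names the result column count instead of Name_count.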