我有一个像这样的数据集:
+---+-------------------------------+--------+
|key|value |someData|
+---+-------------------------------+--------+
|1 |AAA |5 |
|1 |VVV |6 |
|1 |DDDD |8 |
|3 |rrerw |9 |
|4 |RRRRR |13 |
|6 |AAAAABB |15 |
|6 |C:\Windows\System32\svchost.exe|20 |
+---+-------------------------------+--------+
现在,我两次应用汇总avg
函数,首先在有序窗口上,然后在无序窗口上,结果不是相同的示例:
WindowSpec windowSpec = Window.orderBy(col("someData")).partitionBy(col("key"));
rawMapping.withColumn("avg", avg("someData").over(windowSpec)).show(false);
+---+-------------------------------+--------+-----------------+
|key|value |someData|avg |
+---+-------------------------------+--------+-----------------+
|1 |AAA |5 |5.0 |
|1 |VVV |6 |5.5 |
|1 |DDDD |8 |6.333333333333333|
|6 |AAAAABB |15 |15.0 |
|6 |C:\Windows\System32\svchost.exe|20 |17.5 |
|3 |rrerw |9 |9.0 |
|4 |RRRRR |13 |13.0 |
+---+-------------------------------+--------+-----------------+
WindowSpec windowSpec2 = Window.partitionBy(col("key"));
rawMapping.withColumn("avg", avg("someData").over(windowSpec2)).show(false);
+---+-------------------------------+--------+-----------------+
|key|value |someData|avg |
+---+-------------------------------+--------+-----------------+
|1 |AAA |5 |6.333333333333333|
|1 |VVV |6 |6.333333333333333|
|1 |DDDD |8 |6.333333333333333|
|6 |AAAAABB |15 |17.5 |
|6 |C:\Windows\System32\svchost.exe|20 |17.5 |
|3 |rrerw |9 |9.0 |
|4 |RRRRR |13 |13.0 |
+---+-------------------------------+--------+-----------------+
在整理窗口时,聚合函数具有“滑动窗口”行为,为什么会发生这种情况?更重要的是,它是错误还是功能?