Spark Window function shows sliding-window behavior when the window is ordered

Time: 2019-02-28 15:03:49

Tags: apache-spark

I have a dataset like this:

+---+-------------------------------+--------+
|key|value                          |someData|
+---+-------------------------------+--------+
|1  |AAA                            |5       |
|1  |VVV                            |6       |
|1  |DDDD                           |8       |
|3  |rrerw                          |9       |
|4  |RRRRR                          |13      |
|6  |AAAAABB                        |15      |
|6  |C:\Windows\System32\svchost.exe|20      |
+---+-------------------------------+--------+

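Now I apply the avg aggregate function twice, first over an ordered window and then over an unordered window, and the results are not the same. Example: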

WindowSpec windowSpec = Window.orderBy(col("someData")).partitionBy(col("key"));
rawMapping.withColumn("avg", avg("someData").over(windowSpec)).show(false);

+---+-------------------------------+--------+-----------------+
|key|value                          |someData|avg              |
+---+-------------------------------+--------+-----------------+
|1  |AAA                            |5       |5.0              |
|1  |VVV                            |6       |5.5              |
|1  |DDDD                           |8       |6.333333333333333|
|6  |AAAAABB                        |15      |15.0             |
|6  |C:\Windows\System32\svchost.exe|20      |17.5             |
|3  |rrerw                          |9       |9.0              |
|4  |RRRRR                          |13      |13.0             |
+---+-------------------------------+--------+-----------------+

WindowSpec windowSpec2 = Window.partitionBy(col("key"));
rawMapping.withColumn("avg", avg("someData").over(windowSpec2)).show(false);

+---+-------------------------------+--------+-----------------+
|key|value                          |someData|avg              |
+---+-------------------------------+--------+-----------------+
|1  |AAA                            |5       |6.333333333333333|
|1  |VVV                            |6       |6.333333333333333|
|1  |DDDD                           |8       |6.333333333333333|
|6  |AAAAABB                        |15      |17.5             |
|6  |C:\Windows\System32\svchost.exe|20      |17.5             |
|3  |rrerw                          |9       |9.0              |
|4  |RRRRR                          |13      |13.0             |
+---+-------------------------------+--------+-----------------+
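For comparison, the numbers in the two tables can be reproduced without Spark. The first table matches an average over a frame that grows from the first row of the partition to the current row, and the second matches an average over the whole partition. The sketch below is plain Java with hypothetical names (WindowFrameSketch, runningAverages, wholePartitionAverages); it assumes the input is already sorted by someData and has no duplicate values in that column, and it is only an illustration of the two frames, not how Spark computes them.

```java
import java.util.Arrays;

// Plain-Java illustration only; no Spark involved. All names are hypothetical.
class WindowFrameSketch {

    // Average over a frame that grows from the first row up to the current row.
    // Assumes the input is already sorted by the ordering column.
    static double[] runningAverages(double[] sorted) {
        double[] out = new double[sorted.length];
        double sum = 0.0;
        for (int i = 0; i < sorted.length; i++) {
            sum += sorted[i];
            out[i] = sum / (i + 1);
        }
        return out;
    }

    // Average over a frame that always covers the whole partition.
    static double[] wholePartitionAverages(double[] values) {
        double[] out = new double[values.length];
        if (values.length == 0) {
            return out;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        Arrays.fill(out, sum / values.length);
        return out;
    }

    public static void main(String[] args) {
        double[] key1 = {5, 6, 8}; // someData values for key = 1
        System.out.println(Arrays.toString(runningAverages(key1)));
        System.out.println(Arrays.toString(wholePartitionAverages(key1)));
    }
}
```

For key = 1 the first method gives the per-row values seen in the ordered output, and the second gives the single value repeated in the unordered output. In Spark itself, the frame can be stated explicitly on the window spec, for example with rowsBetween(Window.unboundedPreceding(), Window.unboundedFollowing()), which should make the ordered window cover the whole partition as well.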

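When the window is ordered, the aggregate function shows "sliding window" behavior. Why does this happen? More importantly, is it a bug or a feature?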

0 answers:

No answers yet