Question

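I want to create a window function that counts how many times the value of a field in the current row appears in the part of the ordered partition that comes before the current row. To make this more concrete, suppose we have a table like this: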

| id| fruit  | date |
+---+--------+------+
| 1 | apple  | 1    |
| 1 | cherry | 2    |
| 1 | apple  | 3    |
| 1 | cherry | 4    |
| 2 | orange | 1    |
| 2 | grape  | 2    |
| 2 | grape  | 3    |

We would like to produce a table like this (the date column is omitted for clarity):

| id| fruit  | prior |
+---+--------+-------+
| 1 | apple  | 0     |
| 1 | cherry | 0     |
| 1 | apple  | 1     |
| 1 | cherry | 1     |
| 2 | orange | 0     |
| 2 | grape  | 0     |
| 2 | grape  | 1     |

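Note that for id = 1, moving along the ordered partition, the first entry 'apple' matches nothing (since the implied set of earlier rows is empty), and the next fruit 'cherry' does not match either. Then we reach 'apple' again, which is a match, and so on. I imagine the SQL would look something like this: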

SELECT id, fruit, <some kind of INTERSECT?> OVER (PARTITION BY id ORDER by date) AS prior FROM fruit_table;

But I can't find anything that looks right. FWIW, I'm using PostgreSQL 8.4.

Answer 1

You can solve this without a window function, using a self LEFT JOIN and count():

SELECT t.id, t.fruit, t.day, count(t0.*) AS prior
FROM   tbl t
LEFT   JOIN tbl t0 ON (t0.id, t0.fruit) = (t.id, t.fruit) AND t0.day < t.day
GROUP  BY t.id, t.day, t.fruit
ORDER  BY t.id, t.day

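I renamed the date column to day, because date is a reserved word in every SQL standard and a type name in PostgreSQL.
I also corrected an error in your sample data. The way you had it, it did not check out, which could be confusing.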
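
To try the self-join on the sample data without a PostgreSQL server at hand, here is a minimal sketch that runs the same logic against an in-memory SQLite database from Python. SQLite is only a stand-in here (an assumption, not what the question uses). The table name tbl and the column day follow the query above; the helper name prior_counts_self_join is made up for illustration, and the row-value comparison is spelled out as two plain equality conditions so that it also parses on older SQLite builds.

```python
import sqlite3

# Sample data from the question, with the date column renamed to day.
ROWS = [
    (1, "apple", 1), (1, "cherry", 2), (1, "apple", 3), (1, "cherry", 4),
    (2, "orange", 1), (2, "grape", 2), (2, "grape", 3),
]

# Same logic as the self-join above. count(t0.id) only counts joined rows,
# so a row with no earlier match (t0.id is NULL) gets prior = 0.
SELF_JOIN_SQL = """
SELECT t.id, t.fruit, t.day, count(t0.id) AS prior
FROM   tbl t
LEFT   JOIN tbl t0 ON t0.id = t.id AND t0.fruit = t.fruit AND t0.day < t.day
GROUP  BY t.id, t.day, t.fruit
ORDER  BY t.id, t.day
"""


def prior_counts_self_join():
    """Load the sample rows into a throwaway database and run the self-join."""
    conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL (assumption)
    conn.execute("CREATE TABLE tbl (id INTEGER, fruit TEXT, day INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
    result = conn.execute(SELF_JOIN_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in prior_counts_self_join():
        print(row)
```

The last column of each returned tuple is the prior count, which should line up with the table the question asks for.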

If your point is to use a window function, this one should work:

SELECT id, fruit, day
      ,count(*) OVER (PARTITION BY id, fruit ORDER BY day) - 1 AS prior
FROM   tbl
ORDER  BY id, day

This works because, to quote the manual:

If frame_end is omitted it defaults to CURRENT ROW.

You effectively count how many rows from earlier days have the same (id, fruit), including the current row. That is what the - 1 is for.
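
As a quick check of the frame reasoning, the sketch below runs the window query next to a variant that writes the frame out explicitly as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, and compares the two. It again uses an in-memory SQLite database from Python as a stand-in for PostgreSQL (an assumption; SQLite needs version 3.25 or later for window functions). The helper name prior_counts_window and the column alias prior_explicit are made up for illustration.

```python
import sqlite3

# Sample data from the question, with the date column renamed to day.
ROWS = [
    (1, "apple", 1), (1, "cherry", 2), (1, "apple", 3), (1, "cherry", 4),
    (2, "orange", 1), (2, "grape", 2), (2, "grape", 3),
]

# prior relies on the default frame that ORDER BY implies;
# prior_explicit spells the same frame out, so both columns should agree.
WINDOW_SQL = """
SELECT id, fruit, day,
       count(*) OVER (PARTITION BY id, fruit ORDER BY day) - 1 AS prior,
       count(*) OVER (PARTITION BY id, fruit ORDER BY day
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1
           AS prior_explicit
FROM   tbl
ORDER  BY id, day
"""


def prior_counts_window():
    """Load the sample rows into a throwaway database and run the window query."""
    conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL (assumption)
    conn.execute("CREATE TABLE tbl (id INTEGER, fruit TEXT, day INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
    result = conn.execute(WINDOW_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in prior_counts_window():
        print(row)
```

One thing to keep in mind: because the default frame is a RANGE frame, rows that share the same day within one (id, fruit) group are peers and get counted together. The sample data has no such ties, so it does not matter here.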

Running "matching" total using a window function in SQL