I have the following table. The rows are in a specific order.
id | value
------+---------------------
1 | 2
1 | 4
1 | 3
2 | 2
2 | 2
2 | 5
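I want to group the rows by the "id" column and, for each row, get the average of the values up to and including that row within its group (as shown in parentheses in the example below).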
id | value | RelativeAverage
------+-------------+--------------------
1 | 2 | (2)/1 = 2
1 | 4 | (2+4)/2 = 3
1 | 3 | (2+4+3)/3 = 3
2 | 2 | (2)/1 = 2
2 | 2 | (2+2)/2 = 2
2 | 5 | (2+2+5)/3 = 3
Is there a way to achieve this?
Thanks in advance.
Answer 0 (score: 2)
Incorrect query:
select
id, value,
sum(value) over(arrangement), rank() over(arrangement),
sum(value) over(arrangement)::numeric / rank() over(arrangement)
as relative_average
from tbl
window arrangement as (partition by id order by id);
Output (incorrect):
| id | value | sum | rank | relative_average |
|----|-------|-----|------|------------------|
| 1 | 2 | 9 | 1 | 9 |
| 1 | 4 | 9 | 1 | 9 |
| 1 | 3 | 9 | 1 | 9 |
| 2 | 1 | 8 | 1 | 8 |
| 2 | 2 | 8 | 1 | 8 |
| 2 | 5 | 8 | 1 | 8 |
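You need something to order by that reflects the actual arrangement of your data, so that the running sum and the rank work properly. Ordering by id makes every row in a partition a peer of the others, so rank() is 1 everywhere and the default frame sums the whole partition. You can use the table rows' hidden ctid field, but this is Postgres-specific.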
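To make the peer behaviour concrete, here is a minimal Python sketch (not Postgres itself) that mimics sum() and rank() over one partition under the default frame, where rows with equal sort keys are peers. The function name windowed_sum_and_rank and the list-of-tuples layout are assumptions made for illustration only.

```python
def windowed_sum_and_rank(rows, key):
    """Emulate sum(value) and rank() over (partition by id order by <key>)
    with the default frame, which extends through the current row's peers.

    rows: list of (id, value) tuples in their stored order.
    key:  function (position, row) -> sort key within the partition.
    """
    partitions = {}
    for pos, row in enumerate(rows):
        partitions.setdefault(row[0], []).append((key(pos, row), row))

    out = []
    for part in partitions.values():
        part.sort(key=lambda kr: kr[0])  # stable sort keeps stored order for ties
        for k, (id_, value) in part:
            # Default frame: every row whose key is <= the current key.
            total = sum(r[1] for k2, r in part if k2 <= k)
            # rank(): 1 + number of rows with a strictly smaller key.
            rank = 1 + sum(1 for k2, _ in part if k2 < k)
            out.append((id_, value, total, rank, total / rank))
    return out


rows = [(1, 2), (1, 4), (1, 3), (2, 1), (2, 2), (2, 5)]

# order by id: all rows in a partition are peers.
by_id = windowed_sum_and_rank(rows, key=lambda pos, row: row[0])
# order by a unique position (a stand-in for ctid or a serial key).
by_pos = windowed_sum_and_rank(rows, key=lambda pos, row: pos)
```

With the id key every row reports the partition total and rank 1; with a unique key the sum and rank grow row by row, which is what the running average needs.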
Correct query:
select
id, value,
sum(value) over(arrangement), rank() over(arrangement),
sum(value) over(arrangement)::numeric / rank() over(arrangement)
as relative_average
from tbl
window arrangement as (partition by id order by tbl.ctid);
Output (correct):
| id | value | sum | rank | relative_average |
|----|-------|-----|------|--------------------|
| 1 | 2 | 2 | 1 | 2 |
| 1 | 4 | 6 | 2 | 3 |
| 1 | 3 | 9 | 3 | 3 |
| 2 | 1 | 1 | 1 | 1 |
| 2 | 2 | 3 | 2 | 1.5 |
| 2 | 5 | 8 | 3 | 2.6666666666666665 |
The best approach is to introduce a serial primary key, so that the running total (sum over()) follows the actual arrangement of your data.
CREATE TABLE tbl
(ordered_pk serial primary key, "id" int, "value" int)
;
INSERT INTO tbl
("id", "value")
VALUES
(1, 2),
(1, 4),
(1, 3),
(2, 1),
(2, 2),
(2, 5)
;
Correct query:
select
id, value,
sum(value) over(arrangement), rank() over(arrangement),
sum(value) over(arrangement)::numeric / rank() over(arrangement)
as relative_average
from tbl
window arrangement as (partition by id order by ordered_pk);
Output (correct):
| id | value | sum | rank | relative_average |
|----|-------|-----|------|--------------------|
| 1 | 2 | 2 | 1 | 2 |
| 1 | 4 | 6 | 2 | 3 |
| 1 | 3 | 9 | 3 | 3 |
| 2 | 1 | 1 | 1 | 1 |
| 2 | 2 | 3 | 2 | 1.5 |
| 2 | 5 | 8 | 3 | 2.6666666666666665 |
Live test: http://sqlfiddle.com/#!17/f18276/1
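You can order by value, but it will produce different results: not necessarily wrong output, just different because of how the values are ordered. Because values may be duplicated, you then also need to use row_number instead of rank / dense_rank. Here is an example with duplicate values.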
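As a rough illustration of why duplicates matter, the following Python sketch computes row_number, rank and dense_rank for an already sorted list, following the usual SQL definitions. The helper name number_rows is hypothetical and not part of any library.

```python
def number_rows(sorted_values):
    """Return (row_number, rank, dense_rank) for each value in an
    already sorted list, following the usual SQL definitions."""
    result = []
    rank = dense = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(sorted_values, start=1):
        if value != previous:
            rank = row_number      # rank jumps to the current position
            dense += 1             # dense_rank advances by exactly one
            previous = value
        result.append((row_number, rank, dense))
    return result


# The id = 2 partition of the example with a duplicate, ordered by value.
numbers = number_rows([1, 2, 2, 5])
```

Only row_number counts every row seen so far, so it is the only one of the three that is safe as the denominator of a running average when values tie.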
Correct query:
select
id, value,
sum(value) over(arrangement),
row_number() over(arrangement),
rank() over(arrangement),
dense_rank() over(arrangement),
sum(value) over(arrangement)::numeric / row_number() over(arrangement)
as relative_average
from tbl
window arrangement as (partition by id order by value)
Output:
| id | value | sum | row_number | rank | dense_rank | relative_average |
|----|-------|-----|------------|------|------------|--------------------|
| 1 | 2 | 2 | 1 | 1 | 1 | 2 |
| 1 | 3 | 5 | 2 | 2 | 2 | 2.5 |
| 1 | 4 | 9 | 3 | 3 | 3 | 3 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 5 | 2 | 2 | 2 | 2.5 |
| 2 | 2 | 5 | 3 | 2 | 2 | 1.6666666666666667 |
| 2 | 5 | 10 | 4 | 4 | 3 | 2.5 |
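Answer 1 (score: 1)
I'm not too proud of my other answer. Just use avg.
Today I learned about rows between unbounded preceding and current row. It appears to follow the data's actual arrangement even when there is no good candidate field for order by, so it seems you can at least avoid Postgres's hidden ctid field, or even avoid adding a serial primary key. That said, row order without an order by is not guaranteed, so it is still advisable to order by a serial primary key or a creation-date field.
Here is a better query. No division is needed, just use avg: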
select
id, value,
avg(value) over(arrangement rows between unbounded preceding and current row)
from tbl
window arrangement as (partition by id);
Output:
| id | value | avg |
|----|-------|--------------------|
| 1 | 2 | 2 |
| 1 | 4 | 3 |
| 1 | 3 | 3 |
| 2 | 1 | 1 |
| 2 | 2 | 1.5 |
| 2 | 5 | 2.6666666666666665 |
select
id, value,
sum(value) over(arrangement), rank() over(arrangement),
sum(value) over(arrangement)::numeric / rank() over(arrangement)
as relative_average,
avg(value) over(arrangement rows between unbounded preceding and current row)
from tbl
window arrangement as (partition by id order by id);
Output:
| id | value | sum | rank | relative_average | avg |
|----|-------|-----|------|------------------|--------------------|
| 1 | 2 | 9 | 1 | 9 | 2 |
| 1 | 4 | 9 | 1 | 9 | 3 |
| 1 | 3 | 9 | 1 | 9 | 3 |
| 2 | 1 | 8 | 1 | 8 | 1 |
| 2 | 2 | 8 | 1 | 8 | 1.5 |
| 2 | 5 | 8 | 1 | 8 | 2.6666666666666665 |
select
id, value,
sum(value) over(arrangement), rank() over(arrangement),
sum(value) over(arrangement)::numeric / rank() over(arrangement)
as relative_average,
avg(value) over(arrangement rows between unbounded preceding and current row)
from tbl
window arrangement as (partition by id order by tbl.ctid);
Output:
| id | value | sum | rank | relative_average | avg |
|----|-------|-----|------|--------------------|--------------------|
| 1 | 2 | 2 | 1 | 2 | 2 |
| 1 | 4 | 6 | 2 | 3 | 3 |
| 1 | 3 | 9 | 3 | 3 | 3 |
| 2 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 3 | 2 | 1.5 | 1.5 |
| 2 | 5 | 8 | 3 | 2.6666666666666665 | 2.6666666666666665 |
select
id, value,
sum(value) over(arrangement), rank() over(arrangement),
sum(value) over(arrangement)::numeric / rank() over(arrangement)
as relative_average,
avg(value) over(arrangement rows between unbounded preceding and current row)
from tbl
window arrangement as (partition by id order by ordered_pk);
Output:
| id | value | sum | rank | relative_average | avg |
|----|-------|-----|------|--------------------|--------------------|
| 1 | 2 | 2 | 1 | 2 | 2 |
| 1 | 4 | 6 | 2 | 3 | 3 |
| 1 | 3 | 9 | 3 | 3 | 3 |
| 2 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 3 | 2 | 1.5 | 1.5 |
| 2 | 5 | 8 | 3 | 2.6666666666666665 | 2.6666666666666665 |
Live test: http://sqlfiddle.com/#!17/f18276/9
rows between unbounded preceding and current row can also be written as rows unbounded preceding:
http://sqlfiddle.com/#!17/f18276/11
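The equivalence of the two spellings can be checked locally with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Postgres, which is an assumption on my part: it needs an SQLite build with window-function support (3.25 or later), and SQLite's integer primary key plays the role of the serial ordered_pk column.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the answer itself targets Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (ordered_pk integer primary key, id int, value int)")
conn.executemany(
    "insert into tbl (id, value) values (?, ?)",
    [(1, 2), (1, 4), (1, 3), (2, 1), (2, 2), (2, 5)],
)

# Same query, with the frame clause left as a placeholder.
QUERY = """
select id, value,
       avg(value) over (partition by id order by ordered_pk {frame})
from tbl
order by ordered_pk
"""

long_form = conn.execute(
    QUERY.format(frame="rows between unbounded preceding and current row")
).fetchall()
short_form = conn.execute(QUERY.format(frame="rows unbounded preceding")).fetchall()
```

Both result lists should be identical, since the short form only omits the implied "and current row" end of the frame.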
When values are duplicated, here is the result of order by value.
select
id, value,
sum(value) over(arrangement),
row_number() over(arrangement) as rn,
rank() over(arrangement) as rank,
dense_rank() over(arrangement) drank,
trunc( sum(value) over(arrangement)::numeric
/ row_number() over(arrangement), 2) as ra__rn,
trunc( sum(value) over(arrangement)::numeric
/ rank() over(arrangement), 2) as ra__rank,
trunc( sum(value) over(arrangement)::numeric
/ dense_rank() over(arrangement), 2) as ra__drank,
trunc( avg(value) over(arrangement
rows between unbounded preceding and current row), 2) as ra
from tbl
window arrangement as (partition by id order by value);
Output:
| id | value | sum | rn | rank | drank | ra__rn | ra__rank | ra__drank | ra |
|----|-------|-----|----|------|-------|--------|----------|-----------|------|
| 1 | 2 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 2 |
| 1 | 3 | 5 | 2 | 2 | 2 | 2.5 | 2.5 | 2.5 | 2.5 |
| 1 | 4 | 9 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 5 | 2 | 2 | 2 | 2.5 | 2.5 | 2.5 | 1.5 |
| 2 | 2 | 5 | 3 | 2 | 2 | 1.66 | 2.5 | 2.5 | 1.66 |
| 2 | 5 | 10 | 4 | 4 | 3 | 2.5 | 2.5 | 3.33 | 2.5 |
Live test: http://sqlfiddle.com/#!17/2b5aac/16
When values are duplicated, here is the result of order by ordered_pk.
select
id, value,
sum(value) over(arrangement),
row_number() over(arrangement) as rn,
rank() over(arrangement) as rank,
dense_rank() over(arrangement) drank,
trunc( sum(value) over(arrangement)::numeric
/ row_number() over(arrangement), 2) as ra__rn,
trunc( sum(value) over(arrangement)::numeric
/ rank() over(arrangement), 2) as ra__rank,
trunc( sum(value) over(arrangement)::numeric
/ dense_rank() over(arrangement), 2) as ra__drank,
trunc( avg(value) over(arrangement
rows between unbounded preceding and current row), 2) as ra
from tbl
window arrangement as (partition by id order by ordered_pk);
Output:
| id | value | sum | rn | rank | drank | ra__rn | ra__rank | ra__drank | ra |
|----|-------|-----|----|------|-------|--------|----------|-----------|------|
| 1 | 2 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 2 |
| 1 | 4 | 6 | 2 | 2 | 2 | 3 | 3 | 3 | 3 |
| 1 | 3 | 9 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 3 | 2 | 2 | 2 | 1.5 | 1.5 | 1.5 | 1.5 |
| 2 | 2 | 5 | 3 | 3 | 3 | 1.66 | 1.66 | 1.66 | 1.66 |
| 2 | 5 | 10 | 4 | 4 | 4 | 2.5 | 2.5 | 2.5 | 2.5 |
Answer 2 (score: 0)
Assuming you have an ordering column in the table, what you want is:
select t.*,
avg(value) over (partition by id
order by ?
rows between unbounded preceding and current row
) as running_avg
from t;
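Here ? stands for the ordering column.
In other words, Postgres has a built-in function that does exactly what you want, and that function happens to be standard SQL.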
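The window frame with rows is needed because the default is range, which also includes any rows that tie with the current row on the ordering column.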
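The difference between the two frame types only shows up when the ordering column has ties. Here is a small sketch of that case, again using Python's sqlite3 module as a stand-in for Postgres (an assumption; it requires SQLite 3.25 or later). The table name t follows the query above; the sample rows are the id = 2 values with a duplicate from the earlier answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, value int)")
conn.executemany("insert into t values (?, ?)", [(2, 1), (2, 2), (2, 2), (2, 5)])

# default_frame: no frame clause, so the range default applies and ties are
# averaged together. rows_frame: strictly one more row per step.
rows = conn.execute("""
    select value,
           avg(value) over (partition by id order by value) as default_frame,
           avg(value) over (partition by id order by value
                            rows between unbounded preceding and current row) as rows_frame
    from t
    order by value, rows_frame
""").fetchall()
```

For the two tied rows with value 2, the default frame reports the same average (5/3) twice, while the rows frame reports 1.5 and then 5/3.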
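If you don't have an ordering column, you should add one. I strongly advise against using ctid for this purpose. It may seem to work on small data sets, but it is not stable over time and may not work on larger data sets.
If you want the data ordered by insertion order, use a serial column to capture that order.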