Average of result set rows

Asked: 2017-02-24 12:39:35

Tags: sql postgresql group-by rdbms

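I have a table of bandwidth utilization data. Each row has a hostname, a niccardname, a utilization percentage, and a timestamp. At a host's maximum timestamp there can be rows for several different NICs. So, for each host, I want the average utilization percentage across the NICs at that host's maximum timestamp.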

Below are my table structure, inserts, and query:

CREATE TABLE bandwith_utilization
(
  id integer NOT NULL,
  hostname character varying(255),
  "timestamp" bigint,
  niccardname character varying(255),
  percentageutilization integer,
  CONSTRAINT bandwidth_utilization_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE bandwith_utilization
  OWNER TO postgres;


INSERT INTO bandwith_utilization
VALUES (1,'host1','111111','nic1',40);
INSERT INTO bandwith_utilization
VALUES (2,'host1','111112','nic1',50);
INSERT INTO bandwith_utilization
VALUES (3,'host1','111113','nic1',50);
INSERT INTO bandwith_utilization
VALUES (4,'host1','111113','nic2',70);

INSERT INTO bandwith_utilization
VALUES (5,'host2','111111','nic1',80);
INSERT INTO bandwith_utilization
VALUES (6,'host2','111112','nic1',20);
INSERT INTO bandwith_utilization
VALUES (7,'host2','111112','nic2',30);

INSERT INTO bandwith_utilization
VALUES (8,'host3','111115','nic1',10);

After the inserts, the table looks like this:

id  hostname    timestamp   niccardname     percentageutilization
------------------------------------------------------------------
1;  "host1";    111111;     "nic1";         40
2;  "host1";    111112;     "nic1";         50
3;  "host1";    111113;     "nic1";         50
4;  "host1";    111113;     "nic2";         70

5;  "host2";    111111;     "nic1";         80
6;  "host2";    111112;     "nic1";         20
7;  "host2";    111112;     "nic2";         30

8;  "host3";    111115;     "nic1";         10

I have a query that returns, for each hostname, the rows at its maximum timestamp:

select hostname, timestamp, niccardname, percentageutilization
from report.bandwith_utilization
 where timestamp = (select max(timestamp)
                    from report.bandwith_utilization nwUtil
                    where nwUtil.hostname = report.bandwith_utilization.hostname
                   ) ;  

The output of the above query is:

"host1";  111113; "nic1"; 50
"host1";  111113; "nic2"; 70

"host2";  111112; "nic1"; 20
"host2";  111112; "nic2"; 30

"host3";  111115; "nic1"; 10

My expected output is the average utilization percentage across the different NICs for each host, i.e.:

"host1";  111113; "nic1"; 60
"host2";  111112; "nic1"; 25
"host3";  111115; "nic1"; 10

How can I get this final averaged output within the same query shown above?

2 Answers:

Answer 0: (score: 1)

You need AVG() together with GROUP BY:

select hostname,timestamp,min(niccardname), avg(percentageutilization )
from report.bandwith_utilization
where (timestamp,hostname, niccardname)   in (select max(timestamp) ,hostname, niccardname
from report.bandwith_utilization nwUtil 
where nwUtil.hostname= report.bandwith_utilization.hostname
group by  hostname, niccardname
) 
group by  hostname,timestamp
order by  hostname,timestamp
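
Note that the subquery above takes max(timestamp) per hostname and niccardname, so a NIC that did not report at the host's latest timestamp would contribute its own older row. A minimal sketch of an alternative, assuming the goal is strictly the host-level maximum timestamp as described in the question (the aliases u, m, max_ts, and avg_utilization are illustrative, not from the original answer):

```sql
-- Sketch only: average utilization over the rows at each host's latest timestamp.
-- Assumes the table lives in the "report" schema, as in the queries above.
select u.hostname,
       u.timestamp,
       avg(u.percentageutilization) as avg_utilization
from report.bandwith_utilization u
join (
    -- one row per host with that host's latest timestamp
    select hostname, max(timestamp) as max_ts
    from report.bandwith_utilization
    group by hostname
) m
  on m.hostname = u.hostname
 and m.max_ts   = u.timestamp
group by u.hostname, u.timestamp
order by u.hostname;
```

On the sample data in the question both forms return the same three hosts, because every NIC of a host reports at that host's latest timestamp.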

Answer 1: (score: 1)

Here is a better way to get the rows with the maximum timestamp:

select u.*
from (select u.*,
             rank() over (partition by hostname order by timestamp desc) as seqnum
      from report.bandwith_utilization u
     ) u
where seqnum = 1;

Now you can get what you want:

select u.hostname, u.timestamp, avg(percentageutilization)
from (select u.*,
             rank() over (partition by hostname order by timestamp desc) as seqnum
      from report.bandwith_utilization u
     ) u
where seqnum = 1
group by u.hostname, u.timestamp;

Including niccardname in the result set doesn't make sense to me. If you want a single value, you can use min(niccardname); if you want all the values in an array, you can use array_agg(niccardname).
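
As a minimal sketch of that last suggestion, assuming the same report.bandwith_utilization table, the query below returns the average alongside both forms of the NIC name. The aliases avg_utilization, first_nic, and all_nics are illustrative names and not part of the original answer.

```sql
-- Sketch only: average at each host's latest timestamp, plus NIC names.
select u.hostname,
       u.timestamp,
       avg(u.percentageutilization)              as avg_utilization,
       min(u.niccardname)                        as first_nic,  -- one representative NIC
       array_agg(u.niccardname order by u.niccardname) as all_nics  -- every NIC in the average
from (select b.*,
             -- rank() gives 1 to every row tied at the host's latest timestamp
             rank() over (partition by hostname order by timestamp desc) as seqnum
      from report.bandwith_utilization b
     ) u
where u.seqnum = 1
group by u.hostname, u.timestamp
order by u.hostname;
```

rank() is used here rather than row_number() because all rows that share the latest timestamp must be kept for the average. In PostgreSQL, avg() over an integer column returns numeric, so wrap it in round() or cast it if a whole number is wanted.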