I have 3 tables:
CREATE TABLE IF NOT EXISTS `disksinfo` (
`idx` int(10) NOT NULL AUTO_INCREMENT,
`hostinfo_idx` int(10) DEFAULT NULL,
`id` char(30) DEFAULT NULL,
`name` char(30) DEFAULT NULL,
`size` bigint(20) DEFAULT NULL,
`freespace` bigint(20) DEFAULT NULL,
PRIMARY KEY (`idx`)
);
CREATE TABLE IF NOT EXISTS `hostinfo` (
`idx` int(10) NOT NULL AUTO_INCREMENT,
`host_idx` int(11) DEFAULT NULL,
`probetime` datetime DEFAULT NULL,
`processor_load` tinyint(4) DEFAULT NULL,
`memory_total` bigint(20) DEFAULT NULL,
`memory_free` bigint(20) DEFAULT NULL,
PRIMARY KEY (`idx`)
);
CREATE TABLE IF NOT EXISTS `hosts` (
`idx` int(10) NOT NULL AUTO_INCREMENT,
`name` char(30) DEFAULT '0',
PRIMARY KEY (`idx`)
);
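Basically, hosts is just a fixed list of the host names used in the hostinfo table (hostinfo.host_idx = hosts.idx). hostinfo is filled every few minutes with data from all hosts, and for each hostinfo row at least one disksinfo row is created. Each disksinfo row contains information about a disk (so some hosts have 3-4 disksinfo rows per snapshot). disksinfo.hostinfo_idx = hostinfo.idx. hostinfo.probetime is simply the time at which the data snapshot was taken.
What I want to do now is select, for each distinct host (hostinfo.host_idx), the latest hostinfo row (by probetime), joined with the information about its disks (disksinfo table) and its host name (hosts table).
I came up with this: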
SELECT hinfo.idx,
hinfo.host_idx,
hinfo.processor_load,
hinfo.memory_total,
hinfo.memory_free,
hnames.idx,
hnames.name,
disks.hostinfo_idx,
disks.id,
disks.name,
disks.size,
disks.freespace,
Max(hinfo.probetime)
FROM systeminfo.hostinfo AS hinfo
INNER JOIN systeminfo.hosts AS hnames
ON hnames.idx = hinfo.host_idx
INNER JOIN systeminfo.disksinfo AS disks
ON disks.hostinfo_idx = hinfo.idx
GROUP BY disks.id,
hnames.name
ORDER BY hnames.name,
disks.id
It seems to work! But is it 100% correct? Is it optimal? Thanks for any tips!
Answer 0: (score: 3)
No, this is not 100% correct.
Suppose you have this table:
x | y | z
-----------------
a b 1
a c 2
d e 1
d f 2
Now when you group by x only, the rows collapse, and MySQL picks an arbitrary row from each collapsed group. So you might get
x | y | z
-----------------
a b 2
d e 2
or
x | y | z
-----------------
a c 2
d f 2
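or some other combination; which one you get is not determined. Each time you run the query you might get a different result. The 2 in column z is always there because of the MAX() function, but you will not necessarily get the row that corresponds to it.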
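To make the distinction concrete, here is a minimal sketch of the toy table above. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a MySQL server; the table name t and the alias max_z are assumptions made for illustration. The first query selects only the grouped column and the aggregate, which is well defined in any RDBMS. The second joins each row back to its group's maximum to recover the matching y.

```python
import sqlite3

# In-memory SQLite database holding the toy table from the answer.
# (SQLite is an assumption here; the answer itself is about MySQL.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT, y TEXT, z INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", "b", 1), ("a", "c", 2), ("d", "e", 1), ("d", "f", 2)],
)

# Well defined: only the grouped column and the aggregate are selected.
max_per_group = conn.execute(
    "SELECT x, MAX(z) FROM t GROUP BY x ORDER BY x"
).fetchall()

# To also get the matching y, join each row to its group's maximum.
rows_with_max = conn.execute(
    """
    SELECT t.x, t.y, t.z
    FROM t
    INNER JOIN (SELECT x, MAX(z) AS max_z FROM t GROUP BY x) AS m
            ON m.x = t.x AND m.max_z = t.z
    ORDER BY t.x
    """
).fetchall()

print(max_per_group)  # [('a', 2), ('d', 2)]
print(rows_with_max)  # [('a', 'c', 2), ('d', 'f', 2)]
```

Neither query selects a column that is outside both the GROUP BY and an aggregate, so the result does not depend on which row the engine happens to pick.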
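Other RDBMSs would in principle behave the same way, but most reject such a query by default (MySQL can be made to reject it too, by enabling the ONLY_FULL_GROUP_BY SQL mode). There are two ways to fix this (there are actually more, but I will limit myself to two).
Either put every column of the SELECT clause that is not wrapped in an aggregate function such as SUM() or MAX() into the GROUP BY clause as well, like this: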
SELECT hinfo.idx,
hinfo.host_idx,
hinfo.processor_load,
hinfo.memory_total,
hinfo.memory_free,
hnames.idx,
hnames.name,
disks.hostinfo_idx,
disks.id,
disks.name,
disks.size,
disks.freespace,
Max(hinfo.probetime)
FROM systeminfo.hostinfo AS hinfo
INNER JOIN systeminfo.hosts AS hnames
ON hnames.idx = hinfo.host_idx
INNER JOIN systeminfo.disksinfo AS disks
ON disks.hostinfo_idx = hinfo.idx
GROUP BY
hinfo.idx,
hinfo.host_idx,
hinfo.processor_load,
hinfo.memory_total,
hinfo.memory_free,
hnames.idx,
hnames.name,
disks.hostinfo_idx,
disks.id,
disks.name,
disks.size,
disks.freespace
ORDER BY hnames.name,
disks.id
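Note that this query may give you a different result! I am only addressing the problem that you may get the wrong data in the row you believe holds MAX(hinfo.probetime). Because hinfo.idx is now part of the GROUP BY, each snapshot forms its own group, so the query returns every snapshot rather than only the latest one per host.
Or you solve it like this (this gets you what you want):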
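A small sketch of why the fully grouped query can return different rows than expected. It again uses Python's sqlite3 module as a stand-in for MySQL, with the tables cut down to the columns that matter; the sample rows and timestamps are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hostinfo (idx INTEGER PRIMARY KEY, host_idx INTEGER, probetime TEXT);
    CREATE TABLE disksinfo (idx INTEGER PRIMARY KEY, hostinfo_idx INTEGER, id TEXT);
    -- Hypothetical data: one host, two snapshots, the same disk in each.
    INSERT INTO hostinfo VALUES (1, 1, '2012-01-01 10:00:00'),
                                (2, 1, '2012-01-01 10:05:00');
    INSERT INTO disksinfo VALUES (1, 1, 'C:'), (2, 2, 'C:');
    """
)

# Every selected non-aggregate column is in the GROUP BY, including the
# primary key hinfo.idx, so each snapshot ends up in a group of its own.
rows = conn.execute(
    """
    SELECT hinfo.idx, disks.id, MAX(hinfo.probetime)
    FROM hostinfo AS hinfo
    INNER JOIN disksinfo AS disks ON disks.hostinfo_idx = hinfo.idx
    GROUP BY hinfo.idx, disks.id
    ORDER BY hinfo.idx
    """
).fetchall()

print(rows)
# [(1, 'C:', '2012-01-01 10:00:00'), (2, 'C:', '2012-01-01 10:05:00')]
```

Both snapshots come back, each with its own probetime, so the query is deterministic but no longer filters down to the latest snapshot.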
SELECT hinfo.idx,
hinfo.host_idx,
hinfo.processor_load,
hinfo.memory_total,
hinfo.memory_free,
hnames.idx,
hnames.name,
disks.hostinfo_idx,
disks.id,
disks.name,
disks.size,
disks.freespace,
hinfo.probetime
FROM systeminfo.hostinfo AS hinfo
INNER JOIN systeminfo.hosts AS hnames
ON hnames.idx = hinfo.host_idx
INNER JOIN systeminfo.disksinfo AS disks
ON disks.hostinfo_idx = hinfo.idx
-- The subquery must join on its own aliases (hi, hn, d), not the outer ones.
WHERE hinfo.probetime = (SELECT MAX(hi.probetime) FROM systeminfo.hostinfo AS hi
INNER JOIN systeminfo.hosts AS hn
ON hn.idx = hi.host_idx
INNER JOIN systeminfo.disksinfo AS d
ON d.hostinfo_idx = hi.idx
WHERE d.id = disks.id AND hn.name = hnames.name)
GROUP BY disks.id,
hnames.name
ORDER BY hnames.name,
disks.id
The manual also has a nice example of this: The Rows Holding the Group-wise Maximum of a Certain Column
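For comparison, here is a sketch of another common formulation of the same idea: join hostinfo to a derived table holding the latest probetime per host, then attach the host name and the disks. This is not the query from the answer above but a swapped-in variant. It uses Python's sqlite3 module as a stand-in for MySQL, a reduced set of columns, and hypothetical sample data; the alias names newest and max_probetime are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hosts (idx INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hostinfo (idx INTEGER PRIMARY KEY, host_idx INTEGER, probetime TEXT);
    CREATE TABLE disksinfo (idx INTEGER PRIMARY KEY, hostinfo_idx INTEGER,
                            id TEXT, freespace INTEGER);
    -- Hypothetical sample data: two hosts, two snapshots each.
    INSERT INTO hosts VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO hostinfo VALUES (1, 1, '2012-01-01 10:00:00'),
                                (2, 2, '2012-01-01 10:00:00'),
                                (3, 1, '2012-01-01 10:05:00'),
                                (4, 2, '2012-01-01 10:05:00');
    INSERT INTO disksinfo VALUES (1, 1, 'C:', 100), (2, 1, 'D:', 200),
                                 (3, 2, 'C:', 300),
                                 (4, 3, 'C:', 90),  (5, 3, 'D:', 180),
                                 (6, 4, 'C:', 250);
    """
)

# The derived table "newest" holds one row per host with its latest
# probetime; joining on it keeps only the latest snapshot of each host.
latest = conn.execute(
    """
    SELECT hnames.name, disks.id, disks.freespace, hinfo.probetime
    FROM hostinfo AS hinfo
    INNER JOIN (SELECT host_idx, MAX(probetime) AS max_probetime
                FROM hostinfo
                GROUP BY host_idx) AS newest
            ON newest.host_idx = hinfo.host_idx
           AND newest.max_probetime = hinfo.probetime
    INNER JOIN hosts AS hnames ON hnames.idx = hinfo.host_idx
    INNER JOIN disksinfo AS disks ON disks.hostinfo_idx = hinfo.idx
    ORDER BY hnames.name, disks.id
    """
).fetchall()

for row in latest:
    print(row)
```

With this sample data the result contains the two disks of alpha and the single disk of beta, all taken from the 10:05 snapshots. The outer query needs no GROUP BY, so it should also be accepted by engines that reject non-aggregated columns outside the GROUP BY.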