I'm grouping some data and then trying to work out the percentage for each row. So I wrote this:
select Vendor, OS_Version, count(distinct device_uid), (select count(distinct device_uid) from device_information_latest dil2 where dil1.vendor = dil2.vendor limit 1) from device_information_latest dil1 where vendor in ('Canonical') GROUP BY Vendor, OS_Version order by vendor, OS_Version;
This gives me:
+-----------+------------+----------------------------+-----------------------------------------------------------------------------------------------------------------+
| Vendor | OS_Version | count(distinct device_uid) | (select count(distinct device_uid) from device_information_latest dil2 where dil1.vendor = dil2.vendor limit 1) |
+-----------+------------+----------------------------+-----------------------------------------------------------------------------------------------------------------+
| Canonical | 14.04 | 4 | 23 |
| Canonical | 16.04 | 19 | 23 |
+-----------+------------+----------------------------+-----------------------------------------------------------------------------------------------------------------+
Looks good. Now I try to divide the third column by the fourth (note that I only replaced the comma with a division slash):
select Vendor, OS_Version, count(distinct device_uid) / (select count(distinct device_uid) from device_information_latest dil2 where dil1.vendor = dil2.vendor limit 1) from device_information_latest dil1 where vendor in ('Canonical') GROUP BY Vendor, OS_Version order by vendor, OS_Version;
+-----------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+
| Vendor | OS_Version | count(distinct device_uid) / (select count(distinct device_uid) from device_information_latest dil2 where dil1.vendor = dil2.vendor limit 1) |
+-----------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+
| Canonical | 14.04 | 0.0315 |
| Canonical | 16.04 | 0.1496 |
+-----------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+
Looking at the values in the third column, you would expect them to add up to 1.0 (100%). But they don't.
What am I missing?
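For reference, here is the division done by hand on the first result, which is what I expected the second query to return. It is only a quick Python check with the counts copied from the output above (4 + 19 = 23, so the per-version counts do add up to the vendor total):

```python
# Counts copied from the first query's output (vendor = 'Canonical').
per_version = {"14.04": 4, "16.04": 19}
vendor_total = 23  # value returned by the correlated subquery in the first result

# Share of each OS version within the vendor.
shares = {version: n / vendor_total for version, n in per_version.items()}
for version, share in shares.items():
    print(f"{version}: {share:.4f}")

print(f"sum: {sum(shares.values()):.4f}")
```

This prints 0.1739 and 0.8261, which sum to 1.0000, instead of the 0.0315 and 0.1496 the second query returned.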
Answer 0 (score: 0)
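Your SQL query looks fine, but I don't know the exact structure of device_information_latest. What exactly is the device_uid field?
I suggest looking at the values that come out wrong. Assume that count(distinct device_uid) always returns the correct values, 4 and 19. Then we can set up two equations: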
4 / x1 = 0.0315
19 / x2 = 0.1496
Solving them gives:
x1 = 4 / 0.0315 ≈ 126.984126984127
x2 = 19 / 0.1496 ≈ 127.0053475935829
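The same back-calculation as a small Python snippet, in case you want to repeat it for other vendors. It solves count / x = ratio for x. The ratios shown by MySQL are rounded to four decimal places (the default `div_precision_increment` is 4), so the implied denominators are approximate; both come out at roughly 127 rather than 23:

```python
# Ratios observed in the second query, paired with the per-version counts.
observed = {"14.04": (4, 0.0315), "16.04": (19, 0.1496)}

for version, (count, ratio) in observed.items():
    # count / x = ratio  =>  x = count / ratio
    implied_total = count / ratio
    print(f"{version}: implied denominator = {implied_total:.2f}")
```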
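So in the first query (select count(distinct device_uid) from device_information_latest dil2 where dil1.vendor = dil2.vendor limit 1) evaluates to 23, but in the second query it evaluates to about 127. Apparently it matters whether you compute two simple expressions in two separate columns or one compound expression in a single column. I suspect that where dil1.vendor = dil2.vendor somehow filters a different set of rows, but that is only a guess. I'm not very familiar with the order in which SQL operations are evaluated. The logical order should be the same for both queries, but some MySQL-specific optimization might be kicking in. Could you post the EXPLAIN output for both queries?
If this turns out to be a bug, you could try this workaround: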
select
    Vendor,
    OS_Version,
    quantity / total as percentage
from
(
    select
        Vendor,
        OS_Version,
        count(distinct device_uid) as quantity,
        (select count(distinct device_uid) from device_information_latest dil2 where dil1.vendor = dil2.vendor limit 1) as total
    from
        device_information_latest dil1
    where
        vendor in ('Canonical')
    group by
        Vendor, OS_Version
) as tmp
-- sort in the outer query: MySQL may ignore an ORDER BY inside a derived table
order by Vendor, OS_Version;
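To check that the derived-table version yields shares that sum to 1, here is a sketch that runs the same query shape against an in-memory SQLite database through Python's built-in sqlite3 module. It is a stand-in only: it cannot reproduce the MySQL behaviour from the question, the table holds made-up rows chosen to match the counts 4 and 19, and only the three columns the query uses are modelled because the real schema is unknown. The `* 1.0` is a SQLite detail (its `/` truncates integer operands), whereas MySQL's `/` already returns a decimal:

```python
import sqlite3

# Hypothetical toy table: the real device_information_latest schema is unknown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE device_information_latest (device_uid TEXT, vendor TEXT, os_version TEXT)"
)
rows = [(f"u{i}", "Canonical", "14.04") for i in range(4)]        # 4 devices
rows += [(f"u{i}", "Canonical", "16.04") for i in range(4, 23)]   # 19 devices
rows += [(f"w{i}", "OtherVendor", "10") for i in range(5)]        # filtered out by WHERE
conn.executemany("INSERT INTO device_information_latest VALUES (?, ?, ?)", rows)

query = """
SELECT vendor, os_version, quantity * 1.0 / total AS percentage
FROM (
    SELECT vendor, os_version,
           COUNT(DISTINCT device_uid) AS quantity,
           (SELECT COUNT(DISTINCT device_uid)
              FROM device_information_latest dil2
             WHERE dil2.vendor = dil1.vendor) AS total
    FROM device_information_latest dil1
    WHERE vendor IN ('Canonical')
    GROUP BY vendor, os_version
) AS tmp
ORDER BY vendor, os_version
"""
result = conn.execute(query).fetchall()
for vendor, version, pct in result:
    print(vendor, version, round(pct, 4))
conn.close()
```

On this toy data the two rows come out as 0.1739 and 0.8261. If the real MySQL query still returns different numbers, comparing its EXPLAIN output with that of the two-column query is the next step.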