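I am running two different SQL queries and getting drastically different results. Two tables are involved:
mips: this table is indexed on time and contains *_good and *_bad fields for each "metric" I measure (round-trip time, retransmitted bytes, etc.). The fields include time, rtt_good, rtt_bad, rexb_good, rexb_bad, nae_good, nae_bad, and so on.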
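metrics: this table is indexed on time, asn (the network we served traffic to), cty (the country we served that traffic to) and source (the data center we served it from). As a result, for a single time we have hundreds of thousands of rows. Each row tells us the total number of requests served (reqs) and various measurements about that traffic (rtt, rexb, nae, etc.).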
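The two tables are joined on the time column, which holds a UNIX timestamp. All other values are floats.
Given rtt_good (the round-trip time we consider "good", such as 10 ms), rtt_bad (the round-trip time we consider "bad", such as 5 seconds) and rtt, we can linearly interpolate to measure how "good" or "bad" the RTT is: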
rtt_mips = (rtt - rtt_good) / (rtt_bad - rtt_good)
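To make the interpolation concrete, here is a minimal Python sketch of the same formula. The function name and the None handling are my own assumptions for illustration (None stands in for SQL NULL); they are not part of the original schema.

```python
def rtt_mips(rtt, rtt_good, rtt_bad):
    """Linearly interpolate rtt: 0.0 at rtt_good, 1.0 at rtt_bad."""
    if rtt is None:
        # Assumption: mirror SQL, where NULL propagates through arithmetic.
        return None
    return (rtt - rtt_good) / (rtt_bad - rtt_good)


print(rtt_mips(10, 10, 5000))    # 0.0
print(rtt_mips(5000, 10, 5000))  # 1.0
```

Values below rtt_good come out negative and values above rtt_bad exceed 1.0, since nothing in the formula clamps the result.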
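Since we have data for every possible asn, cty and source, we often need to aggregate it to answer more general questions such as "How does our RTT look in Mexico?". When aggregating, we take a weighted average of the metric, weighted by the number of requests we served. For example, the average RTT for Mexico would be: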
select sum(rtt * reqs) / sum(reqs) as avg_rtt from metrics where cty = "mx"
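As a rough illustration of what that aggregate computes, here is the same request-weighted average in Python. The function name and the sample numbers are made up for the example; returning None for an empty group is meant to resemble the NULL the SQL aggregate yields when no rows match.

```python
def weighted_avg_rtt(rows):
    """rows: list of (rtt, reqs) pairs. Returns sum(rtt*reqs)/sum(reqs), or None."""
    total_reqs = sum(reqs for _, reqs in rows)
    if total_reqs == 0:
        # No traffic in this group: nothing to average.
        return None
    return sum(rtt * reqs for rtt, reqs in rows) / total_reqs


# 900 requests at 100 ms and 100 requests at 1000 ms (illustrative numbers).
print(weighted_avg_rtt([(100, 900), (1000, 100)]))  # 190.0
```

The busy source dominates the result, which is the point of weighting by reqs rather than averaging the rows directly.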
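The problem is that we don't always serve every ASN in every country every 5 minutes. Our Japanese data center may not have served any data to Mexico for a while. This means that when we group these metrics by time, we get a lot of NULL rows: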
+------+---------+
| time | avg_rtt |
+------+---------+
|    1 |     300 |
|    2 |    NULL |
|    3 |     400 |
|  ... |     ... |
+------+---------+
To work around this, I want to copy the "last known" RTT forward to the next row before computing the "relative goodness" of the RTT:
+------+---------+------------+----------+---------+----------+
| time | avg_rtt | last_known | rtt_good | rtt_bad | rtt_mips |
+------+---------+------------+----------+---------+----------+
|    1 |     300 |        300 |       10 |    5000 |     math |
|    2 |    NULL |        300 |       10 |    5000 |     math |
|    3 |     400 |        400 |       10 |    5000 |     math |
|  ... |     ... |        ... |      ... |     ... |      ... |
+------+---------+------------+----------+---------+----------+
This can be done with a combination of a MySQL variable and COALESCE, like so:
select @rtt := coalesce(rtt, @rtt) from metrics
If rtt is not NULL, we use rtt. If rtt is NULL, we use the @rtt variable carried over from the previous row.
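The intended behavior of that variable trick is a plain "carry the last known value forward" pass over rows visited in time order. A small Python sketch of the idea follows; the function and the initial parameter are hypothetical, with None playing the role of NULL.

```python
def carry_forward(values, initial=None):
    """Replace each None with the most recent non-None value seen so far."""
    last_known = initial  # plays the role of the @rtt variable
    filled = []
    for value in values:
        if value is not None:
            last_known = value
        # Equivalent in spirit to coalesce(value, @rtt).
        filled.append(last_known)
    return filled


print(carry_forward([300, None, 400]))  # [300, 300, 400]
```

Note that this only works because the loop sees the rows in a fixed order; leading gaps stay None unless an initial value is supplied.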
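Putting all of this together gives Query 1 below.
However, I plan to use the output to plot a graph in JavaScript, so I want to multiply the time column by 1000 (converting seconds to milliseconds). This leads to Query 2, which has different (and unexpected) behavior.
Query 1: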
select
mips.time,
@rtt := coalesce(sum(rtt*reqs)/sum(reqs), @rtt) as rtt,
(coalesce(sum(rtt*reqs)/sum(reqs), @rtt) - rtt_good) / (rtt_bad - rtt_good) as rtt_mips
from
mips
left join
(
select * from metrics where asn = '33095' and cty = 'us'
) t1 on mips.time = t1.time
group by time
order by time asc;
Result:
+------------+-----------------+----------------------+
| time | rtt | rtt_mips |
+------------+-----------------+----------------------+
| 1521731100 | NULL | NULL |
| 1521731400 | NULL | NULL |
| 1521731700 | 12593 | 0.04197666666666667 |
| 1521732000 | 12593 | 0.04197666666666667 |
| 1521732300 | 12593 | 0.04197666666666667 |
| 1521732600 | 12593 | 0.04197666666666667 |
| 1521732900 | 41266.90234375 | 0.13755633333333334 |
| 1521733200 | 41266.90234375 | 0.13755634114583334 |
| 1521733500 | 41266.90234375 | 0.13755634114583334 |
| 1521733800 | 41266.90234375 | 0.13755634114583334 |
| 1521734100 | 41266.90234375 | 0.13755634114583334 |
| 1521734400 | 41266.90234375 | 0.13755634114583334 |
| 1521734700 | 41266.90234375 | 0.13755634114583334 |
| 1521735000 | 14979.439453125 | 0.049931333333333335 |
| 1521735300 | 11812.119140625 | 0.03937366666666667 |
| 1521735600 | 11812.119140625 | 0.03937373046875 |
| 1521735900 | 8738.2314453125 | 0.02912743333333333 |
| 1521736200 | 8738.2314453125 | 0.029127438151041667 |
| 1521736500 | 8738.2314453125 | 0.029127438151041667 |
| 1521736800 | 8738.2314453125 | 0.029127438151041667 |
+------------+-----------------+----------------------+
20 rows in set (0.22 sec)
Query 2:
select
mips.time * 1000 as time, -- The only line that changed
@rtt := coalesce(sum(rtt*reqs)/sum(reqs), @rtt) as rtt,
(coalesce(sum(rtt*reqs)/sum(reqs), @rtt) - rtt_good) / (rtt_bad - rtt_good) as rtt_mips
from
mips
left join
(
select * from metrics where asn = '33095' and cty = 'us'
) t1 on mips.time = t1.time
group by time
order by time asc;
Result:
+---------------+-----------------+----------------------+
| time | rtt | rtt_mips |
+---------------+-----------------+----------------------+
| 1521731100000 | NULL | NULL |
| 1521731400000 | NULL | NULL |
| 1521731700000 | 12593 | 0.04197666666666667 |
| 1521732000000 | NULL | NULL |
| 1521732300000 | NULL | NULL |
| 1521732600000 | NULL | NULL |
| 1521732900000 | 41266.90234375 | 0.13755633333333334 |
| 1521733200000 | NULL | NULL |
| 1521733500000 | NULL | NULL |
| 1521733800000 | NULL | NULL |
| 1521734100000 | NULL | NULL |
| 1521734400000 | NULL | NULL |
| 1521734700000 | NULL | NULL |
| 1521735000000 | 14979.439453125 | 0.049931333333333335 |
| 1521735300000 | 11812.119140625 | 0.03937366666666667 |
| 1521735600000 | NULL | NULL |
| 1521735900000 | 8738.2314453125 | 0.02912743333333333 |
| 1521736200000 | NULL | NULL |
| 1521736500000 | NULL | NULL |
| 1521736800000 | NULL | NULL |
+---------------+-----------------+----------------------+
20 rows in set (0.41 sec)
Why, when I change the time column to time * 1000, does my variable stop being set correctly and my query start returning NULLs?
mysql> select version();
+-----------------+
| version() |
+-----------------+
| 10.1.26-MariaDB |
+-----------------+
1 row in set (0.10 sec)
First, the result of the following query:
mysql> select * from mips where time = 1521731700000;
Empty set (0.15 sec)
And of a similar query:
mysql> select * from mips where time = 1521731700;
+------------+----------+---------+-----------+----------+----------+---------+-----------+----------+---------+--------+---------+--------+
| time | rtt_good | rtt_bad | rexb_good | rexb_bad | nae_good | nae_bad | util_good | util_bad | fb_good | fb_bad | or_good | or_bad |
+------------+----------+---------+-----------+----------+----------+---------+-----------+----------+---------+--------+---------+--------+
| 1521731700 | 0 | 300000 | 0 | 40 | 25 | 100 | 0 | 80 | 0 | 100 | 0 | 100 |
+------------+----------+---------+-----------+----------+----------+---------+-----------+----------+---------+--------+---------+--------+
1 row in set (0.10 sec)
I then tried grouping by rtt_good and rtt_bad as well, and multiplying the time column of metrics by 1000.
Query:
select
mips.time * 1000 as time,
@rtt := coalesce(sum(rtt*reqs)/sum(reqs), @rtt) as rtt,
(coalesce(sum(rtt*reqs)/sum(reqs), @rtt) - rtt_good) / (rtt_bad - rtt_good) as rtt_mips
from
mips
left join
(
select time * 1000 as time, rtt, reqs from metrics where asn = '33095' and cty = 'us'
) t1 on mips.time = t1.time
group by time, rtt_good, rtt_bad
order by time asc;
Result:
+---------------+------+----------+
| time | rtt | rtt_mips |
+---------------+------+----------+
| 1521731100000 | NULL | NULL |
| 1521731400000 | NULL | NULL |
| 1521731700000 | NULL | NULL |
| 1521732000000 | NULL | NULL |
| 1521732300000 | NULL | NULL |
| 1521732600000 | NULL | NULL |
| 1521732900000 | NULL | NULL |
| 1521733200000 | NULL | NULL |
| 1521733500000 | NULL | NULL |
| 1521733800000 | NULL | NULL |
| 1521734100000 | NULL | NULL |
| 1521734400000 | NULL | NULL |
| 1521734700000 | NULL | NULL |
| 1521735000000 | NULL | NULL |
| 1521735300000 | NULL | NULL |
| 1521735600000 | NULL | NULL |
| 1521735900000 | NULL | NULL |
| 1521736200000 | NULL | NULL |
| 1521736500000 | NULL | NULL |
| 1521736800000 | NULL | NULL |
+---------------+------+----------+
20 rows in set (0.17 sec)
Since the time 1521736800000 does not exist in the mips table, the join fails to match any rows.
Even if I don't multiply the time column by 1000, the query still does not behave as expected once I add the other group by columns:
select
mips.time,
@rtt := coalesce(sum(rtt*reqs)/sum(reqs), @rtt) as rtt,
(coalesce(sum(rtt*reqs)/sum(reqs), @rtt) - rtt_good) / (rtt_bad - rtt_good) as rtt_mips
from
mips
left join
(
select time, rtt, reqs from metrics where asn = '33095' and cty = 'us'
) t1 on mips.time = t1.time
group by time, rtt_good, rtt_bad
order by time asc;
I feel like I am hitting a strange edge case in how the storage engine optimizes these queries.
Answer 0 (score: 1)
I think something like this should be more predictable:
SELECT mips.time * 1000 AS mips_time,
@prev_rtt := coalesce(m_sum.weighted_rtt, @prev_rtt) as rtt,
(coalesce(m_sum.weighted_rtt, @prev_rtt) - rtt_good) / (rtt_bad - rtt_good) as rtt_mips
FROM
mips
LEFT JOIN
(
SELECT m.time, sum(m.rtt*m.reqs)/sum(m.reqs) AS weighted_rtt
FROM metrics AS m
WHERE m.asn = '33095' and m.cty = 'us'
GROUP BY m.time
) AS m_sum ON mips.time = m_sum.time
ORDER BY mips.time asc;
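In my experience, (@prev_rtt - rtt_good) / (rtt_bad - rtt_good) as rtt_mips would also work in this query, because the preceding expression (the one aliased as rtt) should already have assigned @prev_rtt. But that ventures into "behaves this way, but is not actually guaranteed" territory, as MySQL does not guarantee the evaluation order of select expressions.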
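The shape of that query is "aggregate first, then carry forward": the derived table collapses metrics to one weighted RTT per time, and only afterwards does the variable walk the rows. Here is a rough Python model of those two steps, assuming in-memory lists of tuples instead of tables; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def rtt_mips_series(mips_rows, metric_rows):
    """mips_rows: list of (time, rtt_good, rtt_bad).
    metric_rows: list of (time, rtt, reqs), already filtered by asn/cty."""
    # Step 1: one weighted RTT per time, like the m_sum derived table.
    weighted = defaultdict(float)
    requests = defaultdict(float)
    for time, rtt, reqs in metric_rows:
        weighted[time] += rtt * reqs
        requests[time] += reqs

    # Step 2: walk mips in time order, carrying the last known RTT forward.
    prev_rtt = None
    out = []
    for time, rtt_good, rtt_bad in sorted(mips_rows):
        if requests.get(time):
            prev_rtt = weighted[time] / requests[time]
        mips = None if prev_rtt is None else (prev_rtt - rtt_good) / (rtt_bad - rtt_good)
        # Seconds are converted to milliseconds only in the output column.
        out.append((time * 1000, prev_rtt, mips))
    return out


rows = rtt_mips_series(
    [(1, 10, 5000), (2, 10, 5000), (3, 10, 5000)],
    [(1, 300, 10), (3, 400, 5)],
)
print([(t, rtt) for t, rtt, _ in rows])  # [(1000, 300.0), (2000, 300.0), (3000, 400.0)]
```

Because the grouping happens before the carry-forward step and the join still uses the unscaled time, multiplying the output by 1000 cannot affect which rows match or the order in which they are visited in this model.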
Answer 1 (score: 0)
Change the query to this. The variable must be initialized before you can calculate with it; otherwise it is NULL.
select
mips.time,
@rtt := coalesce(sum(rtt*reqs)/sum(reqs), @rtt) as rtt,
(coalesce(rtt, @rtt) - rtt_good) / (rtt_bad - rtt_good) as rtt_mips
from
mips
left join
(
select * from metrics where asn = '33095' and cty = 'us'
) t1 on mips.time = t1.time
cross join ( select @rtt := 0 ) as init
group by time
order by time asc;