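I am trying to calculate the maximum number of simultaneous calls. My query, which I believe is accurate, takes far too long given ~250,000 rows. The cdrs table looks like this: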
+---------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| CallType | varchar(32) | NO | | NULL | |
| StartTime | datetime | NO | MUL | NULL | |
| StopTime | datetime | NO | | NULL | |
| CallDuration | float(10,5) | NO | | NULL | |
| BillDuration | mediumint(8) unsigned | NO | | NULL | |
| CallMinimum | tinyint(3) unsigned | NO | | NULL | |
| CallIncrement | tinyint(3) unsigned | NO | | NULL | |
| BasePrice | float(12,9) | NO | | NULL | |
| CallPrice | float(12,9) | NO | | NULL | |
| TransactionId | varchar(20) | NO | | NULL | |
| CustomerIP | varchar(15) | NO | | NULL | |
| ANI | varchar(20) | NO | | NULL | |
| ANIState | varchar(10) | NO | | NULL | |
| DNIS | varchar(20) | NO | | NULL | |
| LRN | varchar(20) | NO | | NULL | |
| DNISState | varchar(10) | NO | | NULL | |
| DNISLATA | varchar(10) | NO | | NULL | |
| DNISOCN | varchar(10) | NO | | NULL | |
| OrigTier | varchar(10) | NO | | NULL | |
| TermRateDeck | varchar(20) | NO | | NULL | |
+---------------+-----------------------+------+-----+---------+----------------+
I have the following indexes:
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| cdrs | 0 | PRIMARY | 1 | id | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | id | 1 | id | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | call_time_index | 1 | StartTime | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | call_time_index | 2 | StopTime | A | 269622 | NULL | NULL | | BTREE | | |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The query I am running is:
SELECT MAX(cnt) AS max_channels FROM
(SELECT cl1.StartTime, COUNT(*) AS cnt
FROM cdrs cl1
INNER JOIN cdrs cl2
ON cl1.StartTime
BETWEEN cl2.StartTime AND cl2.StopTime
GROUP BY cl1.id)
AS counts;
It seems I may have to chunk this data by day and store the results in a separate table, such as simultaneous_calls.
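Answer 0 (score: 2)
I'm sure you want to know not only the maximum number of simultaneous calls, but also when it happened.
I would create a table containing a timestamp for every minute:
CREATE TABLE times (ts DATETIME PRIMARY KEY);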
INSERT INTO times (ts) VALUES ('2014-05-14 00:00:00');
. . . until 1440 rows, one for each minute . . .
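Writing 1,440 INSERT statements by hand is tedious, so the values can be generated by a script. The following is a minimal sketch in Python, not part of the original answer; the helper name minute_timestamps is made up for illustration, and how the rows are sent to MySQL depends on whichever driver you use.

```python
from datetime import datetime, timedelta


def minute_timestamps(day_start):
    """Return one datetime per minute of the day starting at day_start."""
    return [day_start + timedelta(minutes=i) for i in range(24 * 60)]


# Format each timestamp the way MySQL expects a DATETIME literal.
rows = [
    (ts.strftime("%Y-%m-%d %H:%M:%S"),)
    for ts in minute_timestamps(datetime(2014, 5, 14))
]

# With a DB-API driver, the rows could then be loaded with something like
# cursor.executemany("INSERT INTO times (ts) VALUES (%s)", rows)
# (the placeholder style is an assumption; it varies by driver).
print(len(rows), rows[0][0], rows[-1][0])
```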
Then join it to the calls table.
SELECT ts, COUNT(*) AS count FROM times
JOIN cdrs ON times.ts BETWEEN cdrs.starttime AND cdrs.stoptime
GROUP BY ts ORDER BY count DESC LIMIT 1;
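To see what this join returns without a MySQL server, the same query shape can be tried on a handful of rows. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database instead of MySQL, stores the timestamps as ISO-formatted text (which compares correctly as strings), and the four sample calls are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE times (ts TEXT PRIMARY KEY);
    CREATE TABLE cdrs (id INTEGER PRIMARY KEY, StartTime TEXT, StopTime TEXT);
""")

# Four minute marks instead of the full 1440.
conn.executemany(
    "INSERT INTO times (ts) VALUES (?)",
    [(f"2014-05-14 10:0{m}:00",) for m in range(4)],
)

# Invented sample calls (StartTime, StopTime).
conn.executemany(
    "INSERT INTO cdrs (StartTime, StopTime) VALUES (?, ?)",
    [
        ("2014-05-14 10:00:10", "2014-05-14 10:01:30"),
        ("2014-05-14 10:00:50", "2014-05-14 10:02:10"),
        ("2014-05-14 10:00:55", "2014-05-14 10:02:05"),
        ("2014-05-14 10:02:30", "2014-05-14 10:02:40"),  # between minute marks
    ],
)

busiest = conn.execute("""
    SELECT ts, COUNT(*) AS cnt FROM times
    JOIN cdrs ON times.ts BETWEEN cdrs.StartTime AND cdrs.StopTime
    GROUP BY ts ORDER BY cnt DESC LIMIT 1
""").fetchone()
print(busiest)
```

Note that the last sample call starts and ends between two minute marks, so it is never counted. That is a consequence of sampling once per minute: calls shorter than a minute can be missed.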
Here is the result of my test (MySQL 5.6.17 on a Linux VM running on a MacBook Pro):
+---------------------+----------+
| ts | count(*) |
+---------------------+----------+
| 2014-05-14 10:59:00 | 1001 |
+---------------------+----------+
1 row in set (1 min 3.90 sec)
This achieves several goals:
Here is the EXPLAIN for my query:
explain select ts, count(*) from times join cdrs on times.ts between cdrs.starttime and cdrs.stoptime group by ts order by count(*) desc limit 1;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| 1 | SIMPLE | times | index | PRIMARY | PRIMARY | 5 | NULL | 1440 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | cdrs | ALL | starttime | NULL | NULL | NULL | 260727 | Range checked for each record (index map: 0x4) |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
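Note the numbers in the rows column, and compare them with the EXPLAIN for your original query. You can estimate the total number of rows examined by multiplying these values together (though this gets more complicated if your query is anything other than SIMPLE).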
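As a worked example of that multiplication, the two rows values from the EXPLAIN above give the estimate below. The variable names are only illustrative, and the result is the optimizer's rough estimate, not a measured count.

```python
# "rows" values taken from the EXPLAIN output above.
times_rows = 1440     # one row per minute of the day
cdrs_rows = 260727    # optimizer's estimate for the cdrs table

estimated_rows_examined = times_rows * cdrs_rows
print(f"{estimated_rows_examined:,}")  # 375,446,880
```

The same multiplication applied to the rows values in the original query's EXPLAIN shows how the two plans compare.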
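Answer 1 (score: 1)
The inline view isn't strictly necessary. (You are right that running EXPLAIN on the query with the inline view takes a long time: EXPLAIN materializes the inline view, i.e. it runs the inline view query and populates the derived table, and only then gives an EXPLAIN on the outer query.)
Note that this query will return an equivalent result: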
SELECT COUNT(*) AS max_channels
FROM cdrs cl1
JOIN cdrs cl2
ON cl1.StartTime BETWEEN cl2.StartTime AND cl2.StopTime
GROUP BY cl1.id
ORDER BY max_channels DESC
LIMIT 1
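It still has to do all of the work and probably won't perform any better, but the EXPLAIN should run much faster. (We expect to see "Using temporary; Using filesort" in the Extra column.)
The number of rows in the result set will be the number of rows in the table (~250,000 rows), and those rows need to be sorted, so that will take some time. The bigger problem (my gut tells me) is the join operation.
I wonder whether the EXPLAIN (or the performance) would be any different if you swapped cl1 and cl2 in the predicate, i.e.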
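To check on a small sample what the self-join computes, here is a brute-force sketch in Python that mirrors it: for each call, count the calls in progress at its start time (itself included), then take the maximum. Checking only start times is enough because the number of active calls can only rise at the moment some call starts. The function name max_channels and the sample data are invented for illustration, and like the join this compares every pair of calls, so it is no faster.

```python
from datetime import datetime


def max_channels(calls):
    """calls is a list of (start, stop) pairs; returns the peak concurrency."""
    best = 0
    for start, _stop in calls:
        # Same predicate as the join: start BETWEEN other_start AND other_stop.
        active = sum(1 for s, e in calls if s <= start <= e)
        best = max(best, active)
    return best


def t(hms):
    """Parse HH:MM:SS on an arbitrary fixed day (2014-05-14)."""
    return datetime.strptime("2014-05-14 " + hms, "%Y-%m-%d %H:%M:%S")


sample_calls = [
    (t("10:00:10"), t("10:01:30")),
    (t("10:00:50"), t("10:02:10")),
    (t("10:00:55"), t("10:02:05")),
    (t("10:02:30"), t("10:02:40")),
]
print(max_channels(sample_calls))
```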
ON cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
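I'm only thinking about that because I would be tempted to try a correlated subquery. That is ~250,000 executions of the subquery, and it is not likely to be any faster...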
SELECT ( SELECT COUNT(*)
FROM cdrs cl2
WHERE cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
) AS max_channels
, cl1.StartTime
FROM cdrs cl1
ORDER BY max_channels DESC
LIMIT 11
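You could run an EXPLAIN on that; we would still see "Using temporary; Using filesort", and it would also show a "DEPENDENT SUBQUERY"...
Obviously, you could add a predicate on the cl1 table to cut down the number of rows returned (for example, checking only the past 15 days). That should speed things up, but it won't give you the answer you are looking for.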
WHERE cl1.StartTime > NOW() - INTERVAL 15 DAY
(None of my musings here are a definitive answer to your question or a solution to the performance problem; they are just musings.)