我正在寻找一些关于如何更好地优化此查询的建议。
对于每个_piece_detail
记录:
_scan
记录(zip,zip_4,
zip_delivery_point,serial_number)mailing_groups
的公司(通过一系列关系)first_scan_date_time
大于相关_scan记录的MIN(scan_date_time)
latest_scan_date_time
小于MAX(scan_date_time)
相关的_scan
记录我需要:
_piece_detail.first_scan_date_time
设为MIN(_scan.scan_date_time)
_piece_detail.latest_scan_date_time
设为MAX(_scan.scan_date_time)
由于我正在处理数百万条记录,因此我试图减少实际需要搜索的记录数量。以下是有关数据的一些事实:
job_id
分区,所以看起来如此
最有意义的是按顺序运行这些检查
_piece_detail.job_id
,_piece_detail.piece_id
。_piece_detail
匹配(除了scan_date_time)。_piece_detail
条记录属于mailing_group
,但在运行之前我们不知道这些记录是哪些
通过连接的完整关系。_piece_detail
mailing_group
。_scan
段通常有0到4 _piece_detail
条记录。现在,我正在寻找一种以合适的方式执行此操作的方法。我最初是从这样的事情开始的:
UPDATE _piece_detail
INNER JOIN (
SELECT _piece_detail.job_id, _piece_detail.piece_id, MIN(_scan.scan_date_time) as first_scan_date_time, MAX(_scan.scan_date_time) as latest_scan_date_time
FROM _piece_detail
INNER JOIN _container_quantity
ON _piece_detail.cqt_database_id = _container_quantity.cqt_database_id
AND _piece_detail.job_id = _container_quantity.job_id
INNER JOIN _container_summary
ON _container_quantity.container_id = _container_summary.container_id
AND _container_summary.job_id = _container_quantity.job_id
INNER JOIN _mail_piece_unit
ON _container_quantity.mpu_id = _mail_piece_unit.mpu_id
AND _container_quantity.job_id = _mail_piece_unit.job_id
INNER JOIN _header
ON _header.job_id = _piece_detail.job_id
INNER JOIN mailing_groups
ON _mail_piece_unit.mpu_company = mailing_groups.mpu_company
INNER JOIN _scan
ON _scan.zip = _piece_detail.zip
AND _scan.zip_4 = _piece_detail.zip_4
AND _scan.zip_delivery_point = _piece_detail.zip_delivery_point
AND _scan.serial_number = _piece_detail.serial_number
GROUP BY _piece_detail.job_id, _piece_detail.piece_id, _scan.zip, _scan.zip_4, _scan.zip_delivery_point, _scan.serial_number
) as t1 ON _piece_detail.job_id = t1.job_id AND _piece_detail.piece_id = t1.piece_id
SET _piece_detail.first_scan_date_time = t1.first_scan_date_time, _piece_detail.latest_scan_date_time = t1.latest_scan_date_time
WHERE _piece_detail.first_scan_date_time < t1.first_scan_date_time
OR _piece_detail.latest_scan_date_time > t1.latest_scan_date_time;
我认为这可能是一次尝试加载到内存中太多而且可能没有正确使用索引。
然后我认为我可以避免做那个巨大的连接子查询并添加两个leftjoin子查询来获得min / max,如下所示:
UPDATE _piece_detail
INNER JOIN _container_quantity
ON _piece_detail.cqt_database_id = _container_quantity.cqt_database_id
AND _piece_detail.job_id = _container_quantity.job_id
INNER JOIN _container_summary
ON _container_quantity.container_id = _container_summary.container_id
AND _container_summary.job_id = _container_quantity.job_id
INNER JOIN _mail_piece_unit
ON _container_quantity.mpu_id = _mail_piece_unit.mpu_id
AND _container_quantity.job_id = _mail_piece_unit.job_id
INNER JOIN _header
ON _header.job_id = _piece_detail.job_id
INNER JOIN mailing_groups
ON _mail_piece_unit.mpu_company = mailing_groups.mpu_company
LEFT JOIN _scan fs ON (fs.zip, fs.zip_4, fs.zip_delivery_point, fs.serial_number) = (
SELECT zip, zip_4, zip_delivery_point, serial_number
FROM _scan
WHERE zip = _piece_detail.zip
AND zip_4 = _piece_detail.zip_4
AND zip_delivery_point = _piece_detail.zip_delivery_point
AND serial_number = _piece_detail.serial_number
ORDER BY scan_date_time ASC
LIMIT 1
)
LEFT JOIN _scan ls ON (ls.zip, ls.zip_4, ls.zip_delivery_point, ls.serial_number) = (
SELECT zip, zip_4, zip_delivery_point, serial_number
FROM _scan
WHERE zip = _piece_detail.zip
AND zip_4 = _piece_detail.zip_4
AND zip_delivery_point = _piece_detail.zip_delivery_point
AND serial_number = _piece_detail.serial_number
ORDER BY scan_date_time DESC
LIMIT 1
)
SET _piece_detail.first_scan_date_time = fs.scan_date_time, _piece_detail.latest_scan_date_time = ls.scan_date_time
WHERE _piece_detail.first_scan_date_time < fs.scan_date_time
OR _piece_detail.latest_scan_date_time > ls.scan_date_time
这些是我将它们转换为SELECT语句时的解释:
+----+-------------+---------------------+--------+----------------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+--------+----------------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------------------------------+--------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 844161 | NULL |
| 1 | PRIMARY | _piece_detail | eq_ref | PRIMARY,first_scan_date_time,latest_scan_date_time | PRIMARY | 18 | t1.job_id,t1.piece_id | 1 | Using where |
| 2 | DERIVED | _header | index | PRIMARY | date_prepared | 3 | NULL | 87 | Using index; Using temporary; Using filesort |
| 2 | DERIVED | _piece_detail | ref | PRIMARY,cqt_database_id,zip | PRIMARY | 10 | odms._header.job_id | 9703 | NULL |
| 2 | DERIVED | _container_quantity | eq_ref | unique,mpu_id,job_id,job_id_container_quantity | unique | 14 | odms._header.job_id,odms._piece_detail.cqt_database_id | 1 | NULL |
| 2 | DERIVED | _mail_piece_unit | eq_ref | PRIMARY,company,job_id_mail_piece_unit | PRIMARY | 14 | odms._container_quantity.mpu_id,odms._header.job_id | 1 | Using where |
| 2 | DERIVED | mailing_groups | eq_ref | PRIMARY | PRIMARY | 27 | odms._mail_piece_unit.mpu_company | 1 | Using index |
| 2 | DERIVED | _container_summary | eq_ref | unique,container_id,job_id_container_summary | unique | 14 | odms._header.job_id,odms._container_quantity.container_id | 1 | Using index |
| 2 | DERIVED | _scan | ref | PRIMARY | PRIMARY | 28 | odms._piece_detail.zip,odms._piece_detail.zip_4,odms._piece_detail.zip_delivery_point,odms._piece_detail.serial_number | 1 | Using index |
+----+-------------+---------------------+--------+----------------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------------------------------+--------+----------------------------------------------+
+----+--------------------+---------------------+--------+--------------------------------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------------------------------+-----------+-----------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------------+--------+--------------------------------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------------------------------+-----------+-----------------------------------------------------------------+
| 1 | PRIMARY | _header | index | PRIMARY | date_prepared | 3 | NULL | 87 | Using index |
| 1 | PRIMARY | _piece_detail | ref | PRIMARY,cqt_database_id,first_scan_date_time,latest_scan_date_time | PRIMARY | 10 | odms._header.job_id | 9703 | NULL |
| 1 | PRIMARY | _container_quantity | eq_ref | unique,mpu_id,job_id,job_id_container_quantity | unique | 14 | odms._header.job_id,odms._piece_detail.cqt_database_id | 1 | NULL |
| 1 | PRIMARY | _mail_piece_unit | eq_ref | PRIMARY,company,job_id_mail_piece_unit | PRIMARY | 14 | odms._container_quantity.mpu_id,odms._header.job_id | 1 | Using where |
| 1 | PRIMARY | mailing_groups | eq_ref | PRIMARY | PRIMARY | 27 | odms._mail_piece_unit.mpu_company | 1 | Using index |
| 1 | PRIMARY | _container_summary | eq_ref | unique,container_id,job_id_container_summary | unique | 14 | odms._header.job_id,odms._container_quantity.container_id | 1 | Using index |
| 1 | PRIMARY | fs | index | NULL | updated | 1 | NULL | 102462928 | Using where; Using index; Using join buffer (Block Nested Loop) |
| 1 | PRIMARY | ls | index | NULL | updated | 1 | NULL | 102462928 | Using where; Using index; Using join buffer (Block Nested Loop) |
| 3 | DEPENDENT SUBQUERY | _scan | ref | PRIMARY | PRIMARY | 28 | odms._piece_detail.zip,odms._piece_detail.zip_4,odms._piece_detail.zip_delivery_point,odms._piece_detail.serial_number | 1 | Using where; Using index; Using filesort |
| 2 | DEPENDENT SUBQUERY | _scan | ref | PRIMARY | PRIMARY | 28 | odms._piece_detail.zip,odms._piece_detail.zip_4,odms._piece_detail.zip_delivery_point,odms._piece_detail.serial_number | 1 | Using where; Using index; Using filesort |
+----+--------------------+---------------------+--------+--------------------------------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------------------------------+-----------+-----------------------------------------------------------------+
现在,看看每个产生的解释,我真的不知道哪个给了我最好的回报。第一个显示乘以行列时总行数较少,但第二个似乎执行得更快一点。
在通过修改查询结构来提高性能的同时,我能做些什么来实现相同的结果?
答案 0 :(得分:1)
执行批量更新时禁用索引更新
ALTER TABLE _piece_detail DISABLE KEYS;
UPDATE ....;
ALTER TABLE _piece_detail ENABLE KEYS;
请参阅mysql文档:http://dev.mysql.com/doc/refman/5.0/en/alter-table.html
编辑: 在查看我指出的mysql文档之后,我看到文档为MyISAM表指定了这个,并且对于其他表类型是明确的。此处有更多解决方案:How to disable index in innodb
答案 1 :(得分:1)
有一些我被教过的东西,我严格遵循直到今天 - 创建尽可能多的临时表,同时避免使用派生表。特别是在UPDATE / DELETE / INSERTs为
的情况下最重要的是,它使代码看起来整洁可读。
我的方法将是
CREATE table temp xxx(...)
INSERT INTO xxx select q from y inner join z....;
UPDATE _piece_detail INNER JOIN xxx on (...) SET ...;
始终减少您的停机时间!!
答案 2 :(得分:0)
为什么不对每个联接使用子查询?包括内连接?
INNER JOIN (SELECT field1, field2, field 3 from _container_quantity order by 1,2,3)
ON _piece_detail.cqt_database_id = _container_quantity.cqt_database_id
AND _piece_detail.job_id = _container_quantity.job_id
INNER JOIN (SELECT field1, field2, field3 from _container_summary order by 1,2,3)
ON _container_quantity.container_id = _container_summary.container_id
AND _container_summary.job_id = _container_quantity.job_id
通过不限制你对这些内连接的选择,你肯定会大量投入内存。通过在每个子查询的末尾使用1,2,3的顺序,您可以在每个子查询上创建索引。你唯一的索引是在标题上,你不能加入_headers ....
一些优化此查询的建议。在每个表上创建所需的索引,或使用子查询连接子句手动创建动态所需的索引。
另外请记住,当您在&#34;临时&#34;上进行左连接时表满是聚合,你只是要求性能问题。
包含至少一个匹配的_scan记录(zip,zip_4, zip_delivery_point,serial_number)
嗯......这是你想要做的第一点,但这些字段都没有编入索引?
答案 3 :(得分:0)
从您的解释结果看,子查询似乎经过了两次所有行,那么你如何保持MIN / MAX不是第一个,而只使用一个左连接而不是两个?