I have two tables: item and status.
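For each item, I need to aggregate data from the status table under several different conditions (the columns involved are ingredientId, status, and exemptionIds), so I end up performing multiple left joins.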
I'm running into a performance problem: processing 500 rows takes about 7.5 seconds on a modern CPU with an SSD.
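Oddly, if I comment out the last JOIN it takes about 1.2 s, and if I comment out the last two JOINs it takes about 0.7 s. I expected each additional JOIN to increase the time linearly, but that is not what happens here. I actually need to add more JOINs, so this is going to be a big problem.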
DESCRIBE confirms that PRIMARY (the composite index on docId, ingredientId) is used:
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
'1', 'PRIMARY', '<derived2>', NULL, 'ALL', NULL, NULL, NULL, NULL, '500', '100.00', 'Using temporary; Using filesort'
'1', 'PRIMARY', 'aaa_psn', NULL, 'ref', 'PRIMARY', 'PRIMARY', '4', 'i.docId', '139', '100.00', 'Using where'
'1', 'PRIMARY', 'aaa_psu', NULL, 'ref', 'PRIMARY', 'PRIMARY', '4', 'i.docId', '139', '100.00', 'Using where'
'1', 'PRIMARY', 'aaa_psu2', NULL, 'ref', 'PRIMARY', 'PRIMARY', '4', 'i.docId', '139', '100.00', 'Using where; Using index'
'1', 'PRIMARY', 'aaa_pse', NULL, 'ref', 'PRIMARY', 'PRIMARY', '4', 'i.docId', '139', '100.00', 'Using where'
'1', 'PRIMARY', 'bbb_psn', NULL, 'ref', 'PRIMARY', 'PRIMARY', '4', 'i.docId', '139', '100.00', 'Using where'
'1', 'PRIMARY', 'bbb_psu', NULL, 'ref', 'PRIMARY', 'PRIMARY', '4', 'i.docId', '139', '100.00', 'Using where'
'1', 'PRIMARY', 'bbb_psu2', NULL, 'ref', 'PRIMARY', 'PRIMARY', '4', 'i.docId', '139', '100.00', 'Using where; Using index'
'1', 'PRIMARY', 'bbb_pse', NULL, 'ref', 'PRIMARY', 'PRIMARY', '4', 'i.docId', '139', '100.00', 'Using where'
'2', 'DERIVED', 'i', NULL, 'ALL', NULL, NULL, NULL, NULL, '2378132', '100.00', NULL
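Any ideas on how to improve this? Given this data model, what would be a better way to write the query? Or does the data model itself need to change?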
The status table has about 91.2M rows, and the item table has about 2.4M rows.
Each item has at most 100 entries in the status table.
Here is the query:
select i.*
,coalesce(
if (count(aaa_sn.docId) > 0, 'no', null),
if (count(aaa_su.docId) > 0, 'unknown', null),
if (count(aaa_su2.docId) < 6, 'unknown', null),
if (count(aaa_se.docId) > 0, 'exempt', null),
'yes'
) as aaaCheck
,coalesce(
if (count(bbb_sn.docId) > 0, 'no', null),
if (count(bbb_su.docId) > 0, 'unknown', null),
if (count(bbb_su2.docId) < 24, 'unknown', null),
if (count(bbb_se.docId) > 0, 'exempt', null),
'yes'
) as bbbCheck
from (
select i.id, i.docId from item i limit 100
) i
left join status aaa_sn on aaa_sn.docId = i.docId and aaa_sn.ingredientId IN (1,2,3,4,5,6)
and (aaa_sn.status = 'no' OR (aaa_sn.status = 'exempt' and aaa_sn.exemptionIds NOT IN (29,38,46,162,167,179,180,182,190,191,192,194,202,206,216,234,163,215,216,123,124,125,126,127,128,129,130,131,132,133,136,137,138,139,140,141,142,143,144,145,146,147,149,150,179,182,183,205,220,222,229,230,11,12,23,29,33,37,39,40,41,42,45,151,152,153,154,155,158,159,164,166,167,171,172,178,179,180,181,182,184,185,186,187,188,189,192,193,194,195,196,197,199,200,201,203,207,208,209,210,211,212,213,214,216,217,218,219,221,223,224,225,226,227,228)))
left join status aaa_su on aaa_su.docId = i.docId and aaa_su.ingredientId IN (1,2,3,4,5,6) and aaa_su.status = 'unknown'
left join status aaa_su2 on aaa_su2.docId = i.docId and aaa_su2.ingredientId IN (1,2,3,4,5,6)
left join status aaa_se on aaa_se.docId = i.docId and aaa_se.ingredientId IN (1,2,3,4,5,6)
and aaa_se.status = 'exempt' and aaa_se.exemptionIds IN (29,38,46,162,167,179,180,182,190,191,192,194,202,206,216,234,163,215,216,123,124,125,126,127,128,129,130,131,132,133,136,137,138,139,140,141,142,143,144,145,146,147,149,150,179,182,183,205,220,222,229,230,11,12,23,29,33,37,39,40,41,42,45,151,152,153,154,155,158,159,164,166,167,171,172,178,179,180,181,182,184,185,186,187,188,189,192,193,194,195,196,197,199,200,201,203,207,208,209,210,211,212,213,214,216,217,218,219,221,223,224,225,226,227,228)
left join status bbb_sn on bbb_sn.docId = i.docId and bbb_sn.ingredientId IN (19,22,23,25,27,28,29,30,31,33,35,38,43,44,45,60,115,163,164,192,324,325,366,367)
and (bbb_sn.status = 'no' OR (bbb_sn.status = 'exempt' and bbb_sn.exemptionIds NOT IN (48,235,47,239,235,48,48,239,235,235,238,236,237,239)))
left join status bbb_su on bbb_su.docId = i.docId and bbb_su.ingredientId IN (19,22,23,25,27,28,29,30,31,33,35,38,43,44,45,60,115,163,164,192,324,325,366,367) and bbb_su.status = 'unknown'
left join status bbb_su2 on bbb_su2.docId = i.docId and bbb_su2.ingredientId IN (19,22,23,25,27,28,29,30,31,33,35,38,43,44,45,60,115,163,164,192,324,325,366,367)
left join status bbb_se on bbb_se.docId = i.docId and bbb_se.ingredientId IN (19,22,23,25,27,28,29,30,31,33,35,38,43,44,45,60,115,163,164,192,324,325,366,367)
and bbb_se.status = 'exempt' and bbb_se.exemptionIds IN (48,235,47,239,235,48,48,239,235,235,238,236,237,239)
group by i.id
Answer 0 (score: 1)
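It looks like the query in the question generates a semi-Cartesian (partial cross) product: rows from status get matched against other rows from status for the same docId, which can inflate the counts. Because each additional join multiplies the number of intermediate rows per item instead of adding to it, this would also explain why the run time does not grow linearly with the number of joins.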
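To make the inflation concrete, here is a minimal sketch in MySQL syntax. The table status_demo and its three rows are hypothetical and exist only for this illustration; they are not part of the question's schema.

```sql
-- Hypothetical miniature of the status table, for illustration only.
CREATE TABLE status_demo (
    docId        INT NOT NULL,
    ingredientId INT NOT NULL,
    status       VARCHAR(10) NOT NULL,
    PRIMARY KEY (docId, ingredientId)
);

INSERT INTO status_demo (docId, ingredientId, status) VALUES
    (1, 1, 'unknown'),
    (1, 2, 'unknown'),
    (1, 3, 'yes');

-- Two left joins of the same table against a single item row.
-- Alias a matches 2 rows and alias b matches 3 rows, so the join
-- produces 2 * 3 = 6 rows before the aggregates are evaluated.
SELECT COUNT(a.docId) AS cnt_unknown   -- returns 6, although only 2 rows are 'unknown'
     , COUNT(b.docId) AS cnt_all       -- returns 6, although only 3 rows exist
  FROM ( SELECT 1 AS docId ) i
  LEFT
  JOIN status_demo a
    ON a.docId = i.docId
   AND a.status = 'unknown'
  LEFT
  JOIN status_demo b
    ON b.docId = i.docId;

DROP TABLE status_demo;
```

With up to 100 status rows per item and eight joins, the same multiplication applies at a much larger scale.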
I suspect we only need to join to the status table once, matching on docId, and then run the rows through conditional tests in expressions in the SELECT list.
As a simplified example of this approach (with no aggregation introduced yet), consider:
SELECT i.id
, i.docid
, s.ingredientId
, s.status
, s.exemptionIds
, IF( s.ingredientId IN (1,2,3,4,5,6) AND s.status = 'unknown' ,1,0) AS aaa_su
, IF( s.ingredientId IN (1,2,3,4,5,6) ,1,0) AS aaa_su2
FROM ( SELECT j.id
, j.docid
FROM item j
ORDER BY j.docid, j.id
LIMIT 100
) i
LEFT
JOIN status s
ON s.docid = i.docid
ORDER BY i.id, i.docid
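For each "matching" row from status, the IF() functions are evaluated. The first argument is evaluated as a boolean; if it is TRUE, the function returns the second argument, otherwise it returns the third.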
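As a quick standalone illustration of how MySQL's IF() behaves, including the NULL case that arises when the LEFT JOIN finds no matching status row (the column aliases are made up for this example):

```sql
SELECT IF(1 < 2, 'second', 'third')  AS when_true    -- returns 'second'
     , IF(1 > 2, 'second', 'third')  AS when_false   -- returns 'third'
     , IF(NULL IN (1,2,3), 1, 0)     AS when_null;   -- returns 0: NULL is not TRUE
```

The last column is why an item with no status rows at all contributes 0 to every check instead of NULL.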
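I included only two of the simpler checks in this query; I left out the more complex expressions just to demonstrate how this works. (We can extend the pattern by adding more IF() expressions to the SELECT list for the other checks.)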
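I also included the columns from s that are tested in the conditions, so we can verify that we get 1 and 0 where we expect them. (With a more complex condition, especially one that combines AND and OR, this helps us verify that the check behaves the way we intend.)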
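As a sketch of how the more complex checks could be written in the same style, here are the bbb_sn and bbb_se conditions from the question expressed as IF() flags. The conditions are copied from the question's join predicates, with duplicate values removed from the exemption list (which does not change what IN matches). I have not run this against the real tables, so treat it as untested.

```sql
SELECT i.id
     , i.docId
     , s.ingredientId
     , s.status
     , s.exemptionIds
       -- 'no', or 'exempt' with an exemption that is not in the accepted list
     , IF( s.ingredientId IN (19,22,23,25,27,28,29,30,31,33,35,38,43,44,45,60,115,163,164,192,324,325,366,367)
           AND ( s.status = 'no'
                 OR ( s.status = 'exempt'
                      AND s.exemptionIds NOT IN (47,48,235,236,237,238,239) ) )
         ,1,0) AS bbb_sn
       -- 'exempt' with an exemption that is in the accepted list
     , IF( s.ingredientId IN (19,22,23,25,27,28,29,30,31,33,35,38,43,44,45,60,115,163,164,192,324,325,366,367)
           AND s.status = 'exempt'
           AND s.exemptionIds IN (47,48,235,236,237,238,239)
         ,1,0) AS bbb_se
  FROM ( SELECT j.id
              , j.docId
           FROM item j
          ORDER BY j.docId, j.id
          LIMIT 100
       ) i
  LEFT
  JOIN status s
    ON s.docId = i.docId
 ORDER BY i.id, i.docId
```

Scanning the output rows side by side with status and exemptionIds is a cheap way to confirm that the parentheses around the OR are placed as intended.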
The next step is to add a GROUP BY clause and wrap those IF() expressions in an aggregate function such as SUM().
Returning 1 and 0 is convenient with SUM() when what we want is a "count" of the matching rows.
SELECT i.id
, i.docid
, SUM(IF( s.ingredientId IN (1,2,3,4,5,6) AND s.status = 'unknown' ,1,0)) AS cnt_aaa_su
, SUM(IF( s.ingredientId IN (1,2,3,4,5,6) ,1,0)) AS cnt_aaa_su2
FROM ( SELECT j.id
, j.docid
FROM item j
ORDER BY j.docid, j.id
LIMIT 100
) i
LEFT
JOIN status s
ON s.docid = i.docid
GROUP BY i.id, i.docid
If we want to use COUNT() instead of SUM(), we can return any non-NULL value as the second argument, and we need to return NULL as the third argument, for example:
, COUNT(IF( s.ingredientId IN (1,2,3,4,5,6) ,'x',NULL)) AS aaa_su2
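Putting the pieces together, the following is a sketch of how one of the final columns from the question (bbbCheck) might be derived with a single join. It assumes the conditions and the threshold of 24 from the question are what is intended; the CASE expression is my substitute for the question's COALESCE(IF(...), ...) chain and evaluates the checks in the same order. It is untested against the real data.

```sql
SELECT i.id
     , i.docId
     , CASE
         -- any 'no', or 'exempt' with an exemption outside the accepted list
         WHEN SUM(IF( s.ingredientId IN (19,22,23,25,27,28,29,30,31,33,35,38,43,44,45,60,115,163,164,192,324,325,366,367)
                      AND ( s.status = 'no'
                            OR ( s.status = 'exempt'
                                 AND s.exemptionIds NOT IN (47,48,235,236,237,238,239) ) )
                    ,1,0)) > 0 THEN 'no'
         -- any explicit 'unknown'
         WHEN SUM(IF( s.ingredientId IN (19,22,23,25,27,28,29,30,31,33,35,38,43,44,45,60,115,163,164,192,324,325,366,367)
                      AND s.status = 'unknown'
                    ,1,0)) > 0 THEN 'unknown'
         -- fewer status rows than the 24 ingredients in the list
         WHEN SUM(IF( s.ingredientId IN (19,22,23,25,27,28,29,30,31,33,35,38,43,44,45,60,115,163,164,192,324,325,366,367)
                    ,1,0)) < 24 THEN 'unknown'
         -- any 'exempt' with an accepted exemption
         WHEN SUM(IF( s.ingredientId IN (19,22,23,25,27,28,29,30,31,33,35,38,43,44,45,60,115,163,164,192,324,325,366,367)
                      AND s.status = 'exempt'
                      AND s.exemptionIds IN (47,48,235,236,237,238,239)
                    ,1,0)) > 0 THEN 'exempt'
         ELSE 'yes'
       END AS bbbCheck
  FROM ( SELECT j.id
              , j.docId
           FROM item j
          ORDER BY j.docId, j.id
          LIMIT 100
       ) i
  LEFT
  JOIN status s
    ON s.docId = i.docId
 GROUP BY i.id, i.docId
```

The aaaCheck column would follow the same pattern with its own ingredient list, exemption list, and threshold of 6. Each item's status rows are read once no matter how many checks are added, so adding checks should add expression evaluations rather than multiplying joined rows.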