Question

Given two databases, db1 and db2.

db1.table1

annee   code    code2   var1    ....
1991    11    12    779
1991    11    14    105
1991    11    15    10
1991    12    11    466
1991    12    14    296
1991    12    15    270
1991    14    11    15
1991    14    12    510
1991    14    15    6
1991    15    11    193
1991    15    12    455
1991    15    14    4   
....
1992    11    12    779
1992    11    14    105
1992    11    15    10
1992    12    11    466
1992    12    14    296
1992    12    15    270
1992    14    11    15
1992    14    12    510
1992    14    15    6
1992    15    11    193
1992    15    12    455
1992    15    14    4   
....

db2.table2

var1    code    ...
test  11
test  12
test  14
test2 11
test2 14
test2 15
...

I need to optimize the following query (because db1.table1 contains 8,000,000 rows):

select annee,sum(var1) from db1.table1 as M where 
M.code in 
(select t1.code from db2.table2 as t1 cross join db2.table2 as t2 where t1.var1='Test2' and t2.var1='Test2' and t1.code <> t2.code) 
and M.code2 in 
(select t2.code from db2.table2 as t1 cross join db2.table2 as t2 where t1.var1='Test2' and t2.var1='Test2' and t1.code <> t2.code) 
group by annee order by annee desc

db1.table1 and db2.table2 are indexed and sorted. Any suggestions would be greatly appreciated! Thanks.

Answer 1

As a variant, you can try the following:

select m.annee,sum(m.var1)
from db1.table1 m
join
  (
    select t1.code code1,t2.code code2
    from db2.table2 t1
    join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<t2.code
  ) c
on (m.code=c.code1 and m.code2=c.code2) or (m.code=c.code2 and m.code2=c.code1)
group by m.annee
order by m.annee desc

I used JOIN instead of CROSS JOIN, and JOIN instead of IN.

If that works for you, you can try optimizing the query further:

select m.annee,sum(m.var1)
from db2.table2 t1
join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<t2.code
join db1.table1 m on (m.code=t1.code and m.code2=t2.code) or (m.code=t2.code and m.code2=t1.code)
group by m.annee
order by m.annee desc

The first JOIN returns all combinations of codes for test2. For the sample rows shown, these are (11,14), (11,15) and (14,15).

db2.table2 t1
join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<t2.code

The second JOIN looks up the rows of table1 that match these combinations:

join db1.table1 m on (m.code=t1.code and m.code2=t2.code) or (m.code=t2.code and m.code2=t1.code)
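The combinations produced by the first JOIN can be inspected by running it on its own before touching the 8,000,000-row table. This is only a sketch: the column aliases code1 and code2 are chosen for illustration, and it assumes a case-insensitive collation (the MySQL default), so that 'Test2' matches the stored value test2.

```sql
-- List every unordered pair of distinct codes that belong to test2.
-- t1.code < t2.code keeps one row per pair, e.g. (11,14) but not (14,11).
select t1.code as code1, t2.code as code2
from db2.table2 t1
join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<t2.code
order by code1, code2;
-- With only the sample rows shown for test2 (codes 11, 14, 15),
-- this returns (11,14), (11,15), (14,15).
```

If the list of pairs is short, the second JOIN only has to probe table1 a few times, which is where an index on (code, code2) would pay off.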

Also try the next variant:

select m.annee,sum(m.var1)
from db2.table2 t1
join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<>t2.code
join db1.table1 m on m.code=t1.code and m.code2=t2.code
group by m.annee
order by m.annee desc

If the last variant returns the correct result, you can try adding an index on (code, code2) to table1:

CREATE INDEX idx_table1_code_code2 ON db1.table1 (code,code2)
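To see whether MySQL actually picks up the new index, the last variant can be prefixed with EXPLAIN. This is a minimal sketch; the plan you get depends on your MySQL version, table statistics and data distribution.

```sql
-- Show the execution plan instead of running the query.
-- In the output row for table m, the `key` column names the index the
-- optimizer chose; ideally it shows idx_table1_code_code2.
EXPLAIN
select m.annee,sum(m.var1)
from db2.table2 t1
join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<>t2.code
join db1.table1 m on m.code=t1.code and m.code2=t2.code
group by m.annee
order by m.annee desc;
```

If `key` is NULL for table m, the optimizer is not using the index and the query is probably still scanning the whole table.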

Answer 2

I tried to make your query logic simpler. Hope this helps.

    select annee,sum(var1) 
        from db1.table1 as M where 
            exists( select var1 from db2.table2 t2 
                        where t2.var1='Test2' 
                        group by t2.var1 
                        having sum(t2.code = M.code) >= 1 
                            and sum(t2.code = M.code2) >= 1 
                            and (M.code != M.code2 or sum(t2.code != M.code) >= 1))

        group by annee 
        order by annee desc

Answer 3

table2: INDEX(var1, code)
table1: INDEX(code, code2, annee)
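The two index suggestions above could be created roughly as follows. The index names are hypothetical, and building an index on a table of 8,000,000 rows can take a while, so this is best tried on a copy first.

```sql
-- Index names (idx_var1_code, idx_code_code2_annee) are made up for this
-- sketch; only the column lists come from the suggestion above.
ALTER TABLE db2.table2 ADD INDEX idx_var1_code (var1, code);
ALTER TABLE db1.table1 ADD INDEX idx_code_code2_annee (code, code2, annee);
```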

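Change IN ( SELECT ... ) to JOIN ( SELECT ... ) ON ...; the former is poorly optimized.

If you are using MySQL 5.6 or later, the subqueries may be optimized well enough. If you are using an older version, create a TEMPORARY TABLE from that repeated subquery.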
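One way the TEMPORARY TABLE approach might look is sketched below. The names tmp_pairs, code1 and code2 are hypothetical. MySQL documents a restriction that a TEMPORARY table cannot be referenced more than once in the same query, so instead of reusing a one-column table in two IN clauses, this sketch stores pairs of codes and joins once. That assumes, as in the sample data, that code and code2 always differ within a row of table1; if they can be equal, the result may differ from the original query.

```sql
-- Materialize the repeated subquery once, as pairs of distinct codes.
-- DISTINCT guards against duplicate (var1, code) rows in table2.
-- The table is created in the current default database.
CREATE TEMPORARY TABLE tmp_pairs AS
select distinct t1.code as code1, t2.code as code2
from db2.table2 as t1
join db2.table2 as t2
  on t1.var1='Test2' and t2.var1='Test2' and t1.code <> t2.code;

-- Join against the materialized pairs instead of using IN ( SELECT ... ).
select M.annee, sum(M.var1)
from db1.table1 as M
join tmp_pairs as p on M.code = p.code1 and M.code2 = p.code2
group by M.annee
order by M.annee desc;

-- Clean up; the table would also disappear when the session ends.
DROP TEMPORARY TABLE tmp_pairs;
```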

MySQL query optimization on 2 tables
