Question

I'm working on a comparison engine that pulls rebates from many rebate sites and stacks them for a particular store.

I have an idlinks table that maps each rebate site's store ID to the master store list:

idlinks (rebate_site_id      int,
         store_id_from_site  text,
         store_id_master     text)

Then I compile the rebates from all sites, for all stores, into a rebates table:

rebates (rebate_site_id      int,
         store_id_from_site  text,
         rebate_amount       text)

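Since new stores and rebates come up all the time, I want to find the rebates that I haven't yet linked to the master list. To do that, I run this query: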

select * from rebates
left join idlinks on (rebates.rebate_site_id = idlinks.rebate_site_id and
                      rebates.store_id_from_site = idlinks.store_id_from_site)
where (idlinks.rebate_site_id is null and idlinks.store_id_from_site is null)
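The LEFT JOIN ... IS NULL pattern above is an anti-join: a rebates row is kept only when no idlinks row matches both join columns, in which case SQLite fills the idlinks columns with NULL. Checking one idlinks join column for NULL is enough, since a matched row always has non-NULL join keys. A minimal sketch of that behavior, using an in-memory database and a few hypothetical rows (the site IDs, store IDs and amounts are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE idlinks (rebate_site_id INT, store_id_from_site TEXT, store_id_master TEXT);
    CREATE TABLE rebates (rebate_site_id INT, store_id_from_site TEXT, rebate_amount TEXT);
    -- hypothetical sample data: only (1, 'amzn') is linked to the master list
    INSERT INTO idlinks VALUES (1, 'amzn', 'amazon');
    INSERT INTO rebates VALUES (1, 'amzn', '5%'), (1, 'bby', '2%'), (2, 'amzn', '3%');
""")

# Anti-join: keep rebates whose (site, store) pair has no idlinks row.
unmatched = conn.execute("""
    SELECT rebates.*
    FROM rebates
    LEFT JOIN idlinks
      ON rebates.rebate_site_id = idlinks.rebate_site_id
     AND rebates.store_id_from_site = idlinks.store_id_from_site
    WHERE idlinks.rebate_site_id IS NULL
    ORDER BY rebates.rebate_site_id, rebates.store_id_from_site
""").fetchall()
conn.close()

print(unmatched)  # [(1, 'bby', '2%'), (2, 'amzn', '3%')]
```

Note that (2, 'amzn') counts as unmatched even though store 'amzn' is linked for site 1, because the join requires both columns to match.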

This works, but with only about 30k rows in each table it takes about 5 minutes, which seems long. I'm using sqlite3 3.7.4 from Python on a not-ancient Windows 7 machine. My code:

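import sqlite3

conn = sqlite3.connect('my.db')
c = conn.cursor()
c.execute('''select * from rebates
             left join idlinks on (rebates.rebate_site_id = idlinks.rebate_site_id and
                                   rebates.store_id_from_site = idlinks.store_id_from_site)
             where (idlinks.rebate_site_id is null and idlinks.store_id_from_site is null)''')
unmatched = c.fetchall()  # rebates rows with no matching idlinks entry
conn.close()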

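I think comparing both fields across both entire tables is what's taking all the time. If I could compare just one rebate site at a time, I think it would be faster. Basically, do this separately for each rebate_site_id and then merge the results: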

idlinks_1:  select * from idlinks where rebate_site_id = 1
rebates_1:  select * from rebates where rebate_site_id = 1

unmatched_1 = select * from rebates_1
              left join idlinks_1
                  on rebates_1.store_id_from_site = idlinks_1.store_id_from_site
              where idlinks_1.store_id_from_site is null

The idlinks_1 and rebates_1 queries are fast. I tested the unmatched_1 query on subset tables for a single rebate site, and it was much faster.

I tried doing this with a subquery, but it didn't improve the execution time:

select * from rebates
left join (select * from idlinks where idlinks.rebate_site_id = 1) as idlinks
    on rebates.store_id_from_site = idlinks.store_id_from_site
where rebates.rebate_site_id = 1 and idlinks.store_id_from_site is null

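Is there a way to rewrite the query so that the join only covers the part of each table that belongs to one particular rebate site? Alternatively, is there a way to feed the results of the fast queries into another execute statement, so that I can loop over all the rebate_site_ids?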
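The per-site loop described above could be sketched as follows. This is an illustration under assumptions, not a tested fix for the timing problem: the helper name unmatched_rebates is hypothetical, the site ID is passed as a ? parameter instead of being hard-coded, the subqueries are given aliases (r, i) so the outer query can refer to their columns, and the demo uses an in-memory database with made-up rows (for a real run you would pass sqlite3.connect('my.db') instead).

```python
import sqlite3

def unmatched_rebates(conn):
    """Hypothetical helper: collect unmatched rebates one rebate site at a time."""
    results = []
    # First, cheap query: the distinct rebate sites present in rebates.
    site_ids = [row[0] for row in conn.execute(
        "SELECT DISTINCT rebate_site_id FROM rebates ORDER BY rebate_site_id")]
    for site_id in site_ids:
        # Per-site anti-join: both sides are restricted to the same site,
        # so only store_id_from_site has to be compared.
        results.extend(conn.execute("""
            SELECT r.*
            FROM (SELECT * FROM rebates WHERE rebate_site_id = ?) AS r
            LEFT JOIN (SELECT * FROM idlinks WHERE rebate_site_id = ?) AS i
              ON r.store_id_from_site = i.store_id_from_site
            WHERE i.store_id_from_site IS NULL
            ORDER BY r.store_id_from_site
        """, (site_id, site_id)))
    return results  # merged results for all sites

# Demo with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE idlinks (rebate_site_id INT, store_id_from_site TEXT, store_id_master TEXT);
    CREATE TABLE rebates (rebate_site_id INT, store_id_from_site TEXT, rebate_amount TEXT);
    INSERT INTO idlinks VALUES (1, 'amzn', 'amazon');
    INSERT INTO rebates VALUES (1, 'amzn', '5%'), (1, 'bby', '2%'), (2, 'amzn', '3%');
""")
all_unmatched = unmatched_rebates(conn)
conn.close()

print(all_unmatched)  # [(1, 'bby', '2%'), (2, 'amzn', '3%')]
```

Whether the loop actually beats a single query depends on indexing; without an index on idlinks, each per-site query still has to scan idlinks, so the gain is not guaranteed.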

Answer 1

Try creating indexes:

CREATE INDEX idlinks_i1 ON idlinks(rebate_site_id,store_id_from_site);
CREATE INDEX rebates_i1 ON rebates(rebate_site_id,store_id_from_site);

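This should speed up your first query: without an index on idlinks, SQLite may have to scan the whole idlinks table for every row of rebates, whereas with idlinks_i1 each match becomes an index lookup on (rebate_site_id, store_id_from_site).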
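One way to confirm that SQLite actually uses the new index is to run EXPLAIN QUERY PLAN on the original query. A minimal sketch on empty in-memory copies of the two tables; the exact wording of the plan text differs between SQLite versions, so the sketch only checks that idlinks_i1 is mentioned rather than matching a specific line:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE idlinks (rebate_site_id INT, store_id_from_site TEXT, store_id_master TEXT);
    CREATE TABLE rebates (rebate_site_id INT, store_id_from_site TEXT, rebate_amount TEXT);
    CREATE INDEX idlinks_i1 ON idlinks(rebate_site_id, store_id_from_site);
    CREATE INDEX rebates_i1 ON rebates(rebate_site_id, store_id_from_site);
""")

query = """
    SELECT * FROM rebates
    LEFT JOIN idlinks
      ON rebates.rebate_site_id = idlinks.rebate_site_id
     AND rebates.store_id_from_site = idlinks.store_id_from_site
    WHERE idlinks.rebate_site_id IS NULL
"""
# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
conn.close()

for step in plan:
    print(step)  # e.g. a SCAN of rebates and a SEARCH of idlinks using idlinks_i1
uses_index = any("idlinks_i1" in step for step in plan)
```

If the idlinks step shows a full scan instead of a search on idlinks_i1, the index is not being used and the query will stay slow.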

Answer 2

If you're only fetching the NULLs, why do you need the join at all?

select * from rebates
where (rebates.rebate_site_id is null and rebates.store_id_from_site is null)
