Question

I have a table that looks like this:

id_a, id_b, statistic

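The table has about 1M (1024 * 1024) records and contains every combination of id_a and id_b. I previously computed a statistic (a float) for each (id_a, id_b) pair, and now I want to collect a list of pairs such that each pair has the lowest possible statistic and every id appears only once across the two columns.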

A good result would look like this:

[1,2, 0.0]
[5,3, 0.1]
[7,9, 0.3]
...

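As you can see, each number appears only once across the first and second columns. I cannot extend this set by adding [6,7,_] or [5,6,_], because 7 and 5 are already used.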

So far, my solution is a sequence of SQL queries, each one extending the exclusion list:

def selectBestSystem(exclude_abs):
    # Build "id_a!=X and id_b!=X" for every id that has already been used.
    exclude_req = " AND ".join("id_a!=%d and id_b!=%d" % (x, x) for x in exclude_abs)
    req = "SELECT id_a, id_b, statistic FROM table"
    if exclude_req:
        req += " WHERE " + exclude_req
    req += " ORDER BY statistic ASC LIMIT 1"
    return db.process(req)

exclude_abs = []  # ids already used in either column
s = 0             # number of pairs selected so far
while s < maxSize:
    a, b, stat = selectBestSystem(exclude_abs)
    exclude_abs.extend([a, b])
    s += 1

After the first 100 pairs have been extracted, the query built this way looks horrible:

SELECT id_a, id_b, statistic FROM table WHERE
id_a!=1 and id_b!=1 and
id_a!=2 and id_b!=2 and
id_a!=5 and id_b!=5 and
id_a!=3 and id_b!=3 and
id_a!=7 and id_b!=7 and
id_a!=9 and id_b!=9 and
[...skipped 200 conditions...]
ORDER BY statistic ASC LIMIT 1
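
For reference, the same filter can be written much more compactly with NOT IN. The following is only a sketch: `build_query` is a hypothetical helper, the `%s` placeholders assume a Python MySQL driver that uses that parameter style, and I have not measured whether this form runs any faster than the chain of != conditions.

```python
def build_query(exclude_ids):
    """Return (sql, params) selecting the best remaining pair.

    `table` is the placeholder table name from the question; it is a
    reserved word in MySQL, so it is quoted with backticks here.
    """
    sql = "SELECT id_a, id_b, statistic FROM `table`"
    params = []
    if exclude_ids:
        # One %s placeholder per excluded id, reused for both columns.
        placeholders = ", ".join(["%s"] * len(exclude_ids))
        sql += f" WHERE id_a NOT IN ({placeholders}) AND id_b NOT IN ({placeholders})"
        params = list(exclude_ids) * 2
    sql += " ORDER BY statistic ASC LIMIT 1"
    return sql, params
```

With a DB-API style cursor this would presumably be run as `cursor.execute(sql, params)`, so the ids are passed as parameters instead of being formatted into the query string.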

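As a result, once more than 100 pairs have been selected, this query takes 15+ seconds to run. Is there a better way to perform this sequential elimination in MySQL? Or is my data structure rubbish, and should I not have started with a relational database at all?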
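
One direction I am considering, as a rough sketch only: fetch all rows once and run the same elimination in a single pass in memory. This assumes the roughly 1M rows fit in memory; `greedy_pairs` is a hypothetical name, and the toy rows below are made up to reproduce the example result above. Rows with id_a equal to id_b are not special-cased, which matches what the SQL loop does.

```python
def greedy_pairs(rows, max_size=None):
    """Pick pairs in ascending order of statistic, using each id at most once.

    rows: iterable of (id_a, id_b, statistic) tuples, e.g. the whole table
    fetched with one SELECT (assumed to fit in memory).
    """
    used = set()    # ids already taken, in either column
    result = []
    for a, b, stat in sorted(rows, key=lambda r: r[2]):
        if max_size is not None and len(result) >= max_size:
            break
        if a in used or b in used:
            continue  # same effect as the id_a!=x and id_b!=x filters
        result.append((a, b, stat))
        used.update((a, b))
    return result


# Made-up toy data, not taken from the real table.
rows = [(1, 2, 0.0), (1, 3, 0.05), (5, 3, 0.1),
        (2, 5, 0.2), (7, 9, 0.3), (6, 7, 0.4)]
print(greedy_pairs(rows))  # [(1, 2, 0.0), (5, 3, 0.1), (7, 9, 0.3)]
```

This sorts once and then checks set membership per row, instead of issuing one growing query per selected pair. When several rows share the same statistic, the order in which ties are picked may differ from what ORDER BY ... LIMIT 1 returns.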

The DB is AWS RDS Aurora 5.6.10a.

SQL sequential elimination

0 answers: