SQL: selecting specific triads of users

Time: 2015-01-11 16:13:21

Tags: sql sqlite

Given a table of users:

User(id INT, username VARCHAR(30))

and the directed relationships between them:

Following(follower_id INT, followee_id INT)

I need a SELECT that returns all unique triads of users matching a given pattern, for example:

A   follows   B
B   follows   A
A   follows   C
C does not follow A
B does not follow C
C   follows   B

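I am using an SQLite database from Python. With a sample SELECT for the pattern above, I can probably work out all the other triads I am after fairly quickly. These are basically all possible combinations of directed connections among three users.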

2 Answers:

Answer 0: (score: 1)

This is a bit complicated, but you can do it like this:

with pairs as (
      -- user pairs that follow each other; each pair appears in both directions
      select f1.followee_id, f1.follower_id
      from following f1 join
           following f2
           on f1.follower_id = f2.followee_id and
              f1.followee_id = f2.follower_id
     )
-- A-B, A-C and B-C are all mutual pairs
select p1.followee_id as A, p1.follower_id as B, p3.follower_id as C
from pairs p1 join
     pairs p2
     on p1.followee_id = p2.followee_id join
     pairs p3
     on p3.followee_id = p1.follower_id and
        p3.follower_id = p2.follower_id;

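The idea is that pairs collects the pairs of users who follow each other. The outer query then looks for further pairs that add a third person.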
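To see what the pairs step produces on its own, here is a minimal sketch using Python's sqlite3 module. The in-memory database and the three sample rows are assumptions made up for illustration; only the `following` table and its columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE following (follower_id INT, followee_id INT);
    -- Hypothetical data: 1 and 2 follow each other; 3 follows 1 only.
    INSERT INTO following VALUES (1, 2), (2, 1), (3, 1);
""")

# Self-join: keep a row only if the reversed edge is also present.
mutual_pairs = conn.execute("""
    WITH pairs AS (
        SELECT f1.follower_id, f1.followee_id
        FROM following f1
        JOIN following f2
          ON f1.follower_id = f2.followee_id
         AND f1.followee_id = f2.follower_id
    )
    SELECT follower_id, followee_id
    FROM pairs
    ORDER BY follower_id, followee_id
""").fetchall()

print(mutual_pairs)
```

Each mutual pair shows up once per direction, so a query built on pairs will generally return the same triad several times unless an ordering condition on the ids is added.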

Another approach is to generate all combinations and then keep the ones that match:

-- as written, this keeps triads in which every user follows every other user
select a.id, b.id, c.id
from user a join
     user b
     on a.id < b.id join
     user c
     on b.id < c.id
where exists (select 1 from following f where f.follower_id = a.id and f.followee_id = b.id) and
      exists (select 1 from following f where f.follower_id = b.id and f.followee_id = a.id) and
      exists (select 1 from following f where f.follower_id = a.id and f.followee_id = c.id) and
      exists (select 1 from following f where f.follower_id = c.id and f.followee_id = a.id) and
      exists (select 1 from following f where f.follower_id = b.id and f.followee_id = c.id) and
      exists (select 1 from following f where f.follower_id = c.id and f.followee_id = b.id);

If you have reasonable indexes on the tables, this version might actually perform better.
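The same EXISTS idea can be adapted to the pattern from the question by turning the two absent edges into NOT EXISTS. The following is a sketch under assumptions, not a tested production query: the ids 1, 2, 3 for A, B, C and the in-memory database are made up for illustration, and the `a.id < b.id < c.id` ordering is replaced by plain inequality because A, B and C play different roles in this pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INT, username VARCHAR(30));
    CREATE TABLE following (follower_id INT, followee_id INT);
    INSERT INTO user VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- The example from the question: A<->B, A->C, C->B.
    INSERT INTO following VALUES (1, 2), (2, 1), (1, 3), (3, 2);
""")

TRIAD_SQL = """
    SELECT a.id, b.id, c.id
    FROM user a
    JOIN user b ON b.id <> a.id
    JOIN user c ON c.id <> a.id AND c.id <> b.id
    WHERE EXISTS (SELECT 1 FROM following f
                  WHERE f.follower_id = a.id AND f.followee_id = b.id)  -- A follows B
      AND EXISTS (SELECT 1 FROM following f
                  WHERE f.follower_id = b.id AND f.followee_id = a.id)  -- B follows A
      AND EXISTS (SELECT 1 FROM following f
                  WHERE f.follower_id = a.id AND f.followee_id = c.id)  -- A follows C
      AND EXISTS (SELECT 1 FROM following f
                  WHERE f.follower_id = c.id AND f.followee_id = b.id)  -- C follows B
      AND NOT EXISTS (SELECT 1 FROM following f
                  WHERE f.follower_id = c.id AND f.followee_id = a.id)  -- C does not follow A
      AND NOT EXISTS (SELECT 1 FROM following f
                  WHERE f.follower_id = b.id AND f.followee_id = c.id)  -- B does not follow C
"""

triads = conn.execute(TRIAD_SQL).fetchall()
print(triads)
```

Other triad types follow by flipping individual conditions between EXISTS and NOT EXISTS. For patterns that are symmetric in two or more roles, an id ordering on those roles would be needed to avoid duplicate rows.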

EDIT:

For performance, the following table should have an index on (follower_id, followee_id); that is, a single composite index covering both columns.
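A minimal sketch of creating that composite index from Python follows. The index name `idx_following_pair` is an assumption chosen for illustration; any name works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE following (follower_id INT, followee_id INT)")

# One index over both columns, in the order the EXISTS subqueries filter on.
conn.execute(
    "CREATE INDEX idx_following_pair ON following (follower_id, followee_id)"
)

# PRAGMA index_info returns one row per indexed column: (seqno, cid, name).
index_columns = [
    row[2] for row in conn.execute("PRAGMA index_info(idx_following_pair)")
]
print(index_columns)
```

If each (follower_id, followee_id) pair can occur only once, declaring the index as UNIQUE would also enforce that, though the question does not say whether duplicates are possible.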

Answer 1: (score: 0)

SELECT ab.follower_id AS a_id,
       ab.followee_id AS b_id,
       ac.followee_id AS c_id
FROM following AS ab
JOIN following AS ba ON ab.followee_id = ba.follower_id
                    AND ab.follower_id = ba.followee_id
JOIN following AS ac ON ab.follower_id = ac.follower_id
JOIN following AS cb ON ac.followee_id = cb.follower_id
                    AND ab.followee_id = cb.followee_id
LEFT OUTER JOIN following AS ca ON ac.followee_id = ca.follower_id
                               AND ac.follower_id = ca.followee_id
LEFT OUTER JOIN following AS bc ON cb.followee_id = bc.follower_id
                               AND cb.follower_id = bc.followee_id
WHERE ab.follower_id < ab.followee_id
  AND ab.followee_id < ac.followee_id
  AND ca.follower_id IS NULL
  AND bc.follower_id IS NULL

On 3 million records this runs in 30 seconds, compared with 45k seconds for the EXISTS version proposed by Gordon.
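One thing that may be worth checking before relying on the join query: its id-ordering conditions only keep triads where A's id is less than B's and B's is less than C's. Since A, B and C each play a distinct role in this pattern, a matching triad whose ids happen to be in another order appears to be filtered out. The sketch below illustrates this on a small made-up data set (ids 3, 1, 2 for A, B, C are an assumption), running the same joins with and without the ordering conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE following (follower_id INT, followee_id INT);
    -- A = 3, B = 1, C = 2: ids are deliberately not in A < B < C order.
    INSERT INTO following VALUES (3, 1), (1, 3), (3, 2), (2, 1);
""")

# Same joins as the answer; the two LEFT JOINs plus IS NULL act as anti-joins.
BASE_SQL = """
    SELECT ab.follower_id, ab.followee_id, ac.followee_id
    FROM following AS ab
    JOIN following AS ba ON ab.followee_id = ba.follower_id
                        AND ab.follower_id = ba.followee_id
    JOIN following AS ac ON ab.follower_id = ac.follower_id
    JOIN following AS cb ON ac.followee_id = cb.follower_id
                        AND ab.followee_id = cb.followee_id
    LEFT JOIN following AS ca ON ac.followee_id = ca.follower_id
                             AND ac.follower_id = ca.followee_id
    LEFT JOIN following AS bc ON cb.followee_id = bc.follower_id
                             AND cb.follower_id = bc.followee_id
    WHERE ca.follower_id IS NULL
      AND bc.follower_id IS NULL
"""

# The ordering conditions from the answer, appended separately for comparison.
ORDER_FILTER = """
      AND ab.follower_id < ab.followee_id
      AND ab.followee_id < ac.followee_id
"""

without_filter = conn.execute(BASE_SQL).fetchall()
with_filter = conn.execute(BASE_SQL + ORDER_FILTER).fetchall()
print(without_filter, with_filter)
```

On this data the unfiltered query returns the triad once and the filtered one returns nothing, so dropping the ordering conditions seems safe for this particular pattern. For symmetric patterns the ordering is still useful to remove duplicates.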