Question

Suppose my query returns about 10-30 rows:

select * from users where location=10;

Is there any difference between this:

  select * 
  from users u 
      inner join location l on u.location = l.id 
  where location=10;

and this:

select * from users where location=10;  -- say this returns 1,2,3,4,5
select * from location where id IN (1,2,3,4,5);

Basically, I want to know whether there is any performance difference between doing an inner join and doing a WHERE IN clause.

Answer 1

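Is there a difference between issuing one query and issuing two queries? Well, I certainly hope so. The SQL engine is doing work, and with two queries it does twice the work (from a certain perspective).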

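In general, parsing a single query is faster than parsing one query, returning an intermediate result set, and then feeding that result back into another query. There is overhead both in compiling each query and in passing data back and forth.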
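A minimal sketch of the two approaches, using Python's built-in sqlite3 module with an in-memory database so it runs without a server. The table layouts `users(id, name, location)` and `location(id, name)`, the index name and the sample rows are hypothetical. The join is one statement; the two-query version runs one statement to collect the users' location ids, builds an `IN (...)` list from them, sends a second statement, and then stitches the results together in the application. Note that the ids fed into the second query come from `users.location`, not `users.id`. Because SQLite runs in-process there is no network hop here, so this shows the shape of the extra work, not a benchmark.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """In-memory database with hypothetical users/location tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, location INTEGER);
        CREATE INDEX idx_users_location ON users(location);
    """)
    conn.executemany("INSERT INTO location VALUES (?, ?)",
                     [(i, f"loc{i}") for i in range(1, 21)])
    # Users 1-25 live at location 10; the rest are spread over locations 1-9.
    conn.executemany("INSERT INTO users (name, location) VALUES (?, ?)",
                     [(f"user{i}", 10 if i <= 25 else i % 9 + 1)
                      for i in range(1, 101)])
    return conn

def join_version(conn):
    # One statement: the engine matches each user to its location row itself.
    return conn.execute("""
        SELECT u.id, l.name
        FROM users u
        INNER JOIN location l ON u.location = l.id
        WHERE u.location = 10
        ORDER BY u.id
    """).fetchall()

def two_query_version(conn):
    # Statement 1: fetch the matching users and their location ids.
    users = conn.execute(
        "SELECT id, location FROM users WHERE location = 10 ORDER BY id"
    ).fetchall()
    loc_ids = sorted({loc for _, loc in users})
    if not loc_ids:
        return []
    # Statement 2: one "?" placeholder per id inside IN (...).
    placeholders = ",".join("?" * len(loc_ids))
    names = dict(conn.execute(
        f"SELECT id, name FROM location WHERE id IN ({placeholders})", loc_ids
    ).fetchall())
    # The application, not the database, combines the two result sets.
    return [(uid, names[loc]) for uid, loc in users]

conn = make_db()
rows = join_version(conn)
assert rows == two_query_version(conn)
print(len(rows))  # 25
```

Both functions return the same 25 rows; the difference is that the second one needs two statements plus application-side merging to get there.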

For this query:

select *
from users u inner join
     location l
     on u.location = l.id
where u.location = 10;

You want indexes on users(location) and location(id).

I do want to point out something else. The queries are not equivalent. The real comparison query is:

select l.*
from location l
where l.id = 10;

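You are using the same column in the on and the where clauses: since u.location = l.id and u.location = 10, every joined row has l.id = 10. So this would be the most efficient version, and you want an index on location(id).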
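A small sketch of that reasoning, again with Python's sqlite3 and hypothetical tables and rows (the city names are made up). It compares the location columns produced by the join, de-duplicated, with the direct lookup on `location.id`. One caveat the sketch makes visible: the two only agree when at least one user actually has that location; if none does, the join returns nothing while the direct lookup still finds the location row.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    # Hypothetical tables and rows, only for illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, location INTEGER);
        INSERT INTO location VALUES (10, 'Paris'), (11, 'Oslo');
        INSERT INTO users (name, location) VALUES ('ann', 10), ('bob', 10), ('cy', 10);
    """)
    return conn

def locations_via_join(conn, loc_id):
    # Every joined row has l.id = u.location = loc_id,
    # so DISTINCT collapses the location columns to a single row.
    return conn.execute("""
        SELECT DISTINCT l.*
        FROM users u INNER JOIN location l ON u.location = l.id
        WHERE u.location = ?
    """, (loc_id,)).fetchall()

def location_direct(conn, loc_id):
    # The simplified query: a primary-key lookup on location(id).
    return conn.execute(
        "SELECT l.* FROM location l WHERE l.id = ?", (loc_id,)
    ).fetchall()

conn = setup()
print(locations_via_join(conn, 10), location_direct(conn, 10))
# [(10, 'Paris')] [(10, 'Paris')]
# Location 11 has no users: the join finds nothing, the direct lookup still does.
print(locations_via_join(conn, 11), location_direct(conn, 11))
# [] [(11, 'Oslo')]
```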

Answer 2

One way to compare the performance of different queries is to use PostgreSQL's EXPLAIN command, for example:

EXPLAIN select * 
from users u inner join
     location l
     on u.location = l.id 
where u.location = 10;

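This tells you how the database is going to fetch the data. Watch for things like sequential scans, which indicate that you could benefit from an index. It also gives an estimated cost for each operation and the number of rows it expects to process. Some operations can produce more rows than you might expect, which the database then filters down to the set it returns to you.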
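PostgreSQL's EXPLAIN needs a running server, so as a self-contained stand-in this sketch uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module. That is a different command with its own planner and output format, so treat it only as an analogy. It shows the same idea: before an index on `users(location)` exists, the plan is a full scan of the table; after `CREATE INDEX`, the plan searches the index instead. The table, the index name `idx_users_location` and the data are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

QUERY = "SELECT * FROM users WHERE location = 10"

def plan(conn, sql):
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the
    # human-readable description of the step.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, location INTEGER)")
conn.executemany("INSERT INTO users (name, location) VALUES (?, ?)",
                 [(f"user{i}", i % 50) for i in range(1000)])

before = plan(conn, QUERY)  # expected: a full-table SCAN step
conn.execute("CREATE INDEX idx_users_location ON users(location)")
after = plan(conn, QUERY)   # expected: a SEARCH step using idx_users_location
print(before)
print(after)
```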

You can also use EXPLAIN ANALYZE [query], which actually runs the query and gives you timing information.

For details, see the PostgreSQL documentation.