Question

I have 2 tables (name(fields)):

data(object_id, property_id, value_id)

and

string(id, value)

All of the actual data lives in the "string" table; "data" only references the corresponding strings.

For example, I have:

data(1,2,3)  
data(1,4,5)  
data(6,4,7)



string(1, 'car')  
string(2, 'color')  
string(3, 'red')  
string(4, 'make')  
string(5, 'audi')  
string(6, 'car2')  
string(7, 'toyota')

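What I want now is that when I delete some rows from the data table, all rows of the string table that become orphaned are deleted as well: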

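If I delete data(6,4,7), then the strings with ids 6 and 7 should be deleted (because they are no longer used); 4 is still used by another data row, so it is not deleted.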
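To make the intended semantics concrete, here is a minimal in-memory sketch in Python rather than SQL. The tables are modelled as a hypothetical list of tuples and a dict, and a string id counts as an orphan when it appears in none of the three columns of any remaining data row.

```python
# Hypothetical in-memory model of the two tables from the question.
data = [(1, 2, 3), (1, 4, 5), (6, 4, 7)]
string = {1: 'car', 2: 'color', 3: 'red', 4: 'make',
          5: 'audi', 6: 'car2', 7: 'toyota'}


def orphan_ids(data_rows, string_table):
    """Return the ids in string_table that no column of any data row references."""
    used = set()
    for object_id, property_id, value_id in data_rows:
        used.update((object_id, property_id, value_id))
    return sorted(set(string_table) - used)


# Delete data(6,4,7), then remove the strings nothing points to any more.
data.remove((6, 4, 7))
orphans = orphan_ids(data, string)
for string_id in orphans:
    del string[string_id]

print(orphans)  # [6, 7]; id 4 survives because data(1,4,5) still uses it
```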

My question is: how do I write an optimized delete query for the string table?

Currently I have something like this (it works, but it is very slow):

delete  
from string s  
where 1=1  
and (select count(*) from data where object_id = s.id) = 0  
and (select count(*) from data where property_id = s.id) = 0  
and (select count(*) from data where value_id = s.id) = 0

I also tried this (depending on the number of orphans, it is sometimes 10-20% faster):

delete from string  
where (id not in (select usedids.id from (select object_id as id from data  
    union  
    select property_id as id from data  
    union  
    select value_id as id from data) as usedids)  
);

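I have about 100k rows in each table. If I delete about 6000 rows from the data table, cleaning up the string table takes about 3 minutes. Every column has an index. I also have foreign key constraints.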

Answer 1

You want EXISTS.

delete  
from string s  
where 1=1  
and (select count(*) from data where object_id = s.id) = 0

is really done correctly as

delete from string s
where not exists ( select * from data d where d.object_id = s.id)

You don't actually want a count; you only want to know whether a child row exists.
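The example above checks only object_id, while the question needs all three columns. A minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for PostgreSQL and hypothetical index names, with one NOT EXISTS per referencing column. In PostgreSQL the same statement can keep the alias, e.g. `delete from string s where not exists (select 1 from data d where d.object_id = s.id) and ...`.

```python
import sqlite3

# SQLite stand-in for PostgreSQL; schema and rows mirror the question,
# index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE string (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE data (object_id INTEGER, property_id INTEGER, value_id INTEGER);
CREATE INDEX data_object_idx ON data (object_id);
CREATE INDEX data_property_idx ON data (property_id);
CREATE INDEX data_value_idx ON data (value_id);
""")
conn.executemany("INSERT INTO string VALUES (?, ?)", [
    (1, 'car'), (2, 'color'), (3, 'red'), (4, 'make'),
    (5, 'audi'), (6, 'car2'), (7, 'toyota'),
])
conn.executemany("INSERT INTO data VALUES (?, ?, ?)",
                 [(1, 2, 3), (1, 4, 5), (6, 4, 7)])

# Delete data(6,4,7) as in the question.
conn.execute("DELETE FROM data WHERE object_id = 6 AND property_id = 4 AND value_id = 7")

# One NOT EXISTS per column: each only has to find a single matching row,
# whereas count(...) = 0 asks for every matching row to be counted.
conn.execute("""
DELETE FROM string
WHERE NOT EXISTS (SELECT 1 FROM data d WHERE d.object_id = string.id)
  AND NOT EXISTS (SELECT 1 FROM data d WHERE d.property_id = string.id)
  AND NOT EXISTS (SELECT 1 FROM data d WHERE d.value_id = string.id)
""")

remaining = [row[0] for row in conn.execute("SELECT id FROM string ORDER BY id")]
print(remaining)  # [1, 2, 3, 4, 5]
```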

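Beyond that, note that if you use foreign keys, all of this will be handled for you. Once you have this code working, that should be your next step.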

Answer 2

The problem seems to be that you are hitting the data table more than once.

Your query should be

DELETE FROM string s WHERE (SELECT count(*) FROM data WHERE object_id = s.id OR property_id = s.id OR value_id = s.id) = 0;

I haven't tried it, but if you let me know what happens, I can revise it.
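A minimal sketch of the corrected single-subquery form, again assuming SQLite via Python's sqlite3 module as a stand-in for PostgreSQL. The target table is referenced by name instead of the alias s to stay within syntax SQLite is known to accept. The sketch only checks that the query gives the expected result on the sample data; whether it beats separate per-column checks depends on how the planner handles the OR (PostgreSQL, for instance, can combine per-column indexes with a bitmap OR).

```python
import sqlite3

# SQLite stand-in for PostgreSQL; same schema and rows as the question,
# with data(6,4,7) already deleted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE string (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE data (object_id INTEGER, property_id INTEGER, value_id INTEGER);
""")
conn.executemany("INSERT INTO string VALUES (?, ?)", [
    (1, 'car'), (2, 'color'), (3, 'red'), (4, 'make'),
    (5, 'audi'), (6, 'car2'), (7, 'toyota'),
])
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [(1, 2, 3), (1, 4, 5)])

# One correlated subquery per string row that checks all three columns at once.
conn.execute("""
DELETE FROM string
WHERE (SELECT count(*) FROM data
       WHERE object_id = string.id
          OR property_id = string.id
          OR value_id = string.id) = 0
""")

remaining = [row[0] for row in conn.execute("SELECT id FROM string ORDER BY id")]
print(remaining)  # [1, 2, 3, 4, 5]
```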

How to optimize a slow delete query in PostgreSQL (deleting data that is no longer used by another table)
