Question

I have a database table in entity-attribute-value format, as shown below:

radiology table

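I want to select all rows that have the same values in the 'entity' and 'attribute' columns but different values in the 'value' column. Multiple rows with the same values in all three columns should be treated as a single row. The way I achieve this is with SELECT DISTINCT.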

Response for this query

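However, I have read that using SELECT DISTINCT is very expensive. I plan to run this query on very large tables, and I am looking for a way to optimize it, possibly without using SELECT DISTINCT.

I am using PostgreSQL 10.3.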

Answer 1

select  *
from    radiology r
join    (
        select  entity_id
        ,       attribute_name
        from    radiology
        group by
                entity_id
        ,       attribute_name
        having  count(distinct value) > 1
        ) dupe
 on     r.entity_id = dupe.entity_id
        and r.attribute_name = dupe.attribute_name

Answer 2

This should work for you:

select a.*
from radiology a
join (
    select entity, attribute, count(distinct value) cnt
    from radiology
    group by entity, attribute
    having count(distinct value) > 1
) b
on a.entity = b.entity and a.attribute = b.attribute

Answer 3

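"I want to select all rows that have the same values in the 'entity' and 'attribute' columns but different values in the 'value' column."

Your approach does not do that. I would use exists: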

select r.*
from radiology r
where exists (select 1
              from radiology r2
              where r2.entity = r.entity and r2.attribute = r.attribute and
                    r2.value <> r.value
             );
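To see what the exists query returns on a small case, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The toy rows, the column types, and the use of SQLite in place of PostgreSQL are all assumptions made for illustration. The query keeps every row of an entity/attribute pair that has conflicting values, so exact duplicate rows still appear more than once unless distinct is added.

```python
import sqlite3

# Hypothetical toy data: entity 1 has conflicting values plus one exact
# duplicate row; entity 2 only has exact duplicates.
rows = [
    (1, "finding", "normal"),
    (1, "finding", "normal"),
    (1, "finding", "abnormal"),
    (2, "finding", "normal"),
    (2, "finding", "normal"),
]

# SQLite stands in for PostgreSQL here (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table radiology (entity integer, attribute text, value text)")
conn.executemany("insert into radiology values (?, ?, ?)", rows)

# {select} is filled in below so the same body runs with and without distinct.
query_template = """
    {select}
    from radiology r
    where exists (select 1
                  from radiology r2
                  where r2.entity = r.entity and r2.attribute = r.attribute and
                        r2.value <> r.value)
    order by r.entity, r.attribute, r.value
"""

# Rows of pairs with conflicting values; exact duplicates are kept.
result = conn.execute(query_template.format(select="select r.*")).fetchall()

# Same query with distinct, collapsing rows identical in all three columns.
deduped = conn.execute(query_template.format(select="select distinct r.*")).fetchall()

print(result)
print(deduped)
```

Note that a comparison such as r2.value <> r.value is never true when either side is NULL, so rows with NULL values would not be matched by this condition.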

If you only want the entity/attribute pairs, use group by:

select entity, attribute
from radiology
group by entity, attribute
having min(value) <> max(value);

Note that you could use having count(distinct value) > 1, but count(distinct) incurs more overhead than min() and max().
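As a quick check that the two having conditions select the same pairs, the sketch below runs both against an in-memory SQLite database from Python. The toy rows and the use of SQLite in place of PostgreSQL are assumptions. It only compares results on data without NULLs and does not measure the overhead difference.

```python
import sqlite3

# Hypothetical toy data: only the (1, 'finding') pair has more than one
# distinct value.
rows = [
    (1, "finding", "normal"),
    (1, "finding", "normal"),
    (1, "finding", "abnormal"),
    (2, "finding", "normal"),
    (2, "finding", "normal"),
    (3, "modality", "CT"),
]

# SQLite stands in for PostgreSQL here (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table radiology (entity integer, attribute text, value text)")
conn.executemany("insert into radiology values (?, ?, ?)", rows)

# Pairs whose smallest and largest values differ.
min_max_query = """
    select entity, attribute
    from radiology
    group by entity, attribute
    having min(value) <> max(value)
    order by entity, attribute
"""

# Pairs with more than one distinct value.
count_distinct_query = """
    select entity, attribute
    from radiology
    group by entity, attribute
    having count(distinct value) > 1
    order by entity, attribute
"""

min_max_pairs = conn.execute(min_max_query).fetchall()
count_distinct_pairs = conn.execute(count_distinct_query).fetchall()

print(min_max_pairs)
print(count_distinct_pairs)
```

To compare the actual cost on PostgreSQL, running each query under EXPLAIN ANALYZE on a realistically sized table would be the more reliable test.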

SQL query to efficiently select non-perfect duplicates

3 answers: