Question

I have a database table with two columns:

author_id, message

with entries such as:

123, "message!"
123, "message!"
123, "different message"
124, "message!"

I want a query that lets me select either:

123, "message!"

or

124, "message!"

Basically, entries where the message is the same but the author_id is different.

Then I want to delete one of those entries. (It doesn't matter which one, as long as I can select just one of them.)

This question gets me close, but it finds duplicates across both columns.

Answer 1

Here is an alternative example:

-- Test table
CREATE TABLE dummy_data (
    author_id   int,
    message     text
);

-- Test data
INSERT INTO dummy_data ( author_id, message )
VALUES
( 123, '"message!"' ),
( 123, '"message!"' ),
( 123, '"different message"' ),
( 124, '"message!"' ),
( 124, '"message!"' ),
( 125, '"message!"' );

-- Delete query
DELETE FROM dummy_data
WHERE   ctid NOT IN (
            SELECT  max( ctid )
            FROM    dummy_data
            GROUP BY message     -- group by message only, so author_id is ignored
        )
 -- RETURNING * only shows the deleted rows for testing;
 -- drop it if you don't need them
RETURNING *;

-- Confirming result:
SELECT * FROM dummy_data ;
 author_id |       message
-----------+---------------------
       123 | "different message"
       125 | "message!"
(2 rows)

More about system columns: https://www.postgresql.org/docs/current/static/ddl-system-columns.html
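To try the same "keep one row per message" idea without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is an assumption-laden stand-in, not the answer's exact query: SQLite has no ctid, so its rowid plays the same role of identifying a physical row, and the helper name `dedupe_by_message` is hypothetical.

```python
import sqlite3

# Same test data as above.
ROWS = [
    (123, '"message!"'),
    (123, '"message!"'),
    (123, '"different message"'),
    (124, '"message!"'),
    (124, '"message!"'),
    (125, '"message!"'),
]


def dedupe_by_message(rows):
    """Hypothetical helper: keep one row per message (the highest rowid)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dummy_data (author_id INTEGER, message TEXT)")
    conn.executemany("INSERT INTO dummy_data VALUES (?, ?)", rows)
    # SQLite's rowid stands in for PostgreSQL's ctid (an assumption of this sketch):
    # delete every row that is not the max rowid of its message group.
    conn.execute(
        """
        DELETE FROM dummy_data
        WHERE rowid NOT IN (
            SELECT max(rowid) FROM dummy_data GROUP BY message
        )
        """
    )
    remaining = conn.execute(
        "SELECT author_id, message FROM dummy_data ORDER BY author_id"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in dedupe_by_message(ROWS):
        print(row)
```

Because the grouping is by message alone, the surviving "message!" row is whichever author's row was inserted last (125 here), matching the PostgreSQL result above.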

Edit:
An additional example, as requested, that limits the scope to specific IDs (author_id).

Plain query:

DELETE FROM dummy_data
USING   ( SELECT ARRAY[ 123, 124] ) v(id)
WHERE   author_id = ANY ( v.id )
AND     ctid NOT IN (
            SELECT  max( ctid )
            FROM    dummy_data
            WHERE   author_id = ANY ( v.id )
            GROUP BY message
        );

The same query with comments:

DELETE FROM dummy_data
-- Put your author_id values into this array.
-- The list is declared once in the USING clause because
-- it is needed in two places below, and writing a long
-- list twice would be tedious :)
USING   ( SELECT ARRAY[ 123, 124] ) v(id)
-- First, restrict the delete to the authors in the batch.
WHERE   author_id = ANY ( v.id )
-- Second, keep the row with the highest ctid per message,
-- computed over the same set of authors.
AND     ctid NOT IN (
            SELECT  max( ctid )
            FROM    dummy_data
            WHERE   author_id = ANY ( v.id )
            GROUP BY message
        );

-- This deletes the following rows:
 author_id |  message
-----------+------------
       123 | "message!"
       123 | "message!"
       124 | "message!"
(3 rows)

-- leaving the table in this state:
 author_id |       message
-----------+---------------------
       123 | "different message"
       124 | "message!"
       125 | "message!"
(3 rows)
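The scoped version can be sketched the same way in SQLite through Python's sqlite3 module. Two substitutions are assumptions of this sketch, not part of the answer: SQLite has no array type, so the `USING (SELECT ARRAY[...])` trick becomes a parameterized `IN (...)` list bound twice, and rowid again stands in for ctid. The helper name `dedupe_for_authors` is hypothetical.

```python
import sqlite3

ROWS = [
    (123, '"message!"'),
    (123, '"message!"'),
    (123, '"different message"'),
    (124, '"message!"'),
    (124, '"message!"'),
    (125, '"message!"'),
]


def dedupe_for_authors(rows, author_ids):
    """Hypothetical helper: dedupe by message, but only among author_ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dummy_data (author_id INTEGER, message TEXT)")
    conn.executemany("INSERT INTO dummy_data VALUES (?, ?)", rows)
    # One "?" per author id; this IN list replaces the PostgreSQL array.
    placeholders = ", ".join("?" for _ in author_ids)
    sql = f"""
        DELETE FROM dummy_data
        WHERE author_id IN ({placeholders})
          AND rowid NOT IN (
              SELECT max(rowid) FROM dummy_data
              WHERE author_id IN ({placeholders})
              GROUP BY message
          )
    """
    # The list appears twice in the statement, so bind the ids twice.
    conn.execute(sql, list(author_ids) * 2)
    remaining = conn.execute(
        "SELECT author_id, message FROM dummy_data ORDER BY author_id, message"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in dedupe_for_authors(ROWS, [123, 124]):
        print(row)
```

Author 125 is outside the list, so its row survives untouched, as in the PostgreSQL output above.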

Answer 2

You can use array_agg(), for example:

select author_id, message
from (
    select message, array_agg(distinct author_id) ids
    from my_table
    group by message
    ) s
cross join unnest(ids) author_id
where cardinality(ids) > 1
order by author_id;

 author_id | message  
-----------+----------
       123 | message!
       124 | message!
(2 rows)
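If arrays are not available, the same rows can be produced with a plain `IN` subquery. The sketch below swaps in that portable form (it is not the answer's array_agg() query) and runs it on SQLite through Python's sqlite3 module; the helper name `authors_sharing_messages` is hypothetical.

```python
import sqlite3

# The question's data, without the quotes, as in this answer's output.
ROWS = [
    (123, "message!"),
    (123, "message!"),
    (123, "different message"),
    (124, "message!"),
]

# Portable alternative: messages posted by more than one distinct author,
# joined back to list each (author_id, message) pair once.
QUERY = """
    SELECT DISTINCT author_id, message
    FROM my_table
    WHERE message IN (
        SELECT message FROM my_table
        GROUP BY message
        HAVING count(DISTINCT author_id) > 1
    )
    ORDER BY author_id
"""


def authors_sharing_messages(rows):
    """Hypothetical helper: run QUERY against an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (author_id INTEGER, message TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in authors_sharing_messages(ROWS):
        print(row)
```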

If you want a single row per duplicated message, the query can be simpler:

select min(author_id) as author_id, message
from my_table
group by message
having count(distinct author_id) > 1;

 author_id | message  
-----------+----------
       123 | message!
(1 row)
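The question also asks to delete one of the matching entries afterwards. A minimal sketch, assuming that removing a single physical row for the author picked by min(author_id) is what is wanted: it runs this answer's query on SQLite via Python's sqlite3 module, then deletes the lowest-rowid row for each picked pair (rowid standing in for PostgreSQL's ctid). The helper name `delete_one_shared_entry` is hypothetical.

```python
import sqlite3

ROWS = [
    (123, "message!"),
    (123, "message!"),
    (123, "different message"),
    (124, "message!"),
]

# This answer's query: one row per message shared by several authors.
PICK_QUERY = """
    SELECT min(author_id) AS author_id, message
    FROM my_table
    GROUP BY message
    HAVING count(DISTINCT author_id) > 1
"""


def delete_one_shared_entry(rows):
    """Hypothetical helper. Returns (picked_rows, remaining_rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (author_id INTEGER, message TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    picked = conn.execute(PICK_QUERY).fetchall()
    for author_id, message in picked:
        # Delete exactly one physical row (the lowest rowid) for this pair,
        # even if the author posted the same message several times.
        conn.execute(
            """
            DELETE FROM my_table
            WHERE rowid = (
                SELECT min(rowid) FROM my_table
                WHERE author_id = ? AND message = ?
            )
            """,
            (author_id, message),
        )
    remaining = conn.execute(
        "SELECT author_id, message FROM my_table ORDER BY rowid"
    ).fetchall()
    conn.close()
    return picked, remaining


if __name__ == "__main__":
    picked, remaining = delete_one_shared_entry(ROWS)
    print("picked:", picked)
    print("remaining:", remaining)
```

Deleting by a single row identifier, rather than by `author_id` and `message`, is what keeps the delete from removing every copy of an exact duplicate.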

Answer 3

If I understand correctly, you need something like this:

with the_table (author_id, message) as (
    select 123, '"message!"' union all
    select 123, '"message!"' union all
    select 123, '"aaa!"' union all
    select 123, '"different message"' union all
    select 124, '"aaa!"' union all
    select 124, '"message!"'  union all
    select 125, '"aaa!"' union all
    select 125, '"rrrr!"'  
)


select the_table.* from  the_table 
join ( 
    select message from the_table
    group by message
    having count(distinct author_id) = (select count(distinct author_id) from the_table)
) t
on the_table.message = t.message
order by random() limit 1;

This returns one random row whose message was posted by every author_id in the table.
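To check this answer's logic, here is a sketch that runs essentially the same join on SQLite through Python's sqlite3 module, using a regular table instead of the CTE (an assumption of this sketch). SQLite also provides `random()`, so `ORDER BY random() LIMIT 1` carries over unchanged. The helper name `pick_common_message_row` is hypothetical.

```python
import sqlite3

# Same rows as the CTE in this answer.
ROWS = [
    (123, '"message!"'),
    (123, '"message!"'),
    (123, '"aaa!"'),
    (123, '"different message"'),
    (124, '"aaa!"'),
    (124, '"message!"'),
    (125, '"aaa!"'),
    (125, '"rrrr!"'),
]

# Keep messages whose distinct-author count equals the total number of
# distinct authors, join back, and pick one matching row at random.
QUERY = """
    SELECT t.author_id, t.message
    FROM the_table AS t
    JOIN (
        SELECT message FROM the_table
        GROUP BY message
        HAVING count(DISTINCT author_id) =
               (SELECT count(DISTINCT author_id) FROM the_table)
    ) AS common ON t.message = common.message
    ORDER BY random()
    LIMIT 1
"""


def pick_common_message_row(rows):
    """Hypothetical helper: returns one (author_id, message) tuple or None."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE the_table (author_id INTEGER, message TEXT)")
    conn.executemany("INSERT INTO the_table VALUES (?, ?)", rows)
    row = conn.execute(QUERY).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(pick_common_message_row(ROWS))
```

Only "aaa!" is posted by all three authors here, so the result is always one of its three rows; which one varies between runs.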

Postgres: select duplicates in one column but not in the other
