Error when removing duplicate rows: duplicate key value

Time: 2015-01-23 14:02:37

Tags: sql postgresql duplicate-removal

I want to remove duplicate rows from the table measurement in a PostgreSQL 9.1 database.

Some information about the table:

select column_name, data_type from information_schema.columns where table_name = 'measurement';

 column_name | data_type 
-------------+-----------
 s_sum       | real
 s_l3        | real
 s_l2        | real
 s_l1        | real
 q_sum       | real
 q_l3        | real
 q_l2        | real
 q_l1        | real
 p_sum       | real
 p_l3        | real
 p_l2        | real
 p_l1        | real
 irms_n      | real
 irms_l3     | real
 irms_l2     | real
 irms_l1     | real
 urms_l3     | real
 urms_l2     | real
 urms_l1     | real
 timestamp   | integer
 site        | integer
 id          | integer
(22 rows)

select count(*) from measurement;

  count   
----------
 56265678
(1 row)

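So I want to delete the duplicate rows, meaning rows in which every column except id is equal. I went ahead and tried the approach from this answer.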

SET temp_buffers = '1GB';

BEGIN;

CREATE TEMPORARY TABLE t_tmp AS
SELECT DISTINCT site,
            timestamp,
            urms_l1,
            urms_l2,
            urms_l3,
            irms_l1,
            irms_l2,
            irms_l3,
            irms_n,
            p_l1,
            p_l2,
            p_l3,
            p_sum,
            q_l1,
            q_l2,
            q_l3,
            q_sum,
            s_l1,
            s_l2,
            s_l3,
            s_sum
FROM measurement;

TRUNCATE measurement;

INSERT INTO measurement 
SELECT * FROM t_tmp;

COMMIT;

The output/error is:

SET
BEGIN
SELECT 56103537
TRUNCATE TABLE
ERROR:  duplicate key value violates unique constraint "measurement_pkey"
DETAIL:  Key (id)=(1) already exists.
ROLLBACK

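So it looks like the duplicates are removed (compare the SELECT count with the row count of the original measurement table above), but then the primary key constraint is violated. I don't really know what is happening here; I assume the INSERT is not running against the truncated table...

Update

The requested SQL schema is as follows: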

--
-- PostgreSQL database dump
--

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;

--
-- Name: plpgsql; Type: EXTENSION; Schema: -; Owner: -
--

CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;


--
-- Name: EXTENSION plpgsql; Type: COMMENT; Schema: -; Owner: -
--

COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';


SET search_path = public, pg_catalog;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: measurement; Type: TABLE; Schema: public; Owner: -; Tablespace: 
--

CREATE TABLE measurement (
    id integer NOT NULL,
    site integer,
    "timestamp" integer,
    urms_l1 real,
    urms_l2 real,
    urms_l3 real,
    irms_l1 real,
    irms_l2 real,
    irms_l3 real,
    irms_n real,
    p_l1 real,
    p_l2 real,
    p_l3 real,
    p_sum real,
    q_l1 real,
    q_l2 real,
    q_l3 real,
    q_sum real,
    s_l1 real,
    s_l2 real,
    s_l3 real,
    s_sum real
);


--
-- Name: measurement_pkey; Type: CONSTRAINT; Schema: public; Owner: -; Tablespace: 
--

ALTER TABLE ONLY measurement
    ADD CONSTRAINT measurement_pkey PRIMARY KEY (id);


--
-- Name: public; Type: ACL; Schema: -; Owner: -
--

REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON SCHEMA public FROM postgres;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO PUBLIC;


--
-- PostgreSQL database dump complete
--

Then

SELECT id
FROM measurement
GROUP BY id
HAVING COUNT(*) > 1;

yields

 id 
----
(0 rows)

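1 answer:

Answer 0 (score: 1)

The primary key is a unique constraint on a subset of the fields of the measurement table, whereas SELECT DISTINCT returns the unique records over the listed fields: it compares every listed field of each record, not just the primary key.

That is, you have records that share the same primary key (apparently id) but have different values in the non-key fields.

You can find the keys with duplicate IDs by running: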

SELECT id
FROM t_tmp
GROUP BY id
HAVING COUNT(*) > 1;

You can show the records associated with them with:

SELECT *
FROM t_tmp
WHERE id IN (
    SELECT id
    FROM t_tmp
    GROUP BY id
    HAVING COUNT(*) > 1
);

[Note that I specified t_tmp above, but if you have not actually run TRUNCATE TABLE measurement; yet, you can use measurement instead.]
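One thing worth checking against the schema posted in the update: t_tmp as built in the question has no id column, so the two queries above can only run against measurement. It also means that INSERT INTO measurement SELECT * FROM t_tmp supplies 21 values for 22 columns. When an INSERT has no column list, PostgreSQL fills the table's columns from left to right, so site would end up in id, which would be consistent with the reported Key (id)=(1). The following is a minimal sketch of a variant that carries an id along and names the target columns explicitly. Keeping min(id) as the surviving id of each duplicate group is an assumption, not something the question specifies.

```sql
BEGIN;

-- One row per distinct combination of the non-id columns;
-- min(id) as the surviving id is an assumed choice.
CREATE TEMPORARY TABLE t_tmp AS
SELECT min(id) AS id,
       site, "timestamp",
       urms_l1, urms_l2, urms_l3,
       irms_l1, irms_l2, irms_l3, irms_n,
       p_l1, p_l2, p_l3, p_sum,
       q_l1, q_l2, q_l3, q_sum,
       s_l1, s_l2, s_l3, s_sum
FROM measurement
GROUP BY site, "timestamp",
         urms_l1, urms_l2, urms_l3,
         irms_l1, irms_l2, irms_l3, irms_n,
         p_l1, p_l2, p_l3, p_sum,
         q_l1, q_l2, q_l3, q_sum,
         s_l1, s_l2, s_l3, s_sum;

TRUNCATE measurement;

-- Explicit column lists on both sides, so the mapping no longer
-- depends on the column order of either table.
INSERT INTO measurement (id, site, "timestamp",
                         urms_l1, urms_l2, urms_l3,
                         irms_l1, irms_l2, irms_l3, irms_n,
                         p_l1, p_l2, p_l3, p_sum,
                         q_l1, q_l2, q_l3, q_sum,
                         s_l1, s_l2, s_l3, s_sum)
SELECT id, site, "timestamp",
       urms_l1, urms_l2, urms_l3,
       irms_l1, irms_l2, irms_l3, irms_n,
       p_l1, p_l2, p_l3, p_sum,
       q_l1, q_l2, q_l3, q_sum,
       s_l1, s_l2, s_l3, s_sum
FROM t_tmp;

COMMIT;
```

Like SELECT DISTINCT, GROUP BY treats NULLs in the grouped columns as equal, so the number of rows in t_tmp should match what the original statement produced.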

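These are the records with duplicate IDs that cause your key violation, assuming the key is on id only, as the error message suggests. You need to decide which ones to keep and which to delete, or consider updating the id field to a new unique value.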
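If the decision is to keep the row with the lowest id among rows that agree on every other column (the goal stated in the question), the duplicates can also be deleted in place without truncating the table. This is only a sketch under that assumption; the aliases m and d and the column name rn are illustrative, and it is worth running it inside a transaction and checking the reported row count before committing.

```sql
BEGIN;

-- Number the rows within each group of identical non-id columns,
-- lowest id first, then delete everything except the first row.
DELETE FROM measurement AS m
USING (
    SELECT id,
           row_number() OVER (
               PARTITION BY site, "timestamp",
                            urms_l1, urms_l2, urms_l3,
                            irms_l1, irms_l2, irms_l3, irms_n,
                            p_l1, p_l2, p_l3, p_sum,
                            q_l1, q_l2, q_l3, q_sum,
                            s_l1, s_l2, s_l3, s_sum
               ORDER BY id
           ) AS rn
    FROM measurement
) AS d
WHERE m.id = d.id
  AND d.rn > 1;

-- Inspect the "DELETE n" count, then COMMIT or ROLLBACK.
COMMIT;
```

Because only a small fraction of the roughly 56 million rows are duplicates, deleting them in place may touch far fewer rows than rewriting the whole table.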

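It is not clear whether id is tied to a sequence, or whether it was created in the new table as SERIAL or BIGSERIAL. You should generate a CREATE TABLE script from pgAdmin to give us the full schema. It is also unclear whether there are other unique constraints on the table.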
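Both open points can also be checked from psql with catalog queries. A minimal sketch, assuming the table lives in a schema on the current search_path:

```sql
-- Returns the name of the sequence owned by measurement.id,
-- or NULL if the column is not backed by one.
SELECT pg_get_serial_sequence('measurement', 'id');

-- Shows the column default, if any (a SERIAL column has a nextval(...) default).
SELECT column_name, column_default
FROM information_schema.columns
WHERE table_name = 'measurement'
  AND column_name = 'id';

-- Lists every constraint on the table; contype is 'p' for primary key,
-- 'u' for unique, 'c' for check, 'f' for foreign key.
SELECT conname, contype, pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = 'measurement'::regclass;
```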