I want to delete duplicate rows from the table measurement in a PostgreSQL 9.1 database.
Some table information:
select column_name, data_type from information_schema.columns where table_name = 'measurement';
column_name | data_type
-------------+-----------
s_sum | real
s_l3 | real
s_l2 | real
s_l1 | real
q_sum | real
q_l3 | real
q_l2 | real
q_l1 | real
p_sum | real
p_l3 | real
p_l2 | real
p_l1 | real
irms_n | real
irms_l3 | real
irms_l2 | real
irms_l1 | real
urms_l3 | real
urms_l2 | real
urms_l1 | real
timestamp | integer
site | integer
id | integer
(22 rows)
and
select count(*) from measurement;
count
----------
56265678
(1 row)
So I want to delete the duplicate rows in which all columns except id are equal. I went ahead and tried the approach from this answer.
SET temp_buffers = '1GB';
BEGIN;
CREATE TEMPORARY TABLE t_tmp AS
SELECT DISTINCT site,
timestamp,
urms_l1,
urms_l2,
urms_l3,
irms_l1,
irms_l2,
irms_l3,
irms_n,
p_l1,
p_l2,
p_l3,
p_sum,
q_l1,
q_l2,
q_l3,
q_sum,
s_l1,
s_l2,
s_l3,
s_sum
FROM measurement;
TRUNCATE measurement;
INSERT INTO measurement
SELECT * FROM t_tmp;
COMMIT;
The output/error is:
SET
BEGIN
SELECT 56103537
TRUNCATE TABLE
ERROR: duplicate key value violates unique constraint "measurement_pkey"
DETAIL: Key (id)=(1) already exists.
ROLLBACK
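So it looks as if the duplicates are removed (compare SELECT 56103537 with the row count of the original measurement table above), but then the primary key constraint is violated. I don't really know what is happening here; I assume the INSERT is not operating on a truncated table...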
Update
The requested SQL schema is as follows:
--
-- PostgreSQL database dump
--
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
--
-- Name: plpgsql; Type: EXTENSION; Schema: -; Owner: -
--
CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;
--
-- Name: EXTENSION plpgsql; Type: COMMENT; Schema: -; Owner: -
--
COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
--
-- Name: measurement; Type: TABLE; Schema: public; Owner: -; Tablespace:
--
CREATE TABLE measurement (
id integer NOT NULL,
site integer,
"timestamp" integer,
urms_l1 real,
urms_l2 real,
urms_l3 real,
irms_l1 real,
irms_l2 real,
irms_l3 real,
irms_n real,
p_l1 real,
p_l2 real,
p_l3 real,
p_sum real,
q_l1 real,
q_l2 real,
q_l3 real,
q_sum real,
s_l1 real,
s_l2 real,
s_l3 real,
s_sum real
);
--
-- Name: measurement_pkey; Type: CONSTRAINT; Schema: public; Owner: -; Tablespace:
--
ALTER TABLE ONLY measurement
ADD CONSTRAINT measurement_pkey PRIMARY KEY (id);
--
-- Name: public; Type: ACL; Schema: -; Owner: -
--
REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON SCHEMA public FROM postgres;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO PUBLIC;
--
-- PostgreSQL database dump complete
--
and
SELECT id
FROM measurement
GROUP BY id
HAVING COUNT(*) > 1;
yields
id
----
(0 rows)
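Answer 0 (score: 1)
The primary key is a unique constraint on a subset of the fields in the measurement table, whereas SELECT DISTINCT returns rows that are unique across the listed fields: it compares every field it is given, not just the primary key.
In other words, you have records that share the same primary key (apparently id) but differ in their non-key fields.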
You can find the keys with duplicate ids by running:
SELECT id
FROM t_tmp
GROUP BY id
HAVING COUNT(*) > 1;
You can display the records involved with:
SELECT *
FROM t_tmp
WHERE id IN (
SELECT id
FROM t_tmp
GROUP BY id
HAVING COUNT(*) > 1
);
[Note that I have specified t_tmp above, but if you have not actually run TRUNCATE TABLE measurement; yet, you can use measurement instead.]
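These are the records with duplicate ids that cause your key violation, assuming the key is on id alone, which is what the error message suggests. You need to decide which to keep and which to delete, or consider updating the id field to new unique values.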
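If the rule you settle on is "keep the row with the lowest id in each group of otherwise identical rows", a minimal sketch might look like the following. It assumes the lowest id is the one worth keeping and that no other table references the ids being removed; neither is stated in the question. Window functions such as row_number() are available in PostgreSQL 9.1.

```sql
-- Sketch: delete every row that repeats an earlier row (lower id)
-- in all columns except id. Keeping the lowest id is an assumption.
BEGIN;

DELETE FROM measurement m
USING (
    SELECT id,
           row_number() OVER (
               PARTITION BY site, "timestamp",
                            urms_l1, urms_l2, urms_l3,
                            irms_l1, irms_l2, irms_l3, irms_n,
                            p_l1, p_l2, p_l3, p_sum,
                            q_l1, q_l2, q_l3, q_sum,
                            s_l1, s_l2, s_l3, s_sum
               ORDER BY id
           ) AS rn              -- 1 for the keeper, 2.. for its duplicates
    FROM measurement
) d
WHERE m.id = d.id
  AND d.rn > 1;

COMMIT;  -- or ROLLBACK; if the reported DELETE count looks wrong
```

As a sanity check, the number of deleted rows should be in the neighbourhood of the difference between the two counts shown in the question (56265678 - 56103537 = 162141), provided the table has not changed in the meantime.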
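It is not clear whether id is tied to a sequence, or whether it was created as SERIAL or BIGSERIAL in the new table. You should generate a CREATE TABLE script from pgAdmin to give us the full schema. It is also unclear whether the table has any other unique constraints.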
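One more thing worth checking, based only on the statements shown in the question: t_tmp is built from a column list that omits id, and INSERT INTO measurement SELECT * FROM t_tmp fills the target columns by position. Under that reading, site values would be written into id, which would be consistent with the error on Key (id)=(1). The sketch below carries one id per group through the temporary table and names the columns explicitly on insert. Keeping min(id) is an assumption about which row to retain, not something the question specifies.

```sql
-- Sketch: same TRUNCATE-and-reload approach as in the question,
-- but t_tmp keeps one id per group and the INSERT names its columns.
SET temp_buffers = '1GB';

BEGIN;

CREATE TEMPORARY TABLE t_tmp AS
SELECT min(id) AS id,           -- assumption: keep the lowest id per group
       site, "timestamp",
       urms_l1, urms_l2, urms_l3,
       irms_l1, irms_l2, irms_l3, irms_n,
       p_l1, p_l2, p_l3, p_sum,
       q_l1, q_l2, q_l3, q_sum,
       s_l1, s_l2, s_l3, s_sum
FROM measurement
GROUP BY site, "timestamp",
         urms_l1, urms_l2, urms_l3,
         irms_l1, irms_l2, irms_l3, irms_n,
         p_l1, p_l2, p_l3, p_sum,
         q_l1, q_l2, q_l3, q_sum,
         s_l1, s_l2, s_l3, s_sum;

TRUNCATE measurement;

-- Explicit column lists avoid relying on positional matching.
INSERT INTO measurement (id, site, "timestamp",
                         urms_l1, urms_l2, urms_l3,
                         irms_l1, irms_l2, irms_l3, irms_n,
                         p_l1, p_l2, p_l3, p_sum,
                         q_l1, q_l2, q_l3, q_sum,
                         s_l1, s_l2, s_l3, s_sum)
SELECT id, site, "timestamp",
       urms_l1, urms_l2, urms_l3,
       irms_l1, irms_l2, irms_l3, irms_n,
       p_l1, p_l2, p_l3, p_sum,
       q_l1, q_l2, q_l3, q_sum,
       s_l1, s_l2, s_l3, s_sum
FROM t_tmp;

COMMIT;
```

Because min(id) comes from an existing unique column and each id belongs to exactly one group, the reloaded ids should stay unique; it is still worth running this inside the transaction and checking the row counts before committing.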