Question

I have a table company with 60 columns. The goal is to build a tool to find, compare, and eliminate duplicates in this table.

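Example: I have found two companies that might be the same, but I need to know which values (columns) differ between these two rows before I can proceed.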

I could compare the two rows column by column, 60 times over, but I am looking for a simpler, more generic solution.

Something like:

SELECT * FROM company where co_id=22
SHOW DIFFERENCE
SELECT * FROM company where co_id=33

The result should be the names of the columns that differ.

Answer 1

You can use the hstore extension. It is often handy when you need to iterate over columns.

The trick is to convert the contents of each row into an hstore value of column_name=>value pairs, and then use hstore functions to compute the difference.

Demo:

CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE table1 (id int primary key, t1 text, t2 text, t3 text);

Let's insert two rows that differ in their primary key and in one other column (t3).

INSERT INTO table1 VALUES
  (1, 'foo', 'bar', 'baz'),
  (2, 'foo', 'bar', 'biz');

Query:

SELECT skeys(h1 - h2)
FROM (SELECT hstore(t.*) AS h1 FROM table1 t WHERE id = 1) r1
CROSS JOIN
     (SELECT hstore(t.*) AS h2 FROM table1 t WHERE id = 2) r2;

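h1-h2 removes from h1 every key=>value pair that also appears unchanged in h2, so only the keys whose values differ remain; skeys() then returns those keys as a set.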

Result:

 skeys 
-------
 id
 t3
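To see what the - operator does on its own, here it is applied to two hstore literals instead of table rows (a minimal sketch, assuming the hstore extension can be installed in the current database). The pair a=>1 is identical on both sides and disappears; b has different values, so the left-hand b=>2 remains.

```sql
-- Assumes hstore can be installed in the current database.
CREATE EXTENSION IF NOT EXISTS hstore;

-- a=>1 matches on both sides and is removed; b differs, so b=>2 (from the left) remains.
SELECT 'a=>1, b=>2'::hstore - 'a=>1, b=>3'::hstore AS diff;

-- skeys() returns the remaining keys as a set of rows: here only b.
SELECT skeys('a=>1, b=>2'::hstore - 'a=>1, b=>3'::hstore) AS key;
```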

The select list can be refined to skeys((h1-h2)-'id') so that id is always removed: as the primary key, it obviously always differs between the rows.
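A sketch of that refinement which also pulls each differing column's value out of both rows with hstore's -> operator, so the values can be compared side by side. It assumes the hstore extension is available and re-creates the demo rows in a temporary table1 so it runs on its own; the output column names are illustrative only.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- Demo rows as above, in a temporary table so this sketch is self-contained.
CREATE TEMP TABLE table1 (id int PRIMARY KEY, t1 text, t2 text, t3 text);
INSERT INTO table1 VALUES (1, 'foo', 'bar', 'baz'), (2, 'foo', 'bar', 'biz');

-- One output row per differing column (id excluded), with the value from each row.
SELECT d.k           AS column_name,
       r1.h1 -> d.k  AS value_in_row_1,
       r2.h2 -> d.k  AS value_in_row_2
FROM (SELECT hstore(t.*) AS h1 FROM table1 t WHERE id = 1) r1
CROSS JOIN (SELECT hstore(t.*) AS h2 FROM table1 t WHERE id = 2) r2
CROSS JOIN LATERAL skeys((r1.h1 - r2.h2) - 'id'::text) AS d(k);
```

For the two demo rows this returns a single row: t3, baz, biz. Substituting company, co_id and the ids 22 and 33 for table1, id, 1 and 2 applies the same query to the question's table.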

Answer 2

Here is a stored procedure that should help you with the task.

While this should work, it has no error checking, which you should add.

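It fetches all the columns of the table and loops over them. A column counts as different when the number of distinct values it holds across the two rows is greater than one. The output is:

- the number of differences
- a notice for each column that differs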
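The distinct-count test can be tried on literal values. Note that count(DISTINCT ...) skips NULLs, so a NULL in one row and a value in the other also yields 1 and would not be reported; the last query shows one possible workaround (not taken from the answer) that counts NULL as a value of its own.

```sql
-- Two different non-NULL values: 2 distinct values, so the column differs.
SELECT count(DISTINCT v) AS n FROM (VALUES ('foo'), ('bar')) AS s(v);   -- 2

-- Two equal values: 1 distinct value, so the column is the same.
SELECT count(DISTINCT v) AS n FROM (VALUES ('foo'), ('foo')) AS s(v);   -- 1

-- A value and a NULL: count(DISTINCT ...) skips the NULL, so this also gives 1.
SELECT count(DISTINCT v) AS n FROM (VALUES ('foo'), (NULL)) AS s(v);    -- 1

-- Workaround: add 1 when any NULL is present, so NULL counts as its own value.
SELECT count(DISTINCT v) + (count(*) > count(v))::int AS n
FROM (VALUES ('foo'), (NULL)) AS s(v);                                  -- 2
```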

Returning a rowset of the columns that differ would probably be more useful. Good luck!

Usage:

SELECT showdifference('public','company','co_id',22,33)


CREATE OR REPLACE FUNCTION showdifference(p_schema text, p_tablename text, p_idcolumn text, p_firstid integer, p_secondid integer)
  RETURNS INTEGER AS
$BODY$
DECLARE
    l_diffcount INTEGER := 0;  -- number of columns that differ
    l_column    text;
    l_dupcount  integer;       -- distinct values of the column across the two rows
BEGIN
    -- Need error checking here: ensure the schema, table and columns exist,
    -- that both record ids exist, and that the id column is an integer.

    -- Iterate over the table's columns (except the id column) in declaration order.
    FOR l_column IN
        SELECT column_name
        FROM information_schema.columns
        WHERE table_schema = p_schema
          AND table_name = p_tablename
          AND column_name <> p_idcolumn
        ORDER BY ordinal_position
    LOOP
        -- Count the distinct values of this column in the two rows.
        -- count(DISTINCT ...) ignores NULLs, so add 1 when a NULL is present;
        -- otherwise a NULL in one row and a value in the other would look equal.
        -- format() quotes the identifiers; the ids are passed as parameters.
        EXECUTE format(
            'SELECT count(DISTINCT %1$I) + (count(*) > count(%1$I))::int
               FROM %2$I.%3$I
              WHERE %4$I IN ($1, $2)',
            l_column, p_schema, p_tablename, p_idcolumn)
        INTO l_dupcount
        USING p_firstid, p_secondid;

        IF l_dupcount > 1 THEN
            l_diffcount := l_diffcount + 1;
            -- For "real" use you might want to return a rowset instead.
            RAISE NOTICE '% has % distinct values', l_column, l_dupcount;
        END IF;
    END LOOP;

    RETURN l_diffcount;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE STRICT
  COST 100;
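Following the answer's own suggestion, a set-returning variant could return one row per differing column together with both values. This is a sketch, not part of the original answer: the function name showdifference_rows and its output columns are hypothetical, every value is cast to text for display, and it has the same lack of error checking as the function above.

```sql
-- Hypothetical set-returning variant; the name showdifference_rows and the
-- output column names are illustrative, not from the original answer.
CREATE OR REPLACE FUNCTION showdifference_rows(p_schema text, p_tablename text,
                                               p_idcolumn text,
                                               p_firstid integer, p_secondid integer)
  RETURNS TABLE (diff_column text, first_value text, second_value text) AS
$BODY$
DECLARE
    l_column text;
BEGIN
    -- No error checking: schema, table, columns and ids are assumed to exist.
    FOR l_column IN
        SELECT c.column_name
        FROM information_schema.columns c
        WHERE c.table_schema = p_schema
          AND c.table_name = p_tablename
          AND c.column_name <> p_idcolumn
        ORDER BY c.ordinal_position
    LOOP
        -- Emit (column name, value in first row, value in second row) as text,
        -- only when the two values differ; IS DISTINCT FROM treats
        -- NULL vs. non-NULL as a difference and NULL vs. NULL as equal.
        RETURN QUERY EXECUTE format(
            'SELECT %1$L::text, a.%1$I::text, b.%1$I::text
               FROM %2$I.%3$I a, %2$I.%3$I b
              WHERE a.%4$I = $1 AND b.%4$I = $2
                AND a.%1$I IS DISTINCT FROM b.%1$I',
            l_column, p_schema, p_tablename, p_idcolumn)
        USING p_firstid, p_secondid;
    END LOOP;
END;
$BODY$
  LANGUAGE plpgsql STABLE STRICT;

-- Usage with the question's table and ids:
-- SELECT * FROM showdifference_rows('public', 'company', 'co_id', 22, 33);
```

Compared with the NOTICE-based version, the caller gets an ordinary result set it can filter, join or display, at the cost of casting every value to text.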

Get the columns that differ between 2 rows
