Consolidate, merge, combine rows

Time: 2016-01-16 18:41:28

Tags: sql postgresql join duplicates full-outer-join

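Every search I've done leads me to results from people looking for array_agg to combine multiple columns of a row into one column. That's not what I'm trying to figure out here, and maybe I'm not using the right search terms (e.g., consolidate, merge, combine).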

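I'm trying to combine rows by filling in values from populated fields... I'm not sure how best to describe it other than with an example: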

Current:
--------------------------------
 id  num_1  num_2  num_3  num_4 
--------------------------------
 1    111    222     0      0   
 2    111    333     0      0   
 3    111     0      0     444  
 4     0     222    555     0   
 5    777    999     0      0   
 6     0     999    888     0   

After Processing:
--------------------------------
 id  num_1  num_2  num_3  num_4 
--------------------------------
 1    111    222    555    444  
 2    111    333    555    444  
 3    111    333    555    444  
 4    111    222    555    444  
 5    777    999    888     0   
 6    777    999    888     0   


After Deleting Duplicate Rows:
--------------------------------
 id  num_1  num_2  num_3  num_4 
--------------------------------
 1    111    222    555    444  
 2    111    333    555    444  
 3    777    999    888     0   

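This might be a two-step process... first fill in the blanks, then find and delete the duplicates. I can do the second step, but I can't figure out how to first fill the 0 values with values from another row, where you may have two different values in one column (ids 1/2 in the num_2 column) but only one value in another (e.g., 111 in the num_1 column).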

I could do this in PHP, but I'd like to know how to do it using only Postgres.

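EDIT: My example table is a relational table. I have multiple datasets with similar information (e.g., username) but different registration ID numbers. So I do an inner join on table 1 and table 2 (for example) where the username is the same. Then I take the (different) registration IDs from each table and insert them as one row into my relational table. In the example table above, row 1 holds two different registration IDs from the two tables I joined: the values 111 (num_1) and 222 (num_2) were inserted, with zeros added for num_3 and num_4. Then I compare table 1 and table 4, insert the values 111 (num_1) and 444 (num_4) into the relational table, and add zeros for num_2 and num_3. Since registration ID 111 is related to registration ID 222, and registration ID 111 is related to registration ID 444, registration IDs 111, 222 and 444 are all related (meaning the username is the same for each of those registration IDs). Does that help clarify?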
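The transitive "related" relation described in this edit (111-222 and 111-444 imply that 111, 222 and 444 are all related) is a connected-components problem. As an illustration only (neither answer below takes this route), here is a minimal union-find sketch in Python that groups registration IDs from pairwise links. The pairs are the non-zero pairs from the "Current" table; the function name `related_groups` is hypothetical.

```python
from collections import defaultdict


def related_groups(pairs):
    """Group registration IDs linked directly or transitively (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    # Each group ascending, groups ordered by their smallest ID.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Non-zero (registration ID, registration ID) pairs from the "Current" table, ids 1-6.
PAIRS = [(111, 222), (111, 333), (111, 444), (222, 555), (777, 999), (999, 888)]

print(related_groups(PAIRS))  # [[111, 222, 333, 444, 555], [777, 888, 999]]
```

On this data each group corresponds to one username (bob and mary). Turning the groups back into the desired rows still requires knowing which source table each ID came from.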

EDIT 2: I corrected tables 2 and 3. Hopefully it makes sense now. The username column is not unique. So I have 4 tables like this:

Table 1:

bob  - 111
mary - 777

Table 2:

bob  - 222
bob  - 333
mary - 999

Table 3:

bob  - 555
mary - 888

Table 4:

bob  - 444

So in my relational table I should end up with 3 rows, as shown in the third example table above.

2 Answers:

Answer 0 (score: 0)

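If your values are always increasing (as in the example), then just use a cumulative maximum and then select the distinct rows (the queries below call the table t and the columns num1 through num4):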

select row_number() over (order by min(id)) as id,
       t.num1, t.num2, t.num3, t.num4
from (select id,
             max(num1) over (order by id) as num1,
             max(num2) over (order by id) as num2,
             max(num3) over (order by id) as num3,
             max(num4) over (order by id) as num4
      from t
     ) t
group by t.num1, t.num2, t.num3, t.num4;
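To check what the cumulative-max query returns on a particular dataset, here is a minimal pure-Python sketch of the same two steps, run on the question's "Current" sample. It assumes rows are processed in id order and that all values are non-negative (so a running maximum that starts at 0 behaves like SQL's max()); the function names are hypothetical.

```python
# The question's "Current" sample table: (id, num_1, num_2, num_3, num_4)
ROWS = [
    (1, 111, 222, 0, 0),
    (2, 111, 333, 0, 0),
    (3, 111, 0, 0, 444),
    (4, 0, 222, 555, 0),
    (5, 777, 999, 0, 0),
    (6, 0, 999, 888, 0),
]


def cumulative_max(rows):
    """Mimic max(numK) over (order by id) for each of the four columns."""
    result = []
    running = [0, 0, 0, 0]  # assumes all values are >= 0
    for row_id, *nums in sorted(rows):
        running = [max(r, n) for r, n in zip(running, nums)]
        result.append((row_id, *running))
    return result


def group_and_renumber(rows):
    """Mimic group by num1..num4 plus row_number() over (order by min(id))."""
    first_id = {}
    for row_id, *nums in rows:
        key = tuple(nums)
        first_id[key] = min(first_id.get(key, row_id), row_id)
    ordered = sorted(first_id.items(), key=lambda kv: kv[1])
    return [(n, *key) for n, (key, _) in enumerate(ordered, start=1)]


for row in group_and_renumber(cumulative_max(ROWS)):
    print(row)
```

On this sample num_2 goes 222, 333, 0, 222, so the running maximum stays at 333 from id 2 onward and no two rows end up identical; all six rows survive the grouping.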

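If max() doesn't work, then what you really want is lag( . . . ignore nulls), which Postgres does not support (yet). Perhaps the simplest approach is a correlated subquery for each column: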

select row_number() over (order by min(id)) as id,
       t.num1, t.num2, t.num3, t.num4
from (select id,
             (select t2.num1 from t t2 where t2.id <= t.id and t2.num1 <> 0 order by t2.id desc limit 1
             ) as num1,
             (select t2.num2 from t t2 where t2.id <= t.id and t2.num2 <> 0 order by t2.id desc limit 1
             ) as num2,
             (select t2.num3 from t t2 where t2.id <= t.id and t2.num3 <> 0 order by t2.id desc limit 1
             ) as num3,
             (select t2.num4 from t t2 where t2.id <= t.id and t2.num4 <> 0 order by t2.id desc limit 1
             ) as num4
      from t
     ) t
group by t.num1, t.num2, t.num3, t.num4;

Even for moderately sized tables, this version will not be very efficient.
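The correlated subqueries emulate lag(... ignore nulls) by looking back for the latest non-zero value in each column. A minimal pure-Python sketch of the same fill is below, done in one pass over the rows in id order. It uses None where the SQL subquery would return NULL (no earlier non-zero value), and the function name is hypothetical.

```python
# The question's "Current" sample table: (id, num_1, num_2, num_3, num_4)
ROWS = [
    (1, 111, 222, 0, 0),
    (2, 111, 333, 0, 0),
    (3, 111, 0, 0, 444),
    (4, 0, 222, 555, 0),
    (5, 777, 999, 0, 0),
    (6, 0, 999, 888, 0),
]


def last_nonzero_fill(rows):
    """For each row and column, keep the latest non-zero value at or before it.

    Per column this matches
        (select t2.numK from t t2
         where t2.id <= t.id and t2.numK <> 0
         order by t2.id desc limit 1)
    which yields NULL (None here) when no such row exists yet.
    """
    filled = []
    last = [None, None, None, None]
    for row_id, *nums in sorted(rows):
        last = [n if n != 0 else prev for prev, n in zip(last, nums)]
        filled.append((row_id, *last))
    return filled


for row in last_nonzero_fill(ROWS):
    print(row)
```

Carrying the last value forward needs only one pass, whereas the correlated-subquery version may have to search earlier rows once per row and column.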

A more efficient version is more complicated:

select row_number() over (order by min(t.id)) as id,
       t1.num1, t2.num2, t3.num3, t4.num4
from (select min(id) as id, num1_id, num2_id, num3_id, num4_id
      from (select id,
                   -- id of the latest row so far with a non-zero value in each column
                   max(case when num1 > 0 then id end) over (order by id) as num1_id,
                   max(case when num2 > 0 then id end) over (order by id) as num2_id,
                   max(case when num3 > 0 then id end) over (order by id) as num3_id,
                   max(case when num4 > 0 then id end) over (order by id) as num4_id
            from t
           ) t
      group by num1_id, num2_id, num3_id, num4_id
     ) t left join
     -- look the values up again by id
     t t1
     on t1.id = t.num1_id left join
     t t2
     on t2.id = t.num2_id left join
     t t3
     on t3.id = t.num3_id left join
     t t4
     on t4.id = t.num4_id
group by t1.num1, t2.num2, t3.num3, t4.num4;
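The more efficient query avoids per-row subqueries: a running max() of id, restricted to rows where a column is non-zero, gives the id of the row holding the latest non-zero value, and the joins then fetch that value. The same steps as a minimal pure-Python sketch (hypothetical function names; None stands in for the NULLs the left joins produce, and grouping directly by the fetched values is assumed to match the two-level grouping of the query):

```python
# The question's "Current" sample table: (id, num_1, num_2, num_3, num_4)
ROWS = [
    (1, 111, 222, 0, 0),
    (2, 111, 333, 0, 0),
    (3, 111, 0, 0, 444),
    (4, 0, 222, 555, 0),
    (5, 777, 999, 0, 0),
    (6, 0, 999, 888, 0),
]


def last_nonzero_ids(rows):
    """Mimic max(case when numK > 0 then id end) over (order by id)."""
    result = []
    last_ids = [None, None, None, None]
    for row_id, *nums in sorted(rows):
        # ids increase, so the current id is the running max whenever numK > 0
        last_ids = [row_id if n > 0 else prev for prev, n in zip(last_ids, nums)]
        result.append((row_id, *last_ids))
    return result


def fill_via_ids(rows):
    """Look values up by id (the left joins), then group and renumber."""
    by_id = {row[0]: row for row in rows}
    first = {}
    for row_id, *ids in last_nonzero_ids(rows):
        values = tuple(
            by_id[src][k + 1] if src is not None else None
            for k, src in enumerate(ids)
        )
        first[values] = min(first.get(values, row_id), row_id)
    ordered = sorted(first.items(), key=lambda kv: kv[1])
    return [(n, *values) for n, (values, _) in enumerate(ordered, start=1)]


for row in fill_via_ids(ROWS):
    print(row)
```

Only integer ids are carried through the window step; the actual values are fetched once per surviving group.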

Edit:

That was a bit silly. There is a much simpler way using first_value() (unfortunately, Postgres does not support it as an aggregate function):

select row_number() over (order by min(id)) as id,
       num1, num2, num3, num4
from (select id,
             first_value(num1) over (order by (case when num1 is not null then id end) nulls last
                                    ) as num1,
             first_value(num2) over (order by (case when num2 is not null then id end) nulls last
                                    ) as num2,
             first_value(num3) over (order by (case when num3 is not null then id end) nulls last
                                    ) as num3,
             first_value(num4) over (order by (case when num4 is not null then id end) nulls last
                                    ) as num4
      from t
     ) t
group by num1, num2, num3, num4;

Answer 1 (score: 0)

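It seems you started in the middle of a presumed solution and forgot to ask the original question. Based on the information you added, I suggest a completely different, much simpler solution. You have: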

CREATE TABLE table1 (username text, registration_id int);
CREATE TABLE table2 (LIKE table1);
CREATE TABLE table3 (LIKE table1);
CREATE TABLE table4 (LIKE table1);

INSERT INTO table1 VALUES ('bob', 111), ('mary', 777);
INSERT INTO table2 VALUES ('bob', 222), ('bob', 333), ('mary', 999);
INSERT INTO table3 VALUES ('bob', 555), ('mary', 888);
INSERT INTO table4 VALUES ('bob', 444); -- no mary

Solution

有关FROMJOIN的手册中的详细信息。

FULL [OUTER] JOIN

That's all. It produces your desired result directly.

-- CREATE TABLE relations AS
SELECT username
     , t1.registration_id AS reg1
     , t2.registration_id AS reg2
     , t3.registration_id AS reg3
     , t4.registration_id AS reg4
FROM   table1     t1
FULL   JOIN table2 t2 USING (username)
FULL   JOIN table3 t3 USING (username)
FULL   JOIN table4 t4 USING (username)
ORDER  BY username;

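Result:

 username  reg1  reg2  reg3  reg4
----------------------------------
 bob       111   222   555   444
 bob       111   333   555   444
 mary      777   999   888   (null)

Your example would also work with LEFT JOIN, since all missing entries are on the right side. But that would fail in other data configurations. I added some more revealing test cases to the fiddle: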

SQL Fiddle.

I assume you are aware that multiple entries for the same username in several tables multiply the number of output rows:
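A minimal pure-Python sketch of what the chained FULL JOIN ... USING (username) computes on the sample tables from the answer: for each username, every table contributes its matching registration IDs, or a single None (NULL) when it has none, and the output is the cross product of those lists. The function name is hypothetical, and NULL usernames are not modeled.

```python
from itertools import product

TABLES = [
    [("bob", 111), ("mary", 777)],                # table1
    [("bob", 222), ("bob", 333), ("mary", 999)],  # table2
    [("bob", 555), ("mary", 888)],                # table3
    [("bob", 444)],                               # table4 (no mary)
]


def full_join_using_username(tables):
    """Mimic t1 FULL JOIN t2 USING (username) FULL JOIN ... ORDER BY username."""
    usernames = sorted({name for table in tables for name, _ in table})
    rows = []
    for name in usernames:
        # IDs per table for this username; a missing match becomes one NULL.
        per_table = [[reg for n, reg in table if n == name] or [None] for table in tables]
        for combo in product(*per_table):
            rows.append((name, *combo))
    return rows


for row in full_join_using_username(TABLES):
    print(row)
```

The number of rows per username is the product of its row counts across the tables (here 1 × 2 × 1 × 1 = 2 for bob), which is why duplicates in several tables grow the output quickly.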