PostgreSQL - Joining multiple rows of data into a single row

Date: 2016-02-26 15:05:58

Tags: sql postgresql pivot

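I have a query that converts multiple rows into a single row, and I would like to know whether there is a better technique. To illustrate our situation, I have used a simple user-car relationship and a mock query similar to the ones in our application.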

table: Users       PrimaryKey: UserId
---------------------------------------
| UserId | UserDetails1 | UserDetails2|
---------------------------------------
| 1      | name1        | Addr1       |
| 2      | name2        | Addr2       |
---------------------------------------

table: UserCars    Unique Constraint(UserId, CarType) 
                   index on userid, cartype
-------------------------------------------
| UserId | CarType | RedCount | BlueCount |
------------------------------------------- 
|   1    |   SUV   |    1     |    0      |
|   1    |  sedan  |    1     |    2      |
|   2    |  sedan  |    1     |    0      |
-------------------------------------------    
Consider CarType as an enum type with values SUV and sedan only 
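
For anyone who wants to reproduce the example, the tables above could be set up roughly as follows. This is only a sketch: the column types, the enum name car_type, the NOT NULL and DEFAULT settings, and the foreign key are assumptions, since the question does not show the actual DDL. The constraint name user_cartype is taken from the index name that appears in the query plan further down.

```sql
-- Assumed enum type; the question only says CarType behaves like an enum.
CREATE TYPE car_type AS ENUM ('SUV', 'sedan');

CREATE TABLE Users (
    UserId       integer PRIMARY KEY,
    UserDetails1 text,   -- assumed type
    UserDetails2 text    -- assumed type
);

CREATE TABLE UserCars (
    UserId    integer  NOT NULL REFERENCES Users (UserId),  -- foreign key is an assumption
    CarType   car_type NOT NULL,
    RedCount  integer  NOT NULL DEFAULT 0,
    BlueCount integer  NOT NULL DEFAULT 0,
    -- The unique constraint also creates the index on (UserId, CarType).
    CONSTRAINT user_cartype UNIQUE (UserId, CarType)
);

-- Sample rows from the tables above.
INSERT INTO Users (UserId, UserDetails1, UserDetails2) VALUES
    (1, 'name1', 'Addr1'),
    (2, 'name2', 'Addr2');

INSERT INTO UserCars (UserId, CarType, RedCount, BlueCount) VALUES
    (1, 'SUV',   1, 0),
    (1, 'sedan', 1, 2),
    (2, 'sedan', 1, 0);
```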

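The application needs to fetch, in a single query and for each user: UserDetails1, sum(RedCount), sum(BlueCount), the RedCount for SUV, the BlueCount for SUV, the RedCount for sedan, and the BlueCount for sedan.

For the example above, the result should be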

--------------------------------------------------------------------------------
| UserId | UserDetails1 | TotalRed |TotalBlue|SUVRed|SUVBlue|sedanRed|sedanBlue|
--------------------------------------------------------------------------------
|  1     | name1        |   2      |    2    |   1  |   0   |   1    |    2    |
|  2     | name2        |   1      |    0    |   0  |   0   |   1    |    0    |
--------------------------------------------------------------------------------

Currently, our query is as follows

SELECT
 --User Information 
u.UserId, u.UserDetails1,
 --Total Counts by color
count_by_colour.TotalRed, count_by_colour.TotalBlue,
  -- Counts by type
COALESCE(suv.red, 0) AS SUVRed, COALESCE(suv.blue, 0) AS SUVBlue, 
COALESCE(sedan.red, 0) AS sedanRed, COALESCE(sedan.blue, 0) AS sedanBlue
FROM Users u
JOIN (
    SELECT c.UserId, SUM(RedCount) as TotalRed, 
    SUM(BlueCount) AS TotalBlue
    FROM UserCars c GROUP BY UserId
) count_by_colour
ON (u.UserId = count_by_colour.UserId)
LEFT JOIN (
    SELECT UserId, RedCount AS red, BlueCount AS blue
    FROM UserCars
    WHERE CarType = 'SUV') suv
ON (u.UserId = suv.UserId)
LEFT JOIN (
    SELECT UserId, RedCount AS red, BlueCount AS blue
    FROM UserCars
    WHERE CarType = 'sedan') sedan
ON (u.UserId = sedan.UserId)

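Although the query fetches the data as expected, I would like to know whether there is a better technique. In this example I have given only two types (SUV and sedan), but in our original marketing-related application there are more types, which means more left joins.

Note: The tables cannot be changed because other applications use the same tables.

Thanks,
Ravi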

3 answers:

Answer 0 (score: 2)

You can use conditional aggregation:

SELECT u.UserId, u.UserDetails1, 
       SUM(RedCount) AS TotalRed, SUM(BlueCount) AS TotalBlue,
       COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN RedCount  END), 0) AS SUVRed,
       COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN BlueCount  END), 0) AS SUVBlue,
       COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN RedCount  END), 0) AS SedanRed,
       COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN BlueCount  END), 0) AS SedanBlue
FROM Users AS u    
LEFT JOIN UserCars AS uc 
  ON u.UserId = uc.UserId
GROUP BY u.UserId, u.UserDetails1 
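
On PostgreSQL 9.4 or later, the same conditional aggregation can also be written with the aggregate FILTER clause instead of CASE. The following is a sketch of that variant, not part of the original answer. The "Total runtime" line in the plans posted below suggests an older server, where FILTER is not available, so this assumes an upgraded version. The COALESCE around the totals is an addition so that users without any cars show 0 instead of NULL.

```sql
-- Conditional aggregation using FILTER (requires PostgreSQL 9.4+).
SELECT u.UserId, u.UserDetails1,
       COALESCE(SUM(uc.RedCount), 0)  AS TotalRed,
       COALESCE(SUM(uc.BlueCount), 0) AS TotalBlue,
       COALESCE(SUM(uc.RedCount)  FILTER (WHERE uc.CarType = 'SUV'), 0)   AS SUVRed,
       COALESCE(SUM(uc.BlueCount) FILTER (WHERE uc.CarType = 'SUV'), 0)   AS SUVBlue,
       COALESCE(SUM(uc.RedCount)  FILTER (WHERE uc.CarType = 'sedan'), 0) AS SedanRed,
       COALESCE(SUM(uc.BlueCount) FILTER (WHERE uc.CarType = 'sedan'), 0) AS SedanBlue
FROM Users AS u
LEFT JOIN UserCars AS uc
  ON u.UserId = uc.UserId
GROUP BY u.UserId, u.UserDetails1;
```

Adding another car type then means adding one pair of FILTER expressions rather than another join.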


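Answer 1 (score: 2)

As @Giorgos Betsos pointed out, conditional aggregation can be used to avoid the left joins in my initial query. Thanks to Giorgos Betsos for the suggestion. However, I am not accepting his answer as the answer to the original question, because grouping by every selected column of the Users table takes more time. In the real application, many more columns are fetched from the Users table, so this should be avoided.

I slightly modified his query as follows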
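
One option that was not measured in this thread: since PostgreSQL 9.1, when the GROUP BY list contains a table's primary key, the other columns of that table may appear in the SELECT list without being grouped. The sketch below assumes that UserId is declared as the PRIMARY KEY of Users, as the table description in the question indicates. Whether it actually reduces the sort cost would have to be checked with EXPLAIN ANALYZE on the real data.

```sql
-- Group by the primary key only; u.UserDetails1 is functionally
-- dependent on u.UserId, so PostgreSQL 9.1+ accepts it ungrouped.
SELECT u.UserId, u.UserDetails1,
       SUM(uc.RedCount) AS TotalRed, SUM(uc.BlueCount) AS TotalBlue,
       COALESCE(SUM(CASE WHEN uc.CarType = 'SUV'   THEN uc.RedCount  END), 0) AS SUVRed,
       COALESCE(SUM(CASE WHEN uc.CarType = 'SUV'   THEN uc.BlueCount END), 0) AS SUVBlue,
       COALESCE(SUM(CASE WHEN uc.CarType = 'sedan' THEN uc.RedCount  END), 0) AS SedanRed,
       COALESCE(SUM(CASE WHEN uc.CarType = 'sedan' THEN uc.BlueCount END), 0) AS SedanBlue
FROM Users AS u
LEFT JOIN UserCars AS uc
  ON u.UserId = uc.UserId
GROUP BY u.UserId;
```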

SELECT
 --User Information 
u.UserId, u.UserDetails1,
 --Total Counts by color
temp.TotalRed, temp.TotalBlue,
  -- Counts by type
temp.SUVRed, temp.SUVBlue, 
temp.sedanRed, temp.sedanBlue
FROM Users u
JOIN (SELECT userid,
      SUM(RedCount) AS TotalRed, SUM(BlueCount) AS TotalBlue,
       COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN RedCount  END), 0) AS SUVRed,
       COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN BlueCount  END), 0) AS SUVBlue,
       COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN RedCount  END), 0) AS SedanRed,
       COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN BlueCount  END), 0) AS SedanBlue
FROM usercars GROUP BY userid) temp
ON (temp.userid = u.userid)

I ran both queries against the same data set; the query plans are as follows.

For the query in Giorgos Betsos's answer:
 GroupAggregate  (cost=34407.59..41848.99 rows=99999 width=25) (actual time=477.323..644.976 rows=99999 loops=1)
   ->  Sort  (cost=34407.59..34903.09 rows=198197 width=25) (actual time=477.303..513.956 rows=199974 loops=1)
         Sort Key: u.userid, u.userdetails1
         Sort Method: external merge  Disk: 7608kB
         ->  Hash Right Join  (cost=3375.98..12227.15 rows=198197 width=25) (actual time=83.339..265.419 rows=199974 loops=1)
               Hash Cond: (uc.userid = u.userid)
               ->  Seq Scan on usercars uc  (cost=0.00..3176.51 rows=199951 width=16) (actual time=0.009..48.687 rows=199951 loops=1)
               ->  Hash  (cost=1636.99..1636.99 rows=99999 width=13) (actual time=83.137..83.137 rows=99999 loops=1)
                     Buckets: 4096  Batches: 8  Memory Usage: 570kB
                     ->  Seq Scan on users u  (cost=0.00..1636.99 rows=99999 width=13) (actual time=0.009..34.343 rows=99999 loops=1)
 Total runtime: 649.600 ms

For the modified query given above:

Hash Join  (cost=3376.40..23359.86 rows=100884 width=61) (actual time=87.938..392.103 rows=99976 loops=1)
   Hash Cond: (temp.userid = u.userid)
   ->  Subquery Scan on temp  (cost=0.42..15883.52 rows=100884 width=52) (actual time=0.064..231.107 rows=99976 loops=1)
         ->  GroupAggregate  (cost=0.42..14874.68 rows=100884 width=16) (actual time=0.063..216.605 rows=99976 loops=1)
               ->  Index Scan using user_cartype on usercars  (cost=0.42..8367.18 rows=199951 width=16) (actual time=0.036..44.917 rows=199951 loops=1)
   ->  Hash  (cost=1636.99..1636.99 rows=99999 width=13) (actual time=87.635..87.635 rows=99999 loops=1)
         Buckets: 4096  Batches: 8  Memory Usage: 570kB
         ->  Seq Scan on users u  (cost=0.00..1636.99 rows=99999 width=13) (actual time=0.008..36.204 rows=99999 loops=1)
 Total runtime: 395.397 ms
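
Note that the two plans do not return the same number of rows: the LEFT JOIN query returns 99999 rows (every user), while the modified query returns 99976, because its inner JOIN drops users that have no row in UserCars. If those users are needed, the derived table can be left-joined instead. The following is a sketch of that variant; the alias t and the outer COALESCE calls are additions for illustration, and its timing was not measured here.

```sql
-- Aggregate UserCars once, then LEFT JOIN so users without cars are kept.
SELECT u.UserId, u.UserDetails1,
       COALESCE(t.TotalRed, 0)  AS TotalRed,
       COALESCE(t.TotalBlue, 0) AS TotalBlue,
       COALESCE(t.SUVRed, 0)    AS SUVRed,
       COALESCE(t.SUVBlue, 0)   AS SUVBlue,
       COALESCE(t.SedanRed, 0)  AS SedanRed,
       COALESCE(t.SedanBlue, 0) AS SedanBlue
FROM Users u
LEFT JOIN (
    SELECT UserId,
           SUM(RedCount)  AS TotalRed,
           SUM(BlueCount) AS TotalBlue,
           SUM(CASE WHEN CarType = 'SUV'   THEN RedCount  ELSE 0 END) AS SUVRed,
           SUM(CASE WHEN CarType = 'SUV'   THEN BlueCount ELSE 0 END) AS SUVBlue,
           SUM(CASE WHEN CarType = 'sedan' THEN RedCount  ELSE 0 END) AS SedanRed,
           SUM(CASE WHEN CarType = 'sedan' THEN BlueCount ELSE 0 END) AS SedanBlue
    FROM UserCars
    GROUP BY UserId
) t ON t.UserId = u.UserId;
```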

Thanks again to Giorgos Betsos for the suggestion.

Thanks,
Ravi

Answer 2 (score: 1)

  SELECT
--User Information 
u.UserId, u.UserDetails1,
 --Total Counts by color
temp.TotalRed, temp.TotalBlue,
  -- Counts by type
temp.SUVRed, temp.SUVBlue, 
temp.sedanRed, temp.sedanBlue
FROM Users u
JOIN (SELECT userid,
      SUM(RedCount) AS TotalRed, SUM(BlueCount) AS TotalBlue,
       COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN RedCount  END), 0) AS SUVRed,
       COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN BlueCount  END), 0) AS SUVBlue,
       COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN RedCount  END), 0) AS SedanRed,
       COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN BlueCount  END), 0) AS SedanBlue
FROM usercars GROUP BY userid) temp
ON (temp.userid = u.userid)