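I have a query that converts multiple rows into a single row, and I would like to know whether there is a better technique. To illustrate our situation, I have used a simple user-car relationship and a mock query similar to the one in our application.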
table: Users PrimaryKey: UserId
---------------------------------------
| UserId | UserDetails1 | UserDetails2|
---------------------------------------
| 1 | name1 | Addr1 |
| 2 | name2 | Addr2 |
---------------------------------------
table: UserCars Unique Constraint(UserId, CarType)
index on userid, cartype
-------------------------------------------
| UserId | CarType | RedCount | BlueCount |
-------------------------------------------
| 1 | SUV | 1 | 0 |
| 1 | sedan | 1 | 2 |
| 2 | sedan | 1 | 0 |
-------------------------------------------
Consider CarType as an enum type with values SUV and sedan only
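The application needs to fetch the following for every user in a single query: UserDetails1, sum(RedCount), sum(BlueCount), RedCount for SUV, BlueCount for SUV, RedCount for sedan, and BlueCount for sedan.
For the example above, the result should be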
--------------------------------------------------------------------------------
| UserId | UserDetails1 | TotalRed |TotalBlue|SUVRed|SUVBlue|sedanRed|sedanBlue|
--------------------------------------------------------------------------------
| 1 | name1 | 2 | 2 | 1 | 0 | 1 | 2 |
| 2 | name2 | 1 | 0 | 0 | 0 | 1 | 0 |
--------------------------------------------------------------------------------
Currently, our query is as follows
SELECT
--User Information
u.UserId, u.UserDetails1,
--Total Counts by color
count_by_colour.TotalRed, count_by_colour.TotalBlue,
-- Counts by type
COALESCE(suv.red, 0) AS SUVRed, COALESCE(suv.blue, 0) AS SUVBlue,
COALESCE(sedan.red, 0) AS sedanRed, COALESCE(sedan.blue, 0) AS sedanBlue
FROM Users u
JOIN (
SELECT c.UserId, SUM(RedCount) as TotalRed,
SUM(BlueCount) AS TotalBlue
FROM UserCars c GROUP BY UserId
) count_by_colour
ON (u.UserId = count_by_colour.UserId)
LEFT JOIN (
SELECT UserId, RedCount AS red, BlueCount AS blue
FROM UserCars
WHERE CarType = 'SUV') suv
ON (u.UserId = suv.UserId)
LEFT JOIN (
SELECT UserId, RedCount AS red, BlueCount AS blue
FROM UserCars
WHERE CarType = 'sedan') sedan
ON (u.UserId = sedan.UserId)
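Although the query fetches the data as expected, I would like to know whether there is a better technique. In this example I have shown only two types (SUV and sedan), but in our original marketing-related application there are many more types, which means more left joins.
Note: the tables cannot be changed, because other applications use the same tables.
Thanks,
Ravi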
Answer 0 (score: 2)
You can use conditional aggregation:
SELECT u.UserId, u.UserDetails1,
SUM(RedCount) AS TotalRed, SUM(BlueCount) AS TotalBlue,
COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN RedCount END), 0) AS SUVRed,
COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN BlueCount END), 0) AS SUVBlue,
COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN RedCount END), 0) AS SedanRed,
COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN BlueCount END), 0) AS SedanBlue
FROM Users AS u
LEFT JOIN UserCars AS uc
ON u.UserId = uc.UserId
GROUP BY u.UserId, u.UserDetails1
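To check the conditional aggregation against the sample data from the question, here is a minimal sketch that loads the two tables into an in-memory SQLite database and runs the query. SQLite is only a stand-in (the plans later in the thread suggest PostgreSQL), and the `ORDER BY` is added just to make the output order deterministic. For user 1 the SUV row has BlueCount 0 and the sedan row has BlueCount 2, so those are the per-type values the query returns.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserId INTEGER PRIMARY KEY, UserDetails1 TEXT, UserDetails2 TEXT);
CREATE TABLE UserCars (
    UserId INTEGER, CarType TEXT, RedCount INTEGER, BlueCount INTEGER,
    UNIQUE (UserId, CarType)
);
INSERT INTO Users VALUES (1, 'name1', 'Addr1'), (2, 'name2', 'Addr2');
INSERT INTO UserCars VALUES (1, 'SUV', 1, 0), (1, 'sedan', 1, 2), (2, 'sedan', 1, 0);
""")

# The query from this answer; only the ORDER BY is new.
CONDITIONAL_AGGREGATION = """
SELECT u.UserId, u.UserDetails1,
       SUM(RedCount) AS TotalRed, SUM(BlueCount) AS TotalBlue,
       COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN RedCount END), 0) AS SUVRed,
       COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN BlueCount END), 0) AS SUVBlue,
       COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN RedCount END), 0) AS SedanRed,
       COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN BlueCount END), 0) AS SedanBlue
FROM Users AS u
LEFT JOIN UserCars AS uc ON u.UserId = uc.UserId
GROUP BY u.UserId, u.UserDetails1
ORDER BY u.UserId
"""

rows = conn.execute(CONDITIONAL_AGGREGATION).fetchall()
for row in rows:
    print(row)
# (1, 'name1', 2, 2, 1, 0, 1, 2)
# (2, 'name2', 1, 0, 0, 0, 1, 0)
```

Each CASE expression yields NULL for rows of the other car type, SUM ignores those NULLs, and COALESCE turns an all-NULL sum (a user with no car of that type) into 0.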
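Answer 1 (score: 2)
As @Giorgos Betsos pointed out, conditional aggregation can be used to avoid the left joins in my initial query. Thanks to Giorgos Betsos for the suggestion. However, I am not accepting @Giorgos Betsos's answer as the answer to the original question, because grouping by every selected column of the Users table takes more time. In the real scenario many more columns are fetched from the Users table, so that grouping should be avoided.
I modified his query slightly, as follows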
SELECT
--User Information
u.UserId, u.UserDetails1,
--Total Counts by color
temp.TotalRed, temp.TotalBlue,
-- Counts by type
temp.SUVRed, temp.SUVBlue,
temp.sedanRed, temp.sedanBlue
FROM Users u
JOIN (SELECT userid,
SUM(RedCount) AS TotalRed, SUM(BlueCount) AS TotalBlue,
COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN RedCount END), 0) AS SUVRed,
COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN BlueCount END), 0) AS SUVBlue,
COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN RedCount END), 0) AS SedanRed,
COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN BlueCount END), 0) AS SedanBlue
FROM usercars GROUP BY userid) temp
ON (temp.userid = u.userid)
I ran both queries against the same data set; the query plans are as follows
For the query in Giorgos Betsos's answer
GroupAggregate (cost=34407.59..41848.99 rows=99999 width=25) (actual time=477.323..644.976 rows=99999 loops=1)
  -> Sort (cost=34407.59..34903.09 rows=198197 width=25) (actual time=477.303..513.956 rows=199974 loops=1)
        Sort Key: u.userid, u.userdetails1
        Sort Method: external merge Disk: 7608kB
        -> Hash Right Join (cost=3375.98..12227.15 rows=198197 width=25) (actual time=83.339..265.419 rows=199974 loops=1)
              Hash Cond: (uc.userid = u.userid)
              -> Seq Scan on usercars uc (cost=0.00..3176.51 rows=199951 width=16) (actual time=0.009..48.687 rows=199951 loops=1)
              -> Hash (cost=1636.99..1636.99 rows=99999 width=13) (actual time=83.137..83.137 rows=99999 loops=1)
                    Buckets: 4096 Batches: 8 Memory Usage: 570kB
                    -> Seq Scan on users u (cost=0.00..1636.99 rows=99999 width=13) (actual time=0.009..34.343 rows=99999 loops=1)
Total runtime: 649.600 ms
For the modified query given above
Hash Join (cost=3376.40..23359.86 rows=100884 width=61) (actual time=87.938..392.103 rows=99976 loops=1)
  Hash Cond: (temp.userid = u.userid)
  -> Subquery Scan on temp (cost=0.42..15883.52 rows=100884 width=52) (actual time=0.064..231.107 rows=99976 loops=1)
        -> GroupAggregate (cost=0.42..14874.68 rows=100884 width=16) (actual time=0.063..216.605 rows=99976 loops=1)
              -> Index Scan using user_cartype on usercars (cost=0.42..8367.18 rows=199951 width=16) (actual time=0.036..44.917 rows=199951 loops=1)
  -> Hash (cost=1636.99..1636.99 rows=99999 width=13) (actual time=87.635..87.635 rows=99999 loops=1)
        Buckets: 4096 Batches: 8 Memory Usage: 570kB
        -> Seq Scan on users u (cost=0.00..1636.99 rows=99999 width=13) (actual time=0.008..36.204 rows=99999 loops=1)
Total runtime: 395.397 ms
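One detail worth checking when comparing the two plans: they report different result sizes (rows=99999 for the first query, rows=99976 for the second). That would be consistent with the inner JOIN in the modified query dropping users who have no rows in UserCars, although this is an inference from the plans rather than something stated in the post. The sketch below illustrates the effect on a tiny SQLite data set; user 3 is a hypothetical user without cars, and only a subset of the columns is computed to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserId INTEGER PRIMARY KEY, UserDetails1 TEXT);
CREATE TABLE UserCars (
    UserId INTEGER, CarType TEXT, RedCount INTEGER, BlueCount INTEGER,
    UNIQUE (UserId, CarType)
);
-- User 3 is hypothetical: a user who owns no cars.
INSERT INTO Users VALUES (1, 'name1'), (2, 'name2'), (3, 'name3');
INSERT INTO UserCars VALUES (1, 'SUV', 1, 0), (1, 'sedan', 1, 2), (2, 'sedan', 1, 0);
""")

# Aggregate UserCars on its own first, so Users never has to be grouped.
PER_USER_TOTALS = """
SELECT UserId,
       SUM(RedCount) AS TotalRed,
       COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN RedCount END), 0) AS SUVRed
FROM UserCars
GROUP BY UserId
"""

def fetch(join_kind):
    """Join Users to the pre-aggregated totals with the given join keyword."""
    sql = f"""
    SELECT u.UserId, u.UserDetails1,
           COALESCE(t.TotalRed, 0), COALESCE(t.SUVRed, 0)
    FROM Users u
    {join_kind} ({PER_USER_TOTALS}) t ON t.UserId = u.UserId
    ORDER BY u.UserId
    """
    return conn.execute(sql).fetchall()

inner_rows = fetch("JOIN")       # users without cars disappear
left_rows = fetch("LEFT JOIN")   # users without cars are kept with zero counts

print(len(inner_rows), len(left_rows))  # 2 3
print(left_rows[-1])                    # (3, 'name3', 0, 0)
```

The question's original query also inner-joins the totals subquery, so the modified query matches that behaviour. If users without cars must appear in the result, a LEFT JOIN with an outer COALESCE, as sketched here, is one option.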
Thanks again to Giorgos Betsos for the suggestion.
Thanks,
Ravi
Answer 2 (score: 1)
SELECT
--User Information
u.UserId, u.UserDetails1,
--Total Counts by color
temp.TotalRed, temp.TotalBlue,
-- Counts by type
temp.SUVRed, temp.SUVBlue,
temp.sedanRed, temp.sedanBlue
FROM Users u
JOIN (SELECT userid,
SUM(RedCount) AS TotalRed, SUM(BlueCount) AS TotalBlue,
COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN RedCount END), 0) AS SUVRed,
COALESCE(SUM(CASE WHEN CarType = 'SUV' THEN BlueCount END), 0) AS SUVBlue,
COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN RedCount END), 0) AS SedanRed,
COALESCE(SUM(CASE WHEN CarType = 'sedan' THEN BlueCount END), 0) AS SedanBlue
FROM usercars GROUP BY userid) temp
ON (temp.userid = u.userid)
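The question mentions that the real application has many more car types. With the aggregate-then-join form, each extra type only adds more CASE expressions, so the column list can be generated instead of written by hand. The following is a rough sketch under stated assumptions: `build_pivot_query` is a hypothetical helper that does not come from the thread, SQLite and its `?` placeholders stand in for the real database and driver, and the colour columns are assumed to follow the `<Colour>Count` naming used in UserCars.

```python
import sqlite3

def build_pivot_query(car_types, colours=("Red", "Blue")):
    """Hypothetical helper: build the aggregate-then-join query for a list of car types."""
    aggregates, aliases, params = [], [], []
    for colour in colours:
        aggregates.append(f"SUM({colour}Count) AS Total{colour}")
        aliases.append(f"Total{colour}")
    for car_type in car_types:
        # The type name becomes part of a column alias, so restrict it to safe identifiers.
        if not car_type.isidentifier():
            raise ValueError(f"unsafe car type for a column alias: {car_type!r}")
        for colour in colours:
            alias = f"{car_type}{colour}"
            # The type value itself is bound as a parameter, not interpolated.
            aggregates.append(
                f"COALESCE(SUM(CASE WHEN CarType = ? THEN {colour}Count END), 0) AS {alias}"
            )
            aliases.append(alias)
            params.append(car_type)
    sql = (
        "SELECT u.UserId, u.UserDetails1, "
        + ", ".join(f"t.{alias}" for alias in aliases)
        + " FROM Users u JOIN (SELECT UserId, "
        + ", ".join(aggregates)
        + " FROM UserCars GROUP BY UserId) t ON t.UserId = u.UserId"
        + " ORDER BY u.UserId"
    )
    return sql, params

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserId INTEGER PRIMARY KEY, UserDetails1 TEXT, UserDetails2 TEXT);
CREATE TABLE UserCars (
    UserId INTEGER, CarType TEXT, RedCount INTEGER, BlueCount INTEGER,
    UNIQUE (UserId, CarType)
);
INSERT INTO Users VALUES (1, 'name1', 'Addr1'), (2, 'name2', 'Addr2');
INSERT INTO UserCars VALUES (1, 'SUV', 1, 0), (1, 'sedan', 1, 2), (2, 'sedan', 1, 0);
""")

sql, params = build_pivot_query(["SUV", "sedan"])
rows = conn.execute(sql, params).fetchall()
for row in rows:
    print(row)
# (1, 'name1', 2, 2, 1, 0, 1, 2)
# (2, 'name2', 1, 0, 0, 0, 1, 0)
```

Adding a type then means adding one entry to the list passed to `build_pivot_query`; the UserCars table is still read and grouped only once, regardless of how many types there are.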