Normalizing data in a SQL query

Time: 2014-08-27 13:01:28

Tags: mysql sql normalization

I have a SQL query A (see details below) that returns a table like this:

cluster  brand  amount
0         bos     600
0         phi     300
0         har     100
1         pro    2500
1         wal    1500
1         ash    1000
2         dil    4200
2         sor     500
2         van     300
...

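However, instead of the raw amount, I want to show each amount as a proportion of the total amount in its cluster, as in the following table: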

cluster  brand  amount
0         bos    0.60
0         phi    0.30
0         har    0.10
1         pro    0.50
1         wal    0.30
1         ash    0.20
2         dil    0.84
2         sor    0.10
2         van    0.06
...

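How should I change my SQL so that I can access the sum of all amounts within a cluster while still keeping multiple rows per cluster?

**Details**

SQL server: MySQL, accessed through the python-MySQL connector interface.

The current SQL query that produces the first table: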

SELECT c.cluster, brand, COUNT(o.id) AS brand_amount
FROM nyon_all.clustering AS c 
LEFT JOIN nyon_all.persons AS p ON c.pid = p.id 
LEFT JOIN nyon_all.orders AS o ON p.id = o.pid 
LEFT JOIN nyon_all.articles AS a ON o.aid = a.id 
LEFT JOIN nyon_all.brands AS ab ON a.brand_id = ab.id 
WHERE c.cluster_round = 'Org_2014-08-27_10:45:35'
GROUP BY cluster, brand 
HAVING brand_amount > 100
ORDER BY c.cluster ASC, brand_amount DESC;

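The orders table (primary key id) links persons (foreign key pid) to articles (foreign key aid). Each article has a specific brand (foreign key brand_id), which maps to a name in the brands table.

The total number of ordered articles per cluster can be retrieved with the following SQL query: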

SELECT c.cluster, COUNT(o.pid) AS amount
FROM nyon_all.clustering AS c 
LEFT JOIN nyon_all.persons AS p ON c.pid = p.id 
LEFT JOIN nyon_all.orders AS o ON p.id = o.pid 
WHERE c.cluster_round = 'Org_2014-08-27_10:45:35'
GROUP BY cluster
ORDER BY c.cluster ASC, amount DESC;

Result:

cluster amount
0        1000
1        5000
2        5000

However, I can't seem to combine the two SQL queries.

1 answer:

Answer 0: (score: 2)

You can join against a subquery that sums the amounts per cluster:

select t1.cluster, t1.brand, t1.amount / s.sumAmount as amount
from Table1 t1
join (select cluster, sum(amount) as sumAmount
      from Table1
      group by cluster) s
on t1.cluster = s.cluster

See the SqlFiddle.

Edit
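As a quick way to try the subquery join outside MySQL, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database and computes each brand's share of its cluster total. Using SQLite instead of MySQL, the column alias `share`, and the `1.0 *` factor (needed because SQLite divides integers as integers, whereas MySQL's `/` already returns a decimal) are assumptions made only for this illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption so the sketch
# runs anywhere); Table1 holds the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (cluster INTEGER, brand TEXT, amount INTEGER)")
rows = [
    (0, "bos", 600), (0, "phi", 300), (0, "har", 100),
    (1, "pro", 2500), (1, "wal", 1500), (1, "ash", 1000),
    (2, "dil", 4200), (2, "sor", 500), (2, "van", 300),
]
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)

# Join every row to the per-cluster total, then divide.
query = """
SELECT t1.cluster, t1.brand,
       ROUND(1.0 * t1.amount / s.sumAmount, 2) AS share
FROM Table1 AS t1
JOIN (SELECT cluster, SUM(amount) AS sumAmount
      FROM Table1
      GROUP BY cluster) AS s
  ON t1.cluster = s.cluster
ORDER BY t1.cluster ASC, share DESC
"""
shares = conn.execute(query).fetchall()
for row in shares:
    print(row)
```

Each cluster still contributes one row per brand, because the subquery only adds the total as an extra column and does not collapse the outer rows.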

SELECT 
    c.cluster, 
    brand, 
    COUNT(o.id) / NULLIF(s.sumBrandAmount, 0) AS brand_amount -- NULLIF guards against dividing by 0
FROM nyon_all.clustering AS c 
LEFT JOIN nyon_all.persons AS p ON c.pid = p.id 
LEFT JOIN nyon_all.orders AS o ON p.id = o.pid 
LEFT JOIN nyon_all.articles AS a ON o.aid = a.id 
LEFT JOIN nyon_all.brands AS ab ON a.brand_id = ab.id 
LEFT JOIN (select c1.cluster, count(o1.id) as sumBrandAmount
           from nyon_all.clustering c1
           left join nyon_all.persons p1 on p1.id = c1.pid
           left join nyon_all.orders as o1 on o1.pid = p1.id
           -- same filter as the main query, so the totals cover the same round
           where c1.cluster_round = 'Org_2014-08-27_10:45:35'
           group by c1.cluster) s
       ON s.cluster = c.cluster
WHERE c.cluster_round = 'Org_2014-08-27_10:45:35'
GROUP BY c.cluster, brand, s.sumBrandAmount
HAVING COUNT(o.id) > 100 -- filter on the raw count; brand_amount is now a ratio
ORDER BY c.cluster ASC, brand_amount DESC;
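To check the idea end to end, the following sketch builds a toy version of the schema in an in-memory SQLite database and runs a per-cluster-total join against it. Several things here are assumptions for illustration only: SQLite instead of MySQL, unqualified table names instead of the `nyon_all` schema, a `brand` name column on `brands`, the round label `'r1'`, the invented sample rows, and a minimum count of 1 in place of 100. The sketch groups the subquery by `cluster` and joins on `cluster`, and it filters in `HAVING` on the raw count, since the selected column is a ratio.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons    (id INTEGER PRIMARY KEY);
CREATE TABLE brands     (id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE articles   (id INTEGER PRIMARY KEY, brand_id INTEGER);
CREATE TABLE orders     (id INTEGER PRIMARY KEY, pid INTEGER, aid INTEGER);
CREATE TABLE clustering (pid INTEGER, cluster INTEGER, cluster_round TEXT);

-- Invented sample data: persons 1 and 2 are in cluster 0, person 3 in cluster 1.
INSERT INTO persons VALUES (1), (2), (3);
INSERT INTO brands VALUES (1, 'bos'), (2, 'phi');
INSERT INTO articles VALUES (1, 1), (2, 2);
INSERT INTO orders (pid, aid) VALUES (1, 1), (1, 1), (1, 2), (2, 1), (3, 2), (3, 2);
INSERT INTO clustering VALUES (1, 0, 'r1'), (2, 0, 'r1'), (3, 1, 'r1');
""")

query = """
SELECT c.cluster, ab.brand,
       1.0 * COUNT(o.id) / NULLIF(s.total, 0) AS share
FROM clustering AS c
LEFT JOIN persons  AS p  ON c.pid = p.id
LEFT JOIN orders   AS o  ON p.id = o.pid
LEFT JOIN articles AS a  ON o.aid = a.id
LEFT JOIN brands   AS ab ON a.brand_id = ab.id
LEFT JOIN (SELECT c1.cluster, COUNT(o1.id) AS total
           FROM clustering AS c1
           LEFT JOIN persons AS p1 ON p1.id = c1.pid
           LEFT JOIN orders  AS o1 ON o1.pid = p1.id
           WHERE c1.cluster_round = ?
           GROUP BY c1.cluster) AS s ON s.cluster = c.cluster
WHERE c.cluster_round = ?
GROUP BY c.cluster, ab.brand, s.total
HAVING COUNT(o.id) >= ?
ORDER BY c.cluster ASC, share DESC
"""
# Parameters follow the textual order of the placeholders:
# subquery round, outer round, minimum raw order count.
result = conn.execute(query, ("r1", "r1", 1)).fetchall()
for row in result:
    print(row)
```

With the real data, the same shape should apply with the `nyon_all.` prefixes restored and the threshold set back to 100; the shares within a cluster only sum to 1 if no brand is filtered out by `HAVING`.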