I have a table project_product and a table project_consummation. People consume products and may rate a product from 1 to 10.
Allow me to use some ASCII art for clarification:
+--------------------------+     +-------------------------+
| project_product          |     | project_consummation    |
|--------------------------|     |-------------------------|
| id integer primary key   |-\   | id integer primary key  |
| name varchar             |  \->| product_id integer      |
| ...                      |     | rating integer          |
| various other fields...  |     | user_id integer         |
+--------------------------+     | ...                     |
                                 | various other fields... |
                                 +-------------------------+
Now I want an overview of the votes for a product. Of course, there may be consummations without a rating value (e.g. NULL), so these must be ignored.
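The result should look like this (each rating from 1 to 10 should have its own column showing the number of people who gave the product that rating, plus num_ratings as the total number of ratings, and later on some statistics such as the median, standard deviation etc.):
 product_id | rating1 | rating2 | ... | rating10 | num_ratings
------------+---------+---------+-----+----------+-------------
       1002 |         |         | ... |        1 |           1
       1014 |       4 |         | ... |        2 |           6
       1015 |       2 |       1 | ... |        1 |           4
I came up with a very clumsy "solution" in which I do a LEFT OUTER JOIN for each rating column (even with only the first 3 columns, I'm sure you can see what a mess this becomes).
What would be a better solution, for cleaner code and especially for better performance?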
Answer 0 (score: 2)
Do it like this:
select
p.id product_id,
count(case when c.rating = 1 then 1 else null end) rating1,
count(case when c.rating = 2 then 1 else null end) rating2,
count(case when c.rating = 3 then 1 else null end) rating3,
count(case when c.rating = 4 then 1 else null end) rating4,
count(case when c.rating = 5 then 1 else null end) rating5,
count(case when c.rating = 6 then 1 else null end) rating6,
count(case when c.rating = 7 then 1 else null end) rating7,
count(case when c.rating = 8 then 1 else null end) rating8,
count(case when c.rating = 9 then 1 else null end) rating9,
count(case when c.rating = 10 then 1 else null end) rating10,
count(c.rating) num_ratings
from project_product p
left join project_consummation c on c.product_id = p.id
group by p.id
order by p.id;
Or a shorter form for the rating columns:
select
p.id product_id,
count(nullif(c.rating = 1, false)) rating1,
count(nullif(c.rating = 2, false)) rating2,
count(nullif(c.rating = 3, false)) rating3,
count(nullif(c.rating = 4, false)) rating4,
count(nullif(c.rating = 5, false)) rating5,
count(nullif(c.rating = 6, false)) rating6,
count(nullif(c.rating = 7, false)) rating7,
count(nullif(c.rating = 8, false)) rating8,
count(nullif(c.rating = 9, false)) rating9,
count(nullif(c.rating = 10, false)) rating10,
count(c.rating) num_ratings
from project_product p
left join project_consummation c on c.product_id = p.id
group by p.id
order by p.id;
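On PostgreSQL 9.4 or later, the same counts can also be written with the aggregate FILTER clause, which some find easier to read than the CASE or nullif forms. The following is only a sketch under that version assumption. The two CTEs shadow the real tables with a few made-up rows so the statement can be run on its own; drop the WITH part to run it against the actual tables.

```sql
-- Hypothetical sample rows, only for trying the query out.
with project_product (id) as (
    values (1002), (1014), (1015)
),
project_consummation (product_id, rating) as (
    values (1002, 10),
           (1014, 1),
           (1014, 1),
           (1014, NULL),   -- consumed but not rated: must not be counted
           (1015, 2)
)
select
    p.id as product_id,
    count(*) filter (where c.rating = 1)  as rating1,
    count(*) filter (where c.rating = 2)  as rating2,
    count(*) filter (where c.rating = 3)  as rating3,
    count(*) filter (where c.rating = 4)  as rating4,
    count(*) filter (where c.rating = 5)  as rating5,
    count(*) filter (where c.rating = 6)  as rating6,
    count(*) filter (where c.rating = 7)  as rating7,
    count(*) filter (where c.rating = 8)  as rating8,
    count(*) filter (where c.rating = 9)  as rating9,
    count(*) filter (where c.rating = 10) as rating10,
    count(c.rating) as num_ratings        -- count(expr) skips NULL ratings
from project_product p
left join project_consummation c on c.product_id = p.id
group by p.id
order by p.id;
```

With these sample rows, product 1014 gets rating1 = 2 and num_ratings = 2, because the NULL rating matches none of the filters and is skipped by count(c.rating).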
Answer 1 (score: 1)
Not perfect... but I hope you get the idea.
Using CASE:
SELECT project_product.id,project_product.name
, sum(case when rating = 1 then 1 else 0 end ) as rating1
, sum(case when rating = 2 then 1 else 0 end ) as rating2
, sum(case when rating = 3 then 1 else 0 end ) as rating3
, sum(case when rating = 4 then 1 else 0 end ) as rating4
, sum(case when rating = 5 then 1 else 0 end ) as rating5
, sum(case when rating = 6 then 1 else 0 end ) as rating6
, sum(case when rating = 7 then 1 else 0 end ) as rating7
, sum(case when rating = 8 then 1 else 0 end ) as rating8
, sum(case when rating = 9 then 1 else 0 end ) as rating9
, sum(case when rating = 10 then 1 else 0 end ) as rating10
FROM project_product
LEFT JOIN project_consummation ON (project_product.id = project_consummation.product_id)
GROUP BY project_product.id, project_product.name
And using crosstab:
-- if necessary:
-- CREATE EXTENSION tablefunc;
SELECT project_product.id,
rating1, rating2, rating3, rating4, rating5,
rating6, rating7, rating8, rating9, rating10,
-- crosstab leaves a column NULL when a product has no rows for that rating,
-- so coalesce each column before adding them up
coalesce(rating1, 0) + coalesce(rating2, 0) + coalesce(rating3, 0) +
coalesce(rating4, 0) + coalesce(rating5, 0) + coalesce(rating6, 0) +
coalesce(rating7, 0) + coalesce(rating8, 0) + coalesce(rating9, 0) +
coalesce(rating10, 0) as num_ratings
FROM project_product
LEFT JOIN crosstab(
'select product_id, rating, count(*)
from project_consummation
where rating is not null
group by product_id, rating
order by product_id, rating ',
'select generate_series(1, 10)')
AS main (
id integer, rating1 integer, rating2 integer, rating3 integer,
rating4 integer, rating5 integer, rating6 integer,
rating7 integer, rating8 integer, rating9 integer, rating10 integer
) ON (project_product.id = main.id )
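For the median and standard deviation the question mentions wanting later, the plain grouped join is probably easier to extend than crosstab, since further aggregates can simply be added to the select list. A rough sketch, assuming PostgreSQL 9.4 or later (needed for percentile_cont); the CTEs again hold made-up rows that shadow the real tables, and the avg_rating, stddev_rating and median_rating column names are illustrative only:

```sql
-- Hypothetical sample rows, only for trying the query out.
with project_product (id) as (
    values (1002), (1014)
),
project_consummation (product_id, rating) as (
    values (1002, 10),
           (1014, 1),
           (1014, 4),
           (1014, NULL)    -- unrated consummation, ignored by all aggregates below
)
select
    p.id as product_id,
    count(c.rating)                  as num_ratings,
    round(avg(c.rating), 2)          as avg_rating,
    -- sample standard deviation; NULL when a product has fewer than two ratings
    round(stddev_samp(c.rating), 2)  as stddev_rating,
    -- continuous median: interpolates between the two middle values if needed
    percentile_cont(0.5) within group (order by c.rating) as median_rating
from project_product p
left join project_consummation c on c.product_id = p.id
group by p.id
order by p.id;
```

These aggregates all skip NULL ratings, so no extra filtering should be needed for consummations without a rating.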