Group by column to get array results in PostgreSQL

Asked: 2015-09-30 08:35:09

Tags: python sql postgresql

I have a table named moviegenre that looks like this:

moviegenre:
- movie (FK movie.id)
- genre (FK genre.id)

I have a query (ORM-generated) that returns the movie.imdb_id and genre.id of every movie that has genres in common with a given movie.imdb_id.

SELECT "movie"."imdb_id", 
       "moviegenre"."genre_id" 
FROM   "moviegenre" 
       INNER JOIN "movie" 
               ON ( "moviegenre"."movie_id" = "movie"."id" ) 
WHERE  ( "movie"."imdb_id" IN (SELECT U0."imdb_id" 
                               FROM   "movie" U0 
                                      INNER JOIN "moviegenre" U1 
                                              ON ( U0."id" = U1."movie_id" ) 
                               WHERE  ( U0."last_ingested_on" IS NOT NULL 
                                        AND NOT ( U0."imdb_id" IN 
                                                  ( 'tt0169547' ) ) 
                                        AND NOT ( U0."imdb_id" IN 
                                                  ( 'tt0169547' ) ) 
                                        AND U1."genre_id" IN ( 2, 10 ) )) 
         AND "moviegenre"."genre_id" IN ( 2, 10 ) ) 

The problem is that I get the results in the following format:

[
  ('imdbid22', 'genreid1'),
  ('imdbid22', 'genreid2'),
  ('imdbid44', 'genreid1'),
  ('imdbid55', 'genreid8'),
]

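Is there a way, within the query itself, to group all of the genre IDs into a list under each movie.imdb_id? I would like to do the grouping in the query. At the moment I do it in my web application code (Python), which is very slow when 50k+ rows are returned.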

[
  ('imdbid22', ['genreid1', 'genreid2']),
  ('imdbid44', 'genreid1'),
  ('imdbid55', 'genreid8'),
]

Thanks in advance!

Edit:

Here is the Python code that runs against the current results:

results_list = []

for item in movies_and_genres:
    genres_in_common = len(set([
        i['genre__id'] for i in movies_and_genres
        if i['movie__imdb_id'] == item['movie__imdb_id']
    ]))
    imdb_id = item['movie__imdb_id']

    if genres_in_common >= min_in_comon:
        result_item = {
            'movie.imdb_id': imdb_id,
            'count': genres_in_common
        }
        if result_item not in results_list:
            results_list.append(result_item)

return results_list

2 Answers:

Answer 0: (score: 2)

select m.imdb_id, array_agg(g.genre_id) as genre_id
from
    moviegenre g
    inner join 
    movie m on g.movie_id = m.id
where 
    m.last_ingested_on is not null 
    and not m.imdb_id in ('tt0169547')  
    and not m.imdb_id in ('tt0169547')
    and g.genre_id in (2, 10) 
group by m.imdb_id

array_agg builds an array of all the genre_ids that belong to a given imdb_id:

http://www.postgresql.org/docs/current/interactive/functions-aggregate.html#FUNCTIONS-AGGREGATE-TABLE
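As a rough sketch of how the aggregated rows might be consumed on the Python side: assuming the driver hands back each array_agg column as a Python list (psycopg2 does this for PostgreSQL arrays by default), the per-movie count from the question reduces to a len() call. The function name, the sample rows, and the min_in_common parameter are illustrative and not part of the original code.

```python
def movies_with_common_genres(rows, min_in_common):
    """Keep movies sharing at least min_in_common distinct genres.

    rows: iterable of (imdb_id, genre_ids) pairs, in the shape the
    array_agg query above is assumed to yield (genre_ids is a list).
    """
    results = []
    for imdb_id, genre_ids in rows:
        # set() guards against duplicate genre rows for the same movie
        count = len(set(genre_ids))
        if count >= min_in_common:
            results.append({'movie.imdb_id': imdb_id, 'count': count})
    return results


# Hypothetical rows standing in for cursor.fetchall()
sample_rows = [('tt0000001', [2, 10]), ('tt0000002', [2]), ('tt0000003', [10])]
print(movies_with_common_genres(sample_rows, min_in_common=2))
```

Because each movie now arrives as a single row, the work is one pass over the result set rather than a rescan of every row for every movie.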

Answer 1: (score: 0)

I hope this Python code is fast enough:

movielist = [
    ('imdbid22', 'genreid1'),
    ('imdbid22', 'genreid2'),
    ('imdbid44', 'genreid1'),
    ('imdbid55', 'genreid8'),
]
# Group genre IDs by IMDb ID in a single pass over the rows.
genres_by_movie = {}
for imdb_id, genre_id in movielist:
    # setdefault creates the empty list the first time a key is seen
    genres_by_movie.setdefault(imdb_id, []).append(genre_id)
print(genres_by_movie)

Output (key order may vary by Python version):

{'imdbid44': ['genreid1'], 'imdbid55': ['genreid8'], 'imdbid22': ['genreid1', 'genreid2']}

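If you only need each movie and its genre count, change the SELECT in the original query as follows and add a GROUP BY. You then get the answer without needing any Python code: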

SELECT "movie"."imdb_id", count("moviegenre"."genre_id")

group by "movie"."imdb_id"
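If the counting has to stay in application code after all, the quadratic loop from the question can be replaced by a single pass. This is a minimal sketch, assuming the rows are dicts with the movie__imdb_id and genre__id keys shown in the question; the function name and sample data are made up for illustration.

```python
from collections import defaultdict


def count_common_genres(movies_and_genres, min_in_common):
    """Count distinct genres per movie in one pass over the rows."""
    genres_by_movie = defaultdict(set)
    for row in movies_and_genres:
        genres_by_movie[row['movie__imdb_id']].add(row['genre__id'])
    # Keep only movies that reach the threshold
    return [
        {'movie.imdb_id': imdb_id, 'count': len(genres)}
        for imdb_id, genres in genres_by_movie.items()
        if len(genres) >= min_in_common
    ]


# Hypothetical input in the shape the question's loop iterates over
rows = [
    {'movie__imdb_id': 'tt0000001', 'genre__id': 2},
    {'movie__imdb_id': 'tt0000001', 'genre__id': 10},
    {'movie__imdb_id': 'tt0000002', 'genre__id': 2},
]
print(count_common_genres(rows, min_in_common=2))
```

On the SQL side, the same threshold can be expressed with a HAVING clause on the count after the GROUP BY, so that rows below the minimum never leave the database.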