Generalizing a top-N query in BigQuery

Time: 2019-05-18 01:40:04

Tags: sql google-bigquery pivot pivot-table

This is a follow-up question that generalizes the case in "Top N results in BigQuery across multiple columns". Suppose we now have the following data:

 year   genre         studio            title       revenue
2014    fantasy       fox               avatar      10
2015    fantasy       fox               avatar      12
2016    fantasy       fox               avatar      12
2015    action        sony              spider-man  10
2015    romance       paramount         love letter 15
2015    action        sony              spider-man  10
2015    action        sony              spider-man  10
2015    action        disney            toy story   10
2015    action        sony              edgar       4
2015    action        sony              thomas      1
2015    fantasy       fox               avatar      2

I want to get the following result in order to build a tree structure:

Past 2 years, Top 2 genres (Alphabetically), Top 2 studios (by Count), Top 2 titles by SUM Revenue DESC

So we would get something like this:

[Image: the expected tree-structured result; not available here]

Conceptually, the query I want to implement looks like this:

SELECT year, genre, studio, title, SUM(revenue)
FROM titles
GROUP BY year, genre, studio, title

// in pseudocode
ORDER BY
    (year DESC) LIMIT 2,
    (genre ASC) LIMIT 10,
    (COUNT(studio) DESC) LIMIT 2,
    (SUM(revenue) DESC) LIMIT 2

What is the best way to do this? More broadly, this is a generalization of building a tree structure in BQ.

2 answers:

Answer 0 (score: 1)

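I could not find "avatar2" in your dataset, yet it appears in the result, so I could not verify my answer on the edge cases. Here is the SQL Server query I came up with. I hope it will not need many changes.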

-- A: attach the per-studio row count and the per-title revenue total to every row.
-- (The table is called movies here; the question calls it titles.)
WITH A AS (
    SELECT
        year,
        genre,
        studio,
        COUNT(*) OVER (PARTITION BY year, genre, studio) AS studio_movie_count,
        title,
        revenue,
        SUM(revenue) OVER (PARTITION BY year, genre, studio, title) AS revenue_sum
    FROM movies
),
-- B: rank each level of the tree within its parent level.
B AS (
    SELECT
        year,
        DENSE_RANK() OVER (ORDER BY year DESC) AS year_num,
        genre,
        DENSE_RANK() OVER (PARTITION BY year ORDER BY genre ASC) AS genre_num,
        studio,
        DENSE_RANK() OVER (PARTITION BY year, genre ORDER BY studio_movie_count DESC) AS studio_num,
        title,
        DENSE_RANK() OVER (PARTITION BY year, genre, studio ORDER BY revenue_sum DESC) AS title_num,
        revenue
    FROM A
)
-- Keep only the top 2 at every level.
SELECT year, genre, studio, title, revenue
FROM B
WHERE year_num < 3
  AND genre_num < 3
  AND studio_num < 3
  AND title_num < 3;
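
Two things to watch in the query above: it returns the raw rows, so a title that occurs several times in a year (spider-man in 2015) comes back once per source row, and DENSE_RANK lets tied studios or titles share a rank, so a level can return more than two entries. A minimal sketch of one way around both, in BigQuery Standard SQL, is to aggregate to one row per title first and then rank with explicit tie-breakers. It assumes the table is reachable as titles (as in the question), that "by Count" means the number of distinct titles per studio, and that ties are broken alphabetically; the question confirms none of these.

```sql
-- Sketch only: table name, the meaning of "Count", and the tie-breakers are assumptions.
WITH per_title AS (
  -- One row per (year, genre, studio, title) with its total revenue.
  SELECT year, genre, studio, title, SUM(revenue) AS revenue_sum
  FROM titles
  GROUP BY year, genre, studio, title
),
with_counts AS (
  SELECT
    *,
    -- Rows are already unique per title, so this counts titles per studio.
    COUNT(*) OVER (PARTITION BY year, genre, studio) AS studio_title_count
  FROM per_title
),
ranked AS (
  SELECT
    *,
    DENSE_RANK() OVER (ORDER BY year DESC) AS year_rank,
    DENSE_RANK() OVER (PARTITION BY year ORDER BY genre) AS genre_rank,
    -- Adding studio to the ordering gives each studio its own rank, even on equal counts.
    DENSE_RANK() OVER (
      PARTITION BY year, genre
      ORDER BY studio_title_count DESC, studio
    ) AS studio_rank,
    -- One row per title here, so ROW_NUMBER yields at most two titles per studio.
    ROW_NUMBER() OVER (
      PARTITION BY year, genre, studio
      ORDER BY revenue_sum DESC, title
    ) AS title_rank
  FROM with_counts
)
SELECT year, genre, studio, title, revenue_sum
FROM ranked
WHERE year_rank <= 2
  AND genre_rank <= 2
  AND studio_rank <= 2
  AND title_rank <= 2
ORDER BY year DESC, genre, studio_rank, title_rank;
```

On the sample data this keeps 2016 and 2015, drops romance (third alphabetically among the 2015 genres), and drops thomas (third by revenue under sony). Changing any of the four limits only means changing the matching constant in the final WHERE clause.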

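Answer 1 (score: 1)

In a subquery, filter the rows down to the latest 2 years while computing the movie count per studio and the revenue total per title.

Then rank by genre, studio, and revenue, and keep only the rows that rank in the top 2 at each level.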
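
The two steps just described could be sketched in BigQuery Standard SQL roughly as follows. This is an illustration of the approach, not the answerer's original query: the table name titles comes from the question, "latest 2 years" is assumed to mean the newest year in the table plus the one before it, and the alphabetical tie-breakers are an added assumption.

```sql
-- Sketch only: the year filter and the tie-breakers are assumptions.
SELECT DISTINCT year, genre, studio, title, revenue_sum
FROM (
  SELECT
    *,
    -- Step 2: rank each level within its parent level.
    DENSE_RANK() OVER (PARTITION BY year ORDER BY genre) AS genre_rank,
    DENSE_RANK() OVER (
      PARTITION BY year, genre
      ORDER BY studio_count DESC, studio
    ) AS studio_rank,
    DENSE_RANK() OVER (
      PARTITION BY year, genre, studio
      ORDER BY revenue_sum DESC, title
    ) AS title_rank
  FROM (
    -- Step 1: keep the latest 2 years, then compute the per-studio
    -- row count and the per-title revenue total over the remaining rows.
    SELECT
      year, genre, studio, title,
      COUNT(*) OVER (PARTITION BY year, genre, studio) AS studio_count,
      SUM(revenue) OVER (PARTITION BY year, genre, studio, title) AS revenue_sum
    FROM titles
    WHERE year >= (SELECT MAX(year) FROM titles) - 1
  )
)
WHERE genre_rank <= 2
  AND studio_rank <= 2
  AND title_rank <= 2
ORDER BY year DESC, genre, studio, revenue_sum DESC;
```

Filtering the years in the innermost query means the window functions only see rows that can still appear in the result. SELECT DISTINCT collapses the repeated source rows of a title into one output row. If the data can skip years, the `MAX(year) - 1` filter would need to be replaced by a rank over the distinct years.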
