How do I use AVG with PARTITION BY correctly?

Date: 2019-04-11 18:13:29

Tags: sql google-bigquery

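I have data containing user_ids, visitStartTime, and the prices of the products each user viewed. I'm trying to get the average price and the maximum price for each user visit, but my query doesn't compute over the (user + visitStartTime) partition. It only seems to compute over the user_id partition.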

Here is my query:

select distinct fullVisitorId ,visitStartTime,
    avg(pr) over (partition by visitStartTime,fullVisitorId) as avgPrice,
    max(pr) over (partition by fullVisitorId,visitStartTime) as maxPrice
from dataset

This is what I get:

+-----+-------------------+----------------+----------+----------+
| Row |   fullVisitorId   | visitStartTime | avgPrice | maxPrice |
+-----+-------------------+----------------+----------+----------+
|   1 | 64217461724617261 |     1538478049 |    484.5 |    969.0 |
|   2 | 64217461724617261 |     1538424725 |    484.5 |    969.0 |
+-----+-------------------+----------------+----------+----------+

What am I missing in my query?

Sample data

+---------------+----------------+---------------+
| FullVisitorId | VisitStartTime | ProductPrice  |
+---------------+----------------+---------------+
|           123 |       72631241 |           100 |
|           123 |       72631241 |           250 |
|           123 |       72631241 |            10 |
|           123 |       73827882 |            70 |
|           123 |       73827882 |            90 |
+---------------+----------------+---------------+

Desired result:

+-----+---------------+----------------+----------+----------+
| Row | fullVisitorId | visitStartTime | avgPrice | maxPrice |
+-----+---------------+----------------+----------+----------+
|   1 |           123 |       72631241 |    120.0 |    250.0 |
|   2 |           123 |       73827882 |     80.0 |     90.0 |
+-----+---------------+----------------+----------+----------+

1 answer:

Answer 0 (score: 2)

In this case you don't need PARTITION BY. A plain GROUP BY on both columns already returns one row per visit.

Try this:

select fullVisitorId, visitStartTime, avg(ProductPrice) as avgPrice, max(ProductPrice) as maxPrice
from sample
group by FullVisitorId, VisitStartTime;

(The query is completely standard SQL, so I think you can use it in BigQuery as well.)

Here is the output using PostgreSQL: DB<>FIDDLE
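To see locally why GROUP BY is the more direct tool here, below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine. It assumes SQLite 3.25 or later (for window-function support) and that SQLite's AVG/MAX behave like BigQuery's for this simple integer data. The table name `sample` follows the answer; the helper `run` is hypothetical. It runs the question's window-function form without DISTINCT alongside the GROUP BY form on the sample rows.

```python
import sqlite3

# Sample rows from the question: (FullVisitorId, VisitStartTime, ProductPrice)
SAMPLE = [
    (123, 72631241, 100),
    (123, 72631241, 250),
    (123, 72631241, 10),
    (123, 73827882, 70),
    (123, 73827882, 90),
]

# Window-function form: aggregates are computed per (visitor, visit) partition,
# but every input row is kept, so each visit's values repeat once per product row.
WINDOW_SQL = """
SELECT FullVisitorId, VisitStartTime,
       AVG(ProductPrice) OVER (PARTITION BY FullVisitorId, VisitStartTime) AS avgPrice,
       MAX(ProductPrice) OVER (PARTITION BY FullVisitorId, VisitStartTime) AS maxPrice
FROM sample
"""

# GROUP BY form from the answer: exactly one output row per (visitor, visit) group.
GROUP_BY_SQL = """
SELECT FullVisitorId, VisitStartTime,
       AVG(ProductPrice) AS avgPrice,
       MAX(ProductPrice) AS maxPrice
FROM sample
GROUP BY FullVisitorId, VisitStartTime
ORDER BY VisitStartTime
"""


def run(sql):
    """Hypothetical helper: load SAMPLE into an in-memory SQLite table and run `sql`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sample (FullVisitorId INTEGER, VisitStartTime INTEGER, ProductPrice INTEGER)"
        )
        conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", SAMPLE)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


window_rows = run(WINDOW_SQL)
grouped_rows = run(GROUP_BY_SQL)

print(len(window_rows))  # 5: one row per input row
print(grouped_rows)      # [(123, 72631241, 120.0, 250), (123, 73827882, 80.0, 90)]
# Removing duplicate window rows (what DISTINCT does) yields the same two rows.
print(sorted(set(window_rows)) == grouped_rows)  # True
```

On this sample, the window form needs DISTINCT to get back to one row per visit, while GROUP BY produces that shape directly. That is why PARTITION BY is unnecessary for this result.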

Update

The same query also works in BigQuery Standard SQL:

#standardSQL
SELECT 
  FullVisitorId, 
  VisitStartTime, 
  AVG(ProductPrice) as avgPrice,
  MAX(ProductPrice) as maxPrice
FROM `project.dataset.table`
GROUP BY FullVisitorId, VisitStartTime 

If you want to test it without a real table, the sample data can be inlined as a CTE with the same name:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 123 FullVisitorId, 72631241 VisitStartTime, 100 ProductPrice 
  UNION ALL SELECT 123, 72631241, 250
  UNION ALL SELECT 123, 72631241, 10
  UNION ALL SELECT 123, 73827882, 70
  UNION ALL SELECT 123, 73827882, 90
)

SELECT 
  FullVisitorId, 
  VisitStartTime, 
  AVG(ProductPrice) as avgPrice,
  MAX(ProductPrice) as maxPrice
FROM `project.dataset.table`
GROUP BY FullVisitorId, VisitStartTime