BigQuery: weighted average

Time: 2018-02-25 09:27:28

Tags: google-bigquery standard-sql

Table:

| User_ID |  Red | Blue | Green |  Rating |
|   a     |   23 |  33  |   42  |    99   |
|   a     |   56 |  45  |   62  |    45   |
|   a     |   23 |  49  |   28  |    67   |
|   b     |   39 |  59  |   10  |    87   |
|   b     |   18 |  28  |   59  |    38   |
|   b     |   40 |  50  |   38  |    94   |

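The result I want is one row per unique User_ID, containing the weighted averages of Red, Blue, and Green, with the Rating column as the weight.

That is, for each color: color * rating / (sum of the ratings for a or b), summed over that user's rows.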

// Edit

I can't work out how to do this. I tried the following, but it was a futile attempt:

    WITH
      averages AS (
      SELECT
        User_ID,
        SUM(rating) AS average
      FROM
        `project.dataset.table`
      GROUP BY
        1)
    SELECT
      averages.User_ID,
      Red*(Rating/average),
      Blue*(rating/average),
      Green*(rating/average)
    FROM
      `project.dataset.table` a
    LEFT JOIN
      averages
    ON
      a.user_id = averages.user_id
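
One possible reading of why this attempt falls short: the outer query has no aggregation, so it returns one row per input row instead of one row per user. A minimal sketch of the same join approach with the missing SUM and GROUP BY added is shown here. The names `totals`, `rating_total`, and `t` are hypothetical, and the sample rows are inlined only so the query runs on its own.

```sql
#standardSQL
-- Sample rows from the question, inlined so the sketch is self-contained.
WITH `project.dataset.table` AS (
  SELECT 'a' User_ID, 23 Red, 33 Blue, 42 Green, 99 Rating UNION ALL
  SELECT 'a', 56, 45, 62, 45 UNION ALL
  SELECT 'a', 23, 49, 28, 67 UNION ALL
  SELECT 'b', 39, 59, 10, 87 UNION ALL
  SELECT 'b', 18, 28, 59, 38 UNION ALL
  SELECT 'b', 40, 50, 38, 94
),
-- Sum of the weights per user (hypothetical CTE name).
totals AS (
  SELECT
    User_ID,
    SUM(Rating) AS rating_total
  FROM
    `project.dataset.table`
  GROUP BY
    User_ID
)
SELECT
  t.User_ID,
  -- Each row contributes color * rating / total; SUM collapses them per user.
  SUM(t.Red * t.Rating / totals.rating_total) AS Red,
  SUM(t.Blue * t.Rating / totals.rating_total) AS Blue,
  SUM(t.Green * t.Rating / totals.rating_total) AS Green
FROM
  `project.dataset.table` AS t
JOIN
  totals
ON
  t.User_ID = totals.User_ID
GROUP BY
  t.User_ID
```

The join is not strictly needed, since SUM(color * Rating) / SUM(Rating) under a single GROUP BY computes the same quantity, but this form stays close to the original attempt.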

2 answers:

Answer 0: (score: 3)

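I see, this is more of a math problem. You multiply each value by its weight and divide by the sum of the weights, not by the count. All of this is done per group (user ID). You could try something like SELECT SUM(x * weight) / SUM(weight) FROM table GROUP BY ...

For example: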
WITH t AS (SELECT * FROM 
  UNNEST([
    STRUCT('a' AS userID, 23 AS red, 99 AS weight),
    STRUCT('a' AS userID, 56 AS red, 45 AS weight),
    STRUCT('a' AS userID, 23 AS red, 67 AS weight),
    STRUCT('b' AS userID, 39 AS red, 87 AS weight),
    STRUCT('b' AS userID, 18 AS red, 38 AS weight),
    STRUCT('b' AS userID, 40 AS red, 94 AS weight)
  ])
  )

SELECT
  userID,
  SUM(red*weight) / SUM(weight) weightedAvg,
  AVG(red) normalAvg
FROM
  t
GROUP BY
  userID
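
As a hand check of the query above, here is the arithmetic for user a, worked from the sample rows. The subscripted names simply mirror the weightedAvg and normalAvg column aliases.

```latex
\begin{align*}
\text{weightedAvg}_a
  &= \frac{23 \cdot 99 + 56 \cdot 45 + 23 \cdot 67}{99 + 45 + 67}
   = \frac{2277 + 2520 + 1541}{211}
   = \frac{6338}{211} \approx 30.04 \\
\text{normalAvg}_a
  &= \frac{23 + 56 + 23}{3} = 34
\end{align*}
```

The weighted value is lower than the plain average because the row with red = 56 carries the smallest weight (45).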

HTH!

Answer 1: (score: 2)

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' User_ID, 23 Red, 33 Blue, 42 Green, 99 Rating UNION ALL
  SELECT 'a', 56, 45, 62, 45 UNION ALL
  SELECT 'a', 23, 49, 28, 67 UNION ALL
  SELECT 'b', 39, 59, 10, 87 UNION ALL
  SELECT 'b', 18, 28, 59, 38 UNION ALL
  SELECT 'b', 40, 50, 38, 94 
)
SELECT User_ID,  
  CAST(SUM(Red * Rating) / SUM(Rating) AS INT64) Red,
  CAST(SUM(Blue * Rating) / SUM(Rating) AS INT64) Blue,
  CAST(SUM(Green * Rating) / SUM(Rating) AS INT64) Green
FROM `project.dataset.table` 
GROUP BY User_ID  

Result

Row User_ID Red     Blue    Green    
1   a       30      41      42   
2   b       36      50      31
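
The CAST to INT64 in the query above rounds each average to the nearest integer. If the fractional part matters, or if a user's ratings could sum to zero, a variation along these lines may help. It is only a sketch: the choice of two decimal places is an assumption, and SAFE_DIVIDE is used so that a zero total yields NULL instead of a division error.

```sql
#standardSQL
-- Sample rows from the question, inlined so the sketch is self-contained.
WITH `project.dataset.table` AS (
  SELECT 'a' User_ID, 23 Red, 33 Blue, 42 Green, 99 Rating UNION ALL
  SELECT 'a', 56, 45, 62, 45 UNION ALL
  SELECT 'a', 23, 49, 28, 67 UNION ALL
  SELECT 'b', 39, 59, 10, 87 UNION ALL
  SELECT 'b', 18, 28, 59, 38 UNION ALL
  SELECT 'b', 40, 50, 38, 94
)
SELECT
  User_ID,
  -- SAFE_DIVIDE returns NULL when SUM(Rating) is 0; ROUND keeps two decimals.
  ROUND(SAFE_DIVIDE(SUM(Red * Rating), SUM(Rating)), 2) AS Red,
  ROUND(SAFE_DIVIDE(SUM(Blue * Rating), SUM(Rating)), 2) AS Blue,
  ROUND(SAFE_DIVIDE(SUM(Green * Rating), SUM(Rating)), 2) AS Green
FROM `project.dataset.table`
GROUP BY User_ID
```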