Finding the most frequent value of a string column with BigQuery

Posted: 2020-06-10 18:04:13

Tags: google-bigquery

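My goal is to find the most frequent value with BigQuery, grouped by user ID. The query should count how many times each language is used by each user ID and return the language with the highest count. However, I get this error: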

No matching signature for aggregate function AVG for argument types: STRING. Supported signatures: AVG(INT64); AVG(FLOAT64); AVG(NUMERIC) at [3:5]

Here is my code:

SELECT * FROM( 
  SELECT COUNT(*) AS cnt,
    AVG(Language) AS mean,
    APPROX_TOP_COUNT(Language, 1)[OFFSET(0)].value AS most_frequent_value
  FROM `language`
  WHERE Language IS NOT NULL
  GROUP BY User_ID);

What should I change so that the result returns the preferred language for each user ID?
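The error comes from `AVG(Language)`: as the message says, AVG only accepts INT64, FLOAT64 or NUMERIC arguments, and averaging a language code would not mean anything anyway. Below is a minimal sketch of the question's query with that line removed and `User_ID` added to the output so each row can be tied to a user. It assumes the same `language` table with `User_ID` and `Language` columns. Note that APPROX_TOP_COUNT is documented as approximate, so the exact counting approach in the answer is the safer choice when exact results matter.

```sql
-- Sketch: the question's query without AVG(Language), which caused the error.
-- Assumes table `language` has columns User_ID and Language, as in the question.
SELECT
  User_ID,
  COUNT(*) AS cnt,  -- number of non-NULL Language rows for this user
  -- APPROX_TOP_COUNT returns ARRAY<STRUCT<value, count>>;
  -- element 0 is the (approximately) most frequent value.
  APPROX_TOP_COUNT(Language, 1)[OFFSET(0)].value AS most_frequent_value
FROM `language`
WHERE Language IS NOT NULL
GROUP BY User_ID
```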


Stored procedure:

CASE
  WHEN Preferred_Language IN ('EN', 'English') THEN 'EN'
  ELSE Preferred_Language
END AS Preferred_Language,
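To show what this CASE expression does, here is a sketch that applies it to a few hypothetical inline rows (the `sample` table and its values are made up for illustration). 'English' and 'EN' both come out as 'EN', and any other value passes through unchanged. Normalizing this way before counting matters, because otherwise 'English' and 'EN' would be counted as two different languages.

```sql
-- Sketch: the stored procedure's CASE expression applied to hypothetical sample rows.
WITH sample AS (
  SELECT 'English' AS Preferred_Language UNION ALL
  SELECT 'EN' UNION ALL
  SELECT 'FR'
)
SELECT
  Preferred_Language AS raw_value,
  -- 'EN' and 'English' collapse to 'EN'; everything else is kept as-is
  CASE
    WHEN Preferred_Language IN ('EN', 'English') THEN 'EN'
    ELSE Preferred_Language
  END AS normalized_value
FROM sample
```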

1 answer

Answer (score: 2)

Below is for BigQuery Standard SQL

#standardSQL
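SELECT
  User_ID,
  -- keep the language with the highest count for each user;
  -- if two languages tie on cnt, either one may be returned
  ARRAY_AGG(Language ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] most_frequent_language
FROM (
  -- count how often each language occurs for each user
  SELECT
    User_ID,
    Language,
    COUNT(*) AS cnt
  FROM `project.dataset.language`
  WHERE Language IS NOT NULL
  GROUP BY User_ID, Language
)
GROUP BY User_ID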
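A self-contained sketch of the answer's query that can be pasted into the BigQuery console without an existing table. The inline table `sample_language` and its rows are hypothetical. The sketch also applies the stored procedure's CASE before counting, on the assumption that `Language` holds the same kind of values as `Preferred_Language`, and adds `Language` as a secondary sort key so ties between languages resolve deterministically instead of arbitrarily.

```sql
-- Sketch: the answer's query run against hypothetical inline data.
WITH sample_language AS (
  SELECT 1 AS User_ID, 'EN' AS Language UNION ALL
  SELECT 1, 'English' UNION ALL
  SELECT 1, 'FR' UNION ALL
  SELECT 2, 'DE' UNION ALL
  SELECT 2, 'DE' UNION ALL
  SELECT 2, NULL
),
normalized AS (
  -- same mapping as the stored procedure: 'EN' and 'English' become 'EN'
  SELECT
    User_ID,
    CASE WHEN Language IN ('EN', 'English') THEN 'EN' ELSE Language END AS Language
  FROM sample_language
)
SELECT
  User_ID,
  -- highest count first; ties broken alphabetically so the result is deterministic
  ARRAY_AGG(Language ORDER BY cnt DESC, Language LIMIT 1)[OFFSET(0)] AS most_frequent_language
FROM (
  -- count how often each (normalized) language occurs for each user
  SELECT User_ID, Language, COUNT(*) AS cnt
  FROM normalized
  WHERE Language IS NOT NULL
  GROUP BY User_ID, Language
)
GROUP BY User_ID
```

With this sample data, user 1 gets 'EN' (two rows after normalization versus one 'FR') and user 2 gets 'DE'. The NULL row is dropped by the WHERE filter, which also keeps ARRAY_AGG from seeing NULL elements.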