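My table has a string column that holds a JSON collection of objects. Assume the objects are words.
I want to aggregate and select the most popular words (like the map-reduce word-count example). The data is not stored as a nested record in BigQuery. I know I need to use JSON_EXTRACT.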
For example:
user  words
123   '{"totalItems":2,"items":[{"word":"drink"},{"word":"food"}]}'
456   '{"totalItems":3,"items":[{"word":"food"},{"word":"dog"},{"word":"drink"}]}'
123   '{"totalItems":1,"items":[{"word":"drink"}]}'
The result should be:
3  drink
2  food
1  dog
If I group by user, it would be:
userid  count  word
123     2      drink
123     1      food
456     1      food
... and so on.
Thanks in advance.
Answer 0 (score: 2)
By user and word:
SELECT id, word, COUNT(1) AS cnt FROM (
  SELECT id, REGEXP_EXTRACT(item, r':"(\w+)"') AS word
  FROM (
    SELECT id, SPLIT(JSON_EXTRACT(items, '$.items')) AS item
    FROM
      (SELECT 123 AS id, '{"totalItems":2,"items":[{"word":"drink"},{"word":"food"}]}' AS items),
      (SELECT 456 AS id, '{"totalItems":3,"items":[{"word":"food"},{"word":"dog"},{"word":"drink"}]}' AS items),
      (SELECT 123 AS id, '{"totalItems":1,"items":[{"word":"drink"}]}' AS items)
  )
)
GROUP BY id, word
By word:
SELECT word, COUNT(1) AS cnt FROM (
  SELECT REGEXP_EXTRACT(item, r':"(\w+)"') AS word
  FROM (
    SELECT SPLIT(JSON_EXTRACT(items, '$.items')) AS item
    FROM
      (SELECT 123 AS id, '{"totalItems":2,"items":[{"word":"drink"},{"word":"food"}]}' AS items),
      (SELECT 456 AS id, '{"totalItems":3,"items":[{"word":"food"},{"word":"dog"},{"word":"drink"}]}' AS items),
      (SELECT 123 AS id, '{"totalItems":1,"items":[{"word":"drink"}]}' AS items)
  )
)
GROUP BY word
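Answer 1 (score: 1)
Mikhail's answer is good! Note that it has to work around JSON_EXTRACT by using SPLIT and REGEXP_EXTRACT, because the JSON_EXTRACT function does not handle arrays well.
An alternative, if you want to use a BigQuery JavaScript UDF: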
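To make the SPLIT plus REGEXP_EXTRACT workaround concrete, here is a rough JavaScript sketch of what those two steps do to one row. It is only an approximation: `extractItems` and `wordsViaSplit` are hypothetical helper names, and it assumes the array extracted by JSON_EXTRACT comes back as compact JSON text. The `lang` field in the second sample is also made up, purely to show where the approach gets fragile.

```javascript
// Stand-in for JSON_EXTRACT(items, '$.items'): the array re-serialised as text.
// Assumption: the extracted array is compact JSON with no extra whitespace.
function extractItems(row) {
  return JSON.stringify(JSON.parse(row).items);
}

// Mimics SPLIT(...) on commas, then REGEXP_EXTRACT(piece, r':"(\w+)"').
function wordsViaSplit(row) {
  return extractItems(row)
    .split(",")
    .map(function (piece) {
      var m = /:"(\w+)"/.exec(piece);
      return m ? m[1] : null;
    });
}

var sample = '{"totalItems":2,"items":[{"word":"drink"},{"word":"food"}]}';
var words = wordsViaSplit(sample);
console.log(words); // [ 'drink', 'food' ]

// Hypothetical object with a second string field: the comma split cuts the
// object in two, so the value of "lang" is picked up as if it were a word.
var fragile = wordsViaSplit('{"items":[{"word":"drink","lang":"en"}]}');
console.log(fragile); // [ 'drink', 'en' ]
```

So the workaround is fine for the single-field objects in the question, but it would likely need adjusting if the objects carried more fields or if words could contain commas.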
SELECT userid, word, COUNT(*) c
FROM (
  SELECT * FROM
  js(
    // I wish you had given me a sample table instead when asking the question
    (SELECT * FROM
      (SELECT 123 AS id, '{"totalItems":2,"items":[{"word":"drink"},{"word":"food"}]}' AS items),
      (SELECT 456 AS id, '{"totalItems":3,"items":[{"word":"food"},{"word":"dog"},{"word":"drink"}]}' AS items),
      (SELECT 123 AS id, '{"totalItems":1,"items":[{"word":"drink"}]}' AS items)
    ),
    // Input columns.
    id, items,
    // Output schema.
    "[{name: 'word', type:'string'},
      {name: 'userid', type:'integer'}]",
    // The function: parse the JSON string and emit one row per word.
    "function(r, emit) {
      var x = JSON.parse(r.items);
      x.items.forEach(function(entry) {
        emit({word: entry.word, userid: r.id});
      });
    }"
  )
)
GROUP BY 1,2
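
If you want to check the UDF logic outside BigQuery, a minimal sketch like the one below runs the same function body in plain JavaScript (for example under Node.js). The names `extractWords`, `countBy`, `rows` and `emitted` are made up for this illustration, and the small counting helper only stands in for the GROUP BY and COUNT(*) that BigQuery does in the query above.

```javascript
// Same body as the UDF above, given a name so it can be called directly.
function extractWords(r, emit) {
  var x = JSON.parse(r.items);
  x.items.forEach(function (entry) {
    emit({ word: entry.word, userid: r.id });
  });
}

// The three sample rows from the question.
var rows = [
  { id: 123, items: '{"totalItems":2,"items":[{"word":"drink"},{"word":"food"}]}' },
  { id: 456, items: '{"totalItems":3,"items":[{"word":"food"},{"word":"dog"},{"word":"drink"}]}' },
  { id: 123, items: '{"totalItems":1,"items":[{"word":"drink"}]}' }
];

// Collect everything the function emits, one record per word occurrence.
var emitted = [];
rows.forEach(function (r) {
  extractWords(r, function (rec) { emitted.push(rec); });
});

// Stand-in for GROUP BY ... COUNT(*): count records per key.
function countBy(records, keyFn) {
  var counts = {};
  records.forEach(function (rec) {
    var k = keyFn(rec);
    counts[k] = (counts[k] || 0) + 1;
  });
  return counts;
}

var byUserWord = countBy(emitted, function (rec) { return rec.userid + " " + rec.word; });
var byWord = countBy(emitted, function (rec) { return rec.word; });

console.log(byUserWord); // counts per user and word, e.g. "123 drink" -> 2
console.log(byWord);     // drink: 3, food: 2, dog: 1
```

The per-word counts match the result asked for in the question (3 drink, 2 food, 1 dog), and the per-user counts match the grouped example.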