Implement the following in BigQuery:
{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}
Answer 0 (score: 0)
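For 1., using standard SQL (uncheck the "Use Legacy SQL" box under "Show Options"), you can use the comma operator to take the cross product of the table with the repeated field: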
WITH MyTable AS (
SELECT STRUCT(STRUCT(ARRAY<STRUCT<key STRING, value STRING>>[('id_1', 'a'), ('id_2', 'b'), ('id_3', 'c'), ('id_4', 'd')] AS id) AS property) AS fil
UNION ALL SELECT STRUCT(STRUCT(ARRAY<STRUCT<key STRING, value STRING>>[('id_1', 'b'), ('id_3', 'e')] AS id) AS property) AS fil
UNION ALL SELECT STRUCT(STRUCT(ARRAY<STRUCT<key STRING, value STRING>>[] AS id) AS property) AS fil
UNION ALL SELECT STRUCT(STRUCT(ARRAY<STRUCT<key STRING, value STRING>>[('id_4', 'a'), ('id_2', 'c')] AS id) AS property) AS fil)
SELECT
COUNT(DISTINCT id.key) AS num_keys,
COUNT(DISTINCT id.value) AS num_values
FROM MyTable t, t.fil.property.id AS id;
+----------+------------+
| num_keys | num_values |
+----------+------------+
| 4 | 5 |
+----------+------------+
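As a rough sanity check of what the query computes, here is a plain JavaScript sketch (not BigQuery code) over the same four sample rows: the comma join becomes a flatMap, so the empty third row contributes nothing, and COUNT(DISTINCT ...) becomes the size of a Set. The variable names are illustrative only.

```javascript
// Sample rows mirroring the MyTable CTE: each row carries the repeated
// fil.property.id field as an array of {key, value} structs.
const myTable = [
  { fil: { property: { id: [
    { key: "id_1", value: "a" }, { key: "id_2", value: "b" },
    { key: "id_3", value: "c" }, { key: "id_4", value: "d" },
  ] } } },
  { fil: { property: { id: [{ key: "id_1", value: "b" }, { key: "id_3", value: "e" }] } } },
  { fil: { property: { id: [] } } }, // empty array: contributes no rows to the join
  { fil: { property: { id: [{ key: "id_4", value: "a" }, { key: "id_2", value: "c" }] } } },
];

// "FROM MyTable t, t.fil.property.id AS id": one output row per array element.
const flattened = myTable.flatMap((t) => t.fil.property.id);

// COUNT(DISTINCT col) corresponds to the size of a Set built from that column.
const numKeys = new Set(flattened.map((id) => id.key)).size;
const numValues = new Set(flattened.map((id) => id.value)).size;

console.log(numKeys, numValues); // 4 5
```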
With legacy SQL, you can do something similar with EXACT_COUNT_DISTINCT (you may not even need to flatten), though setting up an inline example is harder.
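For 2., you can apply a similar approach with standard SQL flattening, then count the occurrences of "a" with COUNTIF(id.value = "a"). In legacy SQL, you can use SUM(IF(t.fil.property.id.value = "a", 1, 0)); note that COUNT(t.fil.property.id.value = "a") would count every non-NULL comparison result, including the false ones.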
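A minimal JavaScript sketch of the COUNTIF step, assuming the same sample data as the MyTable CTE reduced to just the values of the repeated field: flatten first, then count only the values that satisfy the condition.

```javascript
// Values of fil.property.id per row, taken from the MyTable CTE (row 3 is empty).
const myTable = [
  ["a", "b", "c", "d"],
  ["b", "e"],
  [],
  ["a", "c"],
];

// Flatten (the comma cross join), then keep only matches,
// which is what COUNTIF(id.value = "a") counts.
const countA = myTable.flat().filter((value) => value === "a").length;

console.log(countA); // 2
```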
Answer 1 (score: 0)
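Assuming the dictionaries are stored in YourTable as strings in a field named json, the key part of the answer is the query below (legacy SQL, with an inline JavaScript UDF called through JS()).
It parses the json field and extracts every key/value pair together with its parent (the dotted path of the dictionary that contains it):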
SELECT parent, key, value
FROM JS((
SELECT json FROM
(SELECT '{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}' AS json),
(SELECT '{"fil":{"property":{"type":{"id_1":"x","id_2":"a","id_3":"y","id_4":"z"}, "category":{"id_1":"v","id_2":"w","id_3":"a","id_4":"b"}}}}' AS json)
),
json, // Input columns
"[{name: 'parent', type:'string'}, // Output schema
{name: 'key', type:'string'},
{name: 'value', type:'string'}]",
"function(r, emit) { // The function
var x = JSON.parse(r.json);
processKey(x, '');
function processKey(node, parent) {
Object.keys(node).forEach(function(key) {
var value = node[key].toString();
if (value !== '[object Object]') {
emit({parent:parent, key:key, value:value});
} else {
if (parent !== '' && parent.substr(parent.length-1) !== '.') {parent += '.'};
processKey(node[key], parent + key);
};
});
};
}"
)
The query above returns:
parent key value
fil.property.id id_1 a
fil.property.id id_2 b
fil.property.id id_3 c
fil.property.id id_4 d
fil.property.type id_1 x
fil.property.type id_2 a
fil.property.type id_3 y
fil.property.type id_4 z
fil.property.category id_1 v
fil.property.category id_2 w
fil.property.category id_3 a
fil.property.category id_4 b
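For readers who want to test the flattening logic outside BigQuery, here is a standalone JavaScript sketch of the same recursion. It is a rewrite, not the UDF itself: it builds the dotted parent path explicitly instead of mutating it, and treats arrays as leaves (stringified like toString()). The function name flattenJson is hypothetical.

```javascript
// Walk a parsed JSON object and collect one {parent, key, value} row per
// leaf; parent is the dotted path of the enclosing keys, as in the UDF.
function flattenJson(node, parent = "", rows = []) {
  for (const [key, child] of Object.entries(node)) {
    if (child !== null && typeof child === "object" && !Array.isArray(child)) {
      // Nested dictionary: descend with the key appended to the path.
      flattenJson(child, parent === "" ? key : `${parent}.${key}`, rows);
    } else {
      // Leaf: stringify it. Unlike toString(), String() also tolerates null.
      rows.push({ parent, key, value: String(child) });
    }
  }
  return rows;
}

const inputs = [
  '{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}',
  '{"fil":{"property":{"type":{"id_1":"x","id_2":"a","id_3":"y","id_4":"z"}, "category":{"id_1":"v","id_2":"w","id_3":"a","id_4":"b"}}}}',
];

// Each input string plays the role of one row of the json column.
const rows = inputs.flatMap((json) => flattenJson(JSON.parse(json)));
for (const r of rows) console.log(r.parent, r.key, r.value);
```

Printed line by line, this reproduces the parent/key/value table above.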
From there, you can easily get both answers:
Q1: How can I count the total number of inner keys in the "id" dictionary (for each dictionary)?
SELECT parent, COUNT(1) AS key_value_pairs
FROM JS((
SELECT json FROM
(SELECT '{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}' AS json),
(SELECT '{"fil":{"property":{"type":{"id_1":"x","id_2":"a","id_3":"y","id_4":"z"}, "category":{"id_1":"v","id_2":"w","id_3":"a","id_4":"b"}}}}' AS json)
),
json, // Input columns
"[{name: 'parent', type:'string'}, // Output schema
{name: 'key', type:'string'},
{name: 'value', type:'string'}]",
"function(r, emit) { // The function
var x = JSON.parse(r.json);
processKey(x, '');
function processKey(node, parent) {
Object.keys(node).forEach(function(key) {
var value = node[key].toString();
if (value !== '[object Object]') {
emit({parent:parent, key:key, value:value});
} else {
if (parent !== '' && parent.substr(parent.length-1) !== '.') {parent += '.'};
processKey(node[key], parent + key);
};
});
};
}"
)
GROUP BY parent
The result is:
parent key_value_pairs
fil.property.id 4
fil.property.type 4
fil.property.category 4
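The GROUP BY parent step is just a tally over the flattened rows. A minimal JavaScript sketch, starting from the (parent, key, value) rows shown in the flattening result:

```javascript
// The (parent, key, value) rows returned by the flattening query.
const rows = [
  ["fil.property.id", "id_1", "a"], ["fil.property.id", "id_2", "b"],
  ["fil.property.id", "id_3", "c"], ["fil.property.id", "id_4", "d"],
  ["fil.property.type", "id_1", "x"], ["fil.property.type", "id_2", "a"],
  ["fil.property.type", "id_3", "y"], ["fil.property.type", "id_4", "z"],
  ["fil.property.category", "id_1", "v"], ["fil.property.category", "id_2", "w"],
  ["fil.property.category", "id_3", "a"], ["fil.property.category", "id_4", "b"],
];

// GROUP BY parent with COUNT(1): count how many rows share each parent.
const keyValuePairs = new Map();
for (const [parent] of rows) {
  keyValuePairs.set(parent, (keyValuePairs.get(parent) || 0) + 1);
}

for (const [parent, n] of keyValuePairs) console.log(parent, n);
```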
Q2: I need to count how many times "a" (or any value) appears under any id in any dictionary.
SELECT value, COUNT(1) AS value_appearances
FROM JS((
SELECT json FROM
(SELECT '{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}' AS json),
(SELECT '{"fil":{"property":{"type":{"id_1":"x","id_2":"a","id_3":"y","id_4":"z"}, "category":{"id_1":"v","id_2":"w","id_3":"a","id_4":"b"}}}}' AS json)
),
json, // Input columns
"[{name: 'parent', type:'string'}, // Output schema
{name: 'key', type:'string'},
{name: 'value', type:'string'}]",
"function(r, emit) { // The function
var x = JSON.parse(r.json);
processKey(x, '');
function processKey(node, parent) {
Object.keys(node).forEach(function(key) {
var value = node[key].toString();
if (value !== '[object Object]') {
emit({parent:parent, key:key, value:value});
} else {
if (parent !== '' && parent.substr(parent.length-1) !== '.') {parent += '.'};
processKey(node[key], parent + key);
};
});
};
}"
)
GROUP BY value
The result is:
value value_appearances
a 3
b 2
c 1
d 1
x 1
y 1
z 1
v 1
w 1
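The same tally keyed by value answers Q2. A minimal JavaScript sketch, taking the value column of the flattening result as input:

```javascript
// The value column of the rows returned by the flattening query.
const values = ["a", "b", "c", "d", "x", "a", "y", "z", "v", "w", "a", "b"];

// GROUP BY value with COUNT(1). A Map keeps first-seen order here, whereas
// SQL GROUP BY guarantees no particular order without ORDER BY.
const valueAppearances = new Map();
for (const value of values) {
  valueAppearances.set(value, (valueAppearances.get(value) || 0) + 1);
}

for (const [value, n] of valueAppearances) console.log(value, n);
```

To answer the question for a single value only, read one entry, e.g. valueAppearances.get("a"), or in SQL add a WHERE value = 'a' filter to the query.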
Answer 2 (score: 0)
Since the other answers were hard for me to follow, I wrote a regular expression that works for a string:int dict:
SELECT
*, REGEXP_EXTRACT_ALL(my_dict_column, r'"(\w+": \d+)') as keys
FROM test.test_table
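Each match has the form key": 123 (the capture group includes the closing quote of the key), so from there you can split out the keys, the values, and so on. The pattern also assumes exactly one space after each colon.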
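To see what the pattern returns and how the follow-up split might look, here is a JavaScript sketch using the same regular expression with matchAll. The sample string myDictColumn is hypothetical; like the SQL pattern, it relies on a space after each colon.

```javascript
// Hypothetical dict string in the "key": <integer> layout the pattern expects.
const myDictColumn = '{"apples": 3, "pears": 12}';

// Same pattern as the REGEXP_EXTRACT_ALL call: the single capture group
// yields fragments such as `apples": 3`.
const fragments = [...myDictColumn.matchAll(/"(\w+": \d+)/g)].map((m) => m[1]);

// Split each fragment into its key and integer value.
const pairs = fragments.map((fragment) => {
  const [key, value] = fragment.split('": ');
  return { key, value: Number(value) };
});

for (const { key, value } of pairs) console.log(key, value);
```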