Counting keys and values in BigQuery

Date: 2016-06-21 13:38:20

Tags: dictionary count key google-bigquery

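I want to solve the following problems in BigQuery:

  1. I have a dictionary in the following JSON format. How can I count the total number of keys and values inside the "id" dictionary?

     {"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}

  2. The value "a" can occur under any of the ids (id_1, ..., id_5) in multiple such dictionaries. I need to count the number of times "a" occurs under any id in any of the dictionaries.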

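3 answers:

Answer 0 (score: 0)

For 1., using standard SQL (uncheck the "Use Legacy SQL" box under "Show Options"), you can use the comma operator to take the cross product of the table and the repeated field: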

WITH MyTable AS (
  SELECT STRUCT(STRUCT(ARRAY<STRUCT<key STRING, value STRING>>[('id_1', 'a'), ('id_2', 'b'), ('id_3', 'c'), ('id_4', 'd')] AS id) AS property) AS fil
  UNION ALL SELECT STRUCT(STRUCT(ARRAY<STRUCT<key STRING, value STRING>>[('id_1', 'b'), ('id_3', 'e')] AS id) AS property) AS fil
  UNION ALL SELECT STRUCT(STRUCT(ARRAY<STRUCT<key STRING, value STRING>>[] AS id) AS property) AS fil
  UNION ALL SELECT STRUCT(STRUCT(ARRAY<STRUCT<key STRING, value STRING>>[('id_4', 'a'), ('id_2', 'c')] AS id) AS property) AS fil)
SELECT
  COUNT(DISTINCT id.key) AS num_keys,
  COUNT(DISTINCT id.value) AS num_values
FROM MyTable t, t.fil.property.id AS id;
+----------+------------+
| num_keys | num_values |
+----------+------------+
|        4 |          5 |
+----------+------------+

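With legacy SQL, you can accomplish something similar using EXACT_COUNT_DISTINCT (you probably won't need to flatten), although it is harder to set up an inline example.

For 2., you can apply the same flattening approach in standard SQL and then count the occurrences of "a" with COUNTIF(id.value = "a"). In legacy SQL, you can alternatively use COUNT(t.fil.property.id.value = "a"); note, however, that COUNT counts every non-NULL value, so SUM(IF(t.fil.property.id.value = "a", 1, 0)) is probably the safer form.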
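To sanity-check the numbers, the flatten-then-count logic can be mimicked outside BigQuery. The following is only an in-memory sketch in plain JavaScript, not BigQuery code; the names myTable, flattened, numKeys, numValues and countA are made up for illustration, and the data is the same sample as in MyTable above.

```javascript
// In-memory stand-in for MyTable: each row is the repeated `id` field,
// represented as an array of {key, value} pairs (same sample data as above).
const myTable = [
  [{key: 'id_1', value: 'a'}, {key: 'id_2', value: 'b'},
   {key: 'id_3', value: 'c'}, {key: 'id_4', value: 'd'}],
  [{key: 'id_1', value: 'b'}, {key: 'id_3', value: 'e'}],
  [],
  [{key: 'id_4', value: 'a'}, {key: 'id_2', value: 'c'}],
];

// Rough equivalent of "FROM MyTable t, t.fil.property.id AS id":
// one output row per array element; rows with an empty array disappear.
const flattened = myTable.flat();

// COUNT(DISTINCT id.key) and COUNT(DISTINCT id.value)
const numKeys = new Set(flattened.map((id) => id.key)).size;
const numValues = new Set(flattened.map((id) => id.value)).size;

// COUNTIF(id.value = "a")
const countA = flattened.filter((id) => id.value === 'a').length;

console.log({numKeys, numValues, countA});
// prints { numKeys: 4, numValues: 5, countA: 2 }
```

The first two numbers match the query result above; the third is what COUNTIF(id.value = "a") should return for this sample.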

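Answer 1 (score: 0)

Assuming you have your dictionaries stored in YourTable, as strings in a field named json.

The key to the answer is the query below (legacy SQL with an inline JavaScript UDF).

It parses the json field and extracts all key/value pairs along with their parent (the dictionary name):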

SELECT parent, key, value 
FROM JS((
  SELECT json FROM  
    (SELECT '{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}' AS json),
    (SELECT '{"fil":{"property":{"type":{"id_1":"x","id_2":"a","id_3":"y","id_4":"z"}, "category":{"id_1":"v","id_2":"w","id_3":"a","id_4":"b"}}}}' AS json)
  ),
  json,                                    // Input columns
  "[{name: 'parent', type:'string'},       // Output schema
   {name: 'key', type:'string'},
   {name: 'value', type:'string'}]",
   "function(r, emit) {                    // The function
      x = JSON.parse(r.json);
      processKey(x, '');
      function processKey(node, parent) {
        Object.keys(node).map(function(key) {
          value = node[key].toString();
          if (value !== '[object Object]') {
            emit({parent:parent, key:key, value:value});
          } else {
            if (parent !== '' && parent.substr(parent.length-1) !== '.') {parent += '.'};
            processKey(node[key], parent + key);
          };
        });         
      };
    }"
  )

The result of the above query is as follows:

parent                  key     value    
fil.property.id         id_1    a    
fil.property.id         id_2    b    
fil.property.id         id_3    c    
fil.property.id         id_4    d    
fil.property.type       id_1    x    
fil.property.type       id_2    a    
fil.property.type       id_3    y    
fil.property.type       id_4    z    
fil.property.category   id_1    v    
fil.property.category   id_2    w    
fil.property.category   id_3    a    
fil.property.category   id_4    b
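The traversal inside the UDF can be tried outside BigQuery as well. Below is a minimal standalone sketch of the same idea; flattenJson and sampleRows are hypothetical names, not part of BigQuery or of the answer. It deliberately differs from the UDF in two small ways: it builds a new path for each nested dictionary instead of appending to parent in place (so a leaf key that follows a nested dictionary does not get a trailing dot in its parent), and it renders null as the string "null" instead of failing on toString().

```javascript
// Hypothetical standalone version of the UDF body; not a BigQuery function.
// Returns one {parent, key, value} row per leaf key/value pair.
function flattenJson(json) {
  const rows = [];

  function processKey(node, parent) {
    for (const key of Object.keys(node)) {
      const value = node[key];
      if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
        // Nested dictionary: recurse with a new dotted path, leaving `parent` unchanged.
        processKey(value, parent === '' ? key : parent + '.' + key);
      } else {
        // Leaf: emit the pair together with the path of its enclosing dictionary.
        rows.push({parent: parent, key: key, value: String(value)});
      }
    }
  }

  processKey(JSON.parse(json), '');
  return rows;
}

const sampleRows = flattenJson(
  '{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}'
);
console.log(sampleRows);
```

For the sample dictionary this yields the first four rows of the table above, all with parent fil.property.id.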

From there, you can easily get both answers:

Q1: How can I count the total number of keys inside (each) "id" dictionary?

SELECT parent, COUNT(1) AS key_value_pairs  
FROM JS((
  SELECT json FROM  
    (SELECT '{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}' AS json),
    (SELECT '{"fil":{"property":{"type":{"id_1":"x","id_2":"a","id_3":"y","id_4":"z"}, "category":{"id_1":"v","id_2":"w","id_3":"a","id_4":"b"}}}}' AS json)
  ),
  json,                                    // Input columns
  "[{name: 'parent', type:'string'},       // Output schema
   {name: 'key', type:'string'},
   {name: 'value', type:'string'}]",
   "function(r, emit) {                    // The function
      x = JSON.parse(r.json);
      processKey(x, '');
      function processKey(node, parent) {
        Object.keys(node).map(function(key) {
          value = node[key].toString();
          if (value !== '[object Object]') {
            emit({parent:parent, key:key, value:value});
          } else {
            if (parent !== '' && parent.substr(parent.length-1) !== '.') {parent += '.'};
            processKey(node[key], parent + key);
          };
        });         
      };
    }"
  )
GROUP BY parent

The result is:

parent                  key_value_pairs  
fil.property.id         4    
fil.property.type       4    
fil.property.category   4    
  

Q2: I need to count the number of times "a" (or any value) occurs under any id in any of the dictionaries.

SELECT value, COUNT(1) AS value_appearances
FROM JS((
  SELECT json FROM  
    (SELECT '{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}' AS json),
    (SELECT '{"fil":{"property":{"type":{"id_1":"x","id_2":"a","id_3":"y","id_4":"z"}, "category":{"id_1":"v","id_2":"w","id_3":"a","id_4":"b"}}}}' AS json)
  ),
  json,                                    // Input columns
  "[{name: 'parent', type:'string'},       // Output schema
   {name: 'key', type:'string'},
   {name: 'value', type:'string'}]",
   "function(r, emit) {                    // The function
      x = JSON.parse(r.json);
      processKey(x, '');
      function processKey(node, parent) {
        Object.keys(node).map(function(key) {
          value = node[key].toString();
          if (value !== '[object Object]') {
            emit({parent:parent, key:key, value:value});
          } else {
            if (parent !== '' && parent.substr(parent.length-1) !== '.') {parent += '.'};
            processKey(node[key], parent + key);
          };
        });         
      };
    }"
  )
GROUP BY value  

value   value_appearances    
a       3    
b       2    
c       1    
d       1    
x       1    
y       1    
z       1    
v       1    
w       1    
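Both aggregations are plain group-and-count operations over the flattened rows. As a rough illustration outside BigQuery, assuming the twelve (parent, key, value) rows shown earlier, they could be sketched like this; flatRows, countBy, pairsPerParent and appearancesPerValue are illustrative names only.

```javascript
// The flattened rows from the first query, as [parent, key, value] triples.
const flatRows = [
  ['fil.property.id', 'id_1', 'a'], ['fil.property.id', 'id_2', 'b'],
  ['fil.property.id', 'id_3', 'c'], ['fil.property.id', 'id_4', 'd'],
  ['fil.property.type', 'id_1', 'x'], ['fil.property.type', 'id_2', 'a'],
  ['fil.property.type', 'id_3', 'y'], ['fil.property.type', 'id_4', 'z'],
  ['fil.property.category', 'id_1', 'v'], ['fil.property.category', 'id_2', 'w'],
  ['fil.property.category', 'id_3', 'a'], ['fil.property.category', 'id_4', 'b'],
];

// Equivalent of "SELECT col, COUNT(1) ... GROUP BY col" for one column index
// (0 = parent, 1 = key, 2 = value).
function countBy(rows, column) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row[column], (counts.get(row[column]) || 0) + 1);
  }
  return counts;
}

const pairsPerParent = countBy(flatRows, 0);      // Q1: GROUP BY parent
const appearancesPerValue = countBy(flatRows, 2); // Q2: GROUP BY value

console.log(pairsPerParent.get('fil.property.id')); // prints 4
console.log(appearancesPerValue.get('a'));          // prints 3
```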

Answer 2 (score: 0)

Since the other answers were hard for me to follow, I wrote a regular expression that works for string:int dictionaries:

SELECT 
        *, REGEXP_EXTRACT_ALL(my_dict_column, r'"(\w+": \d+)') as keys
FROM test.test_table

From the extracted matches you can then derive the keys, the values, and so on.
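To see what the pattern captures, it can be tried in plain JavaScript. This is only a sketch: the sample string in myDictColumn is invented, and JavaScript's regex engine is not the RE2 engine BigQuery uses, although this particular pattern should behave the same in both.

```javascript
// Hypothetical sample value of my_dict_column (a string:int dictionary).
const myDictColumn = '{"apples": 3, "pears": 12}';

// Same pattern as in the query; the `g` flag plays the role of the _ALL suffix.
const pattern = /"(\w+": \d+)/g;

// Each capture looks like `apples": 3` (the closing quote of the key is kept),
// so splitting on `": ` separates the key from the value.
const pairs = Array.from(myDictColumn.matchAll(pattern), (match) => {
  const [key, value] = match[1].split('": ');
  return {key: key, value: Number(value)};
});

console.log(pairs);
// prints [ { key: 'apples', value: 3 }, { key: 'pears', value: 12 } ]
```

Note that the pattern expects exactly one space after the colon, so a dictionary serialized without that space would not match.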