Convert a BigQuery JSON string into columns

Time: 2019-08-30 13:56:39

Tags: google-bigquery bigquery-standard-sql

For JSON data that arrives as a string, I would like something like JSON_EXTRACT_SCALAR, but with a flexible number of result columns.

Here is some sample data. Different rows can have different column names, and the JSON can be nested:

WITH `my_table` AS (
  SELECT '{"sku_types":"{\"id\":\"5433306\",\"product_code\":\"adfklj_ewkj\"}","additional_info":"Face 30 ml","stock_level":"20+"}' as json_string 
  union all 
  SELECT '{"additional_info":"Face 100 ml","offer_info":"30%"}' as json_string 
)
SELECT * 
from my_table;

I would like to extract this data into separate columns: sku_types.id, sku_types.product_code, additional_info, stock_level, offer_info

Can this be done in SQL, or does it need JavaScript?

I don't know the names of the JSON fields in advance, so I can't use JSON_EXTRACT_SCALAR or JSON_EXTRACT for this.

1 answer:

Answer 0: (score: 1)

Below is an example for BigQuery Standard SQL

#standardSQL
CREATE TEMPORARY FUNCTION parseJson(y STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  // Flatten the parsed JSON into 'dotted.path:value' strings.
  var z = [];
  function processKey(node, parent) {
    Object.keys(node).forEach(function(key) {
      // Build the full path without mutating `parent`, so that keys which
      // follow a nested object at the same level keep their prefix.
      var path = parent === '' ? key : parent + '.' + key;
      var value = node[key].toString();
      if (value !== '[object Object]') {
        z.push(path + ':' + value);
      } else {
        processKey(node[key], path);
      }
    });
  }
  processKey(JSON.parse(y), '');
  return z;
""";
WITH `my_table` AS (
  SELECT 1 id, '{"sku_types":{"id":"5433306","product_code":"adfklj_ewkj"},"additional_info":"Face 30 ml","stock_level":"20+"}' AS json_string UNION ALL 
  SELECT 2, '{"additional_info":"Face 100 ml","offer_info":"30%"}' AS json_string 
)
SELECT id, 
  ARRAY(
    SELECT AS STRUCT SPLIT(kv, ':')[OFFSET(0)] key, SPLIT(kv, ':')[SAFE_OFFSET(1)] value
    FROM UNNEST(parseJson(json_string)) kv
  ) params
FROM my_table

with result

Row id  params.key              params.value     
1   1   sku_types.id            5433306  
        sku_types.product_code  adfklj_ewkj  
        additional_info         Face 30 ml   
        stock_level             20+  
2   2   additional_info         Face 100 ml  
        offer_info              30%     

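As you can see, instead of parsing all possible attributes into separate columns (which is not possible here unless you know them in advance), the approach above flattens them into key:value pairs inside the params array.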
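
To experiment with the flattening logic outside BigQuery, the same idea can be run as plain JavaScript, for example in Node.js. This is only a minimal sketch, not the UDF itself: the name flattenJson is hypothetical, and it assumes the input is a JSON object whose leaf values are never null.

```javascript
// Hypothetical local helper (not part of BigQuery): flattens a JSON string
// into 'dotted.path:value' strings, in the spirit of the parseJson UDF.
// Assumption: leaf values are non-null; a null would throw on toString().
function flattenJson(jsonString) {
  var out = [];
  function walk(node, parent) {
    Object.keys(node).forEach(function (key) {
      var path = parent === '' ? key : parent + '.' + key;
      var value = node[key].toString();
      if (value !== '[object Object]') {
        out.push(path + ':' + value);   // leaf: emit one pair
      } else {
        walk(node[key], path);          // nested object: recurse
      }
    });
  }
  walk(JSON.parse(jsonString), '');
  return out;
}

var sample = '{"sku_types":{"id":"5433306","product_code":"adfklj_ewkj"},' +
             '"additional_info":"Face 30 ml","stock_level":"20+"}';
console.log(flattenJson(sample));
```

Each element of the printed array corresponds to one key/value row of params for id 1 in the result above.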

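Note: in the example above I use : to build the key:value pairs and then split them. If you expect values to contain this character, you can adjust the code to use something more unique instead of :, for example :::::::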
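
A different way to cope with colons inside values is to split each pair only at the first occurrence of the delimiter. The sketch below shows that idea in plain JavaScript; splitPair is a hypothetical helper name, and it still assumes that keys themselves never contain the delimiter.

```javascript
// Hypothetical helper: split 'key:value' at the FIRST delimiter only.
// Assumption: the key never contains the delimiter; the value may.
function splitPair(kv, delimiter) {
  var i = kv.indexOf(delimiter);
  if (i === -1) {
    // No delimiter found: treat the whole string as the key.
    return { key: kv, value: null };
  }
  return { key: kv.slice(0, i), value: kv.slice(i + delimiter.length) };
}

console.log(splitPair('opening_hours:09:00-17:30', ':'));
// key is 'opening_hours', value is '09:00-17:30'
```

The same function works unchanged with a longer delimiter, since it skips delimiter.length characters rather than a single one.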

  

Quick update to address a comment:
  ... the problem is that some JSON values are null, in which case it raises an error

#standardSQL
CREATE TEMPORARY FUNCTION parseJson(y STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  // Flatten the parsed JSON into 'dotted.path:value' strings;
  // null values are replaced with 'n/a' instead of raising an error.
  var z = [];
  function processKey(node, parent) {
    Object.keys(node).forEach(function(key) {
      // Build the full path without mutating `parent`.
      var path = parent === '' ? key : parent + '.' + key;
      var value;
      // Test for null explicitly so that 0, false and '' are kept as values.
      if (node[key] === null || node[key] === undefined) {
        value = 'n/a';
      } else {
        value = node[key].toString();
      }
      if (value !== '[object Object]') {
        z.push(path + ':' + value);
      } else {
        processKey(node[key], path);
      }
    });
  }
  processKey(JSON.parse(y), '');
  return z;
""";
WITH `my_table` AS (
  SELECT 1 id, '{"sku_types":{"id":"5433306","product_code":"adfklj_ewkj"},"additional_info":"Face 30 ml","stock_level":"20+"}' AS json_string UNION ALL 
  SELECT 2, '{"additional_info":"Face 100 ml","offer_info":"30%"}' AS json_string union all
  SELECT 3 as id , '{"offer_info":"30%", "price":null}' AS json_string  
)
SELECT id, 
  ARRAY(
    SELECT AS STRUCT SPLIT(kv, ':')[OFFSET(0)] key, SPLIT(kv, ':')[SAFE_OFFSET(1)] value
    FROM UNNEST(parseJson(json_string)) kv
  ) params
FROM my_table  

with result

Row id  params.key              params.value     
1   1   sku_types.id            5433306  
        sku_types.product_code  adfklj_ewkj  
        additional_info         Face 30 ml   
        stock_level             20+  
2   2   additional_info         Face 100 ml  
        offer_info              30%  
3   3   offer_info              30%  
        price                   n/a    

As you can see here, I replaced null values with 'n/a', but you can apply whatever logic you need
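
As one illustration of swapping in different logic, the null handling can be isolated in a small callback. This is a sketch in plain JavaScript with hypothetical names (flattenWithNulls, onNull); treating a return value of undefined as "drop this pair" is a convention assumed here, not something BigQuery defines.

```javascript
// Hypothetical variant of the flattening with pluggable null handling.
// onNull(path) returns the replacement text for a null value, or
// undefined to leave the pair out of the result (assumed convention).
function flattenWithNulls(jsonString, onNull) {
  var out = [];
  function walk(node, parent) {
    Object.keys(node).forEach(function (key) {
      var path = parent === '' ? key : parent + '.' + key;
      var raw = node[key];
      if (raw === null) {
        var replacement = onNull(path);
        if (replacement !== undefined) {
          out.push(path + ':' + replacement);
        }
        return;
      }
      var value = raw.toString();
      if (value !== '[object Object]') {
        out.push(path + ':' + value);
      } else {
        walk(raw, path);
      }
    });
  }
  walk(JSON.parse(jsonString), '');
  return out;
}

var row = '{"offer_info":"30%", "price":null}';
// Keep the key and mark the value as 'n/a', as in the update above.
console.log(flattenWithNulls(row, function () { return 'n/a'; }));
// Drop null-valued keys entirely.
console.log(flattenWithNulls(row, function () { return undefined; }));
```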