Why does Snowflake change the order of JSON values when converting to a flattened list?

Time: 2019-04-26 19:53:42

Tags: sql snowflake-datawarehouse snowflake

I have JSON objects stored in a table, and I am trying to write a query that gets the first element from that JSON.

Repro script

create table staging.par.test_json (id int, val varchar(2000)); 

insert into staging.par.test_json values (1, '{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}');
insert into staging.par.test_json values (2,'
  {
    "list": [
      {
        "element": "Wholesale jeweler"
      },
      {
        "element": "Fashion"
      },
      {
        "element": "Industry"
      },
      {
        "element": "Jewelry store"
      },
      {
        "element": "Business service"
      },
      {
        "element": "Corporate office"
      }
    ]
  }');



with cte_get_cats AS
(
select id, 
       val as category_list 
       from staging.par.test_json
),
cats_parse AS
(
  select id,
         parse_json(category_list) as c
  from cte_get_cats
),
distinct_cats as
(
  select id,
         INDEX,
         UPPER(cast(value:element AS varchar)) As c
  from 
      cats_parse,
      LATERAL flatten(INPUT => c:"list")
  order by 1,2 
) ,
cat_array AS
    (
        SELECT  
            id,
            array_agg(DISTINCT c) AS sds_categories
        FROM
            distinct_cats
        GROUP BY 1
    ),
sds_cats AS
( 
         select id,
         cast(sds_categories[0] AS varchar) as sds_primary_category
         from cat_array
)
select * from sds_cats;

Value: category

{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}

Adding it to a list gives me

["Plumber","Craft","Plumbing","Electrics","Electrical","Tradesperson","Home services","Housekeepings","Electrical Goods"]

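Problem: the order is not always the same. Snowflake seems to change the order, and sometimes it reorders the values alphabetically. How can I keep the order stable? I don't want the order to change.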

1 Answer:

Answer 0: (score: 0)

The problem is the way you are using ARRAY_AGG:

        array_agg(DISTINCT c) AS sds_categories

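Specifying it like this gives Snowflake no guidance on how to order the contents of the array. You should not assume that the array is built in the same order as its input records: it may be, but that is not guaranteed. So you might want to do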

        array_agg(DISTINCT c) within group (order by index) AS sds_categories

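But that will not work: because you use DISTINCT c, the value of index for each c is unknown. Perhaps you don't need DISTINCT, in which case this will work:

        array_agg(c) within group (order by index) AS sds_categories

If you do need DISTINCT, you need to somehow correlate each distinct c value with an index. One way is to use the MIN function on index in the input, so that each distinct c keeps the position of its first occurrence.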
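The MIN-on-index approach can be sketched as follows. This is a reconstruction under stated assumptions, not the answerer's original query: it reuses the table and sample data from the question, while the names `parsed`, `first_index`, and `category` are illustrative and do not appear in the original. Deduplication is done with GROUP BY in the flattening step, so ARRAY_AGG no longer needs DISTINCT and can be ordered by the first position of each value.

```sql
-- Sketch (assumed names: parsed, first_index, category).
with cats_parse as
(
  -- Parse the stored JSON text into a VARIANT.
  select id,
         parse_json(val) as parsed
  from staging.par.test_json
),
distinct_cats as
(
  -- One row per (id, category); MIN(index) is the first position
  -- at which that category appears in the JSON list.
  select id,
         min(index) as first_index,
         upper(cast(value:element as varchar)) as category
  from cats_parse,
       lateral flatten(input => parsed:"list")
  group by id, upper(cast(value:element as varchar))
),
cat_array as
(
  -- Rows are already distinct, so order the array explicitly
  -- by first position instead of relying on input order.
  select id,
         array_agg(category) within group (order by first_index) as sds_categories
  from distinct_cats
  group by id
)
select id,
       cast(sds_categories[0] as varchar) as sds_primary_category
from cat_array;
```

With the two sample rows from the question, this should return PLUMBER for id 1 and WHOLESALE JEWELER for id 2, since those elements sit at index 0 of their lists.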