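Snowflake: retrieving a value from semi-structured data

Time: 2021-07-02 02:20:02

Tags: arrays json snowflake-cloud-data-platform

I am trying to retrieve the health value from Snowflake semi-structured data stored in a variant column named extra in table X.

The data looks like this: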

[
  {
    "party":
 "[{\"class\":\"Farmer\",\"gender\":\"Female\",\"ethnicity\":\"NativeAmerican\",\"health\":2},
{\"class\":\"Adventurer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":3},
{\"class\":\"Farmer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":0},
{\"class\":\"Banker\",\"gender\":\"Female\",\"ethnicity\":\"White\",\"health\":0}
  }
] 

I tried reading the Snowflake documentation at https://community.snowflake.com/s/article/querying-semi-structured-data

I also tried the following queries to flatten the data:

SELECT result.value:health AS PartyHealth 
FROM X 
WHERE value = 'Trail'
AND name = 'Completed' 
AND PartyHealth > 0, 
TABLE(FLATTEN(X, 'party')) result

SELECT [0]['party'][0]['health'] AS Health
FROM X 
WHERE value = 'Trail'
AND name = 'Completed' 
AND PH > 0;

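I am trying to retrieve the health values from the extra column of table X, which contains the variant party with 4 repeating values [0-3]. The documentation does not make much sense to me, so I am not sure how to do this. Can anyone tell me how to query semi-structured data in Snowflake?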

2 answers:

Answer 0: (score: 2)

First, the JSON value you posted appears to be malformed (possibly a copy-paste issue).
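
The escaped quotes in the posted sample suggest that party may actually be stored as a JSON-encoded string rather than as a nested array. If that is the case, path notation cannot step into it directly and the string would need to be parsed first. The following is only a sketch under that assumption: the CTE name src, the alias f, the output column party_health, and the trimmed two-member sample are all made up for illustration, and a dollar-quoted literal is used so the backslashes reach PARSE_JSON unchanged.

```sql
-- Assumption: "party" holds a JSON string, not a nested array.
WITH src AS (
    SELECT PARSE_JSON($$[{"party": "[{\"class\":\"Farmer\",\"health\":2},{\"class\":\"Banker\",\"health\":0}]"}]$$) AS v
)
SELECT f.value:health::NUMBER AS party_health
FROM src,
     -- Cast the inner value to a string, parse it again, then expand the array.
     LATERAL FLATTEN(input => PARSE_JSON(src.v[0]:party::STRING)) f;
```

If party is already a real array in your table, the second PARSE_JSON is unnecessary and the steps below apply as written.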

Here is a working example:

  • First, format your JSON properly:

    [{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]

  • Create a table for testing:

    CREATE OR REPLACE TABLE myvariant (v variant);

  • Insert the JSON value into this table:

    INSERT INTO myvariant SELECT PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]');

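  • Now, to select a value, start from the column name, which in my case is v. Because your JSON is an array, I specify the first element [0] and then drill down from there, like this: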

    SELECT v[0]:party[0].health FROM myvariant;

The above returns the health value of the first party member, which is 2 for this sample data.

For the other array elements, you can simply do:

SELECT v[0]:party[1].health FROM myvariant;
SELECT v[0]:party[2].health FROM myvariant;
SELECT v[0]:party[3].health FROM myvariant;
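
To get every party member's health in one query rather than indexing each element separately, the array can be expanded with FLATTEN. This is a minimal sketch that assumes the myvariant table created above; the alias f and the output column names are illustrative, and the health > 0 filter is borrowed from the attempts in the question.

```sql
-- Expand v[0]:party into one row per party member.
SELECT
    f.index                   AS member_index,   -- position within the party array
    f.value:class::STRING     AS member_class,
    f.value:health::NUMBER    AS party_health
FROM myvariant,
     LATERAL FLATTEN(input => v[0]:party) f
WHERE f.value:health::NUMBER > 0;                -- keep only members with health above 0
```

For the sample row this should return the Farmer with health 2 and the Adventurer with health 3.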

Answer 1: (score: 0)

Another option might be to make the data look more like a table... I find that easier to work with than JSON :-)

The code is at the bottom; just copy and paste it to run in Snowflake.

The key documentation to read is LATERAL FLATTEN.

 SELECT  d4.path, d4.value 
 from  
 lateral flatten(input=>PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]') ) as d  ,  
 lateral flatten(input=> d.value) as d2 ,  
 lateral flatten(input=> d2.value) as d3 ,  
 lateral flatten(input=> d3.value) as d4
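
The query above flattens every level and returns generic path/value pairs. If only the health values are needed, two levels of flattening may be enough. The following sketch assumes the same JSON shape, trimmed to two party members for brevity; the CTE name src, the aliases o and p, and the output column names are illustrative.

```sql
WITH src AS (
    SELECT PARSE_JSON('[{ "party": [ {"class":"Farmer","health":2}, {"class":"Adventurer","health":3} ] }]') AS v
)
SELECT
    o.index                   AS outer_index,    -- position in the top-level array
    p.index                   AS member_index,   -- position in the party array
    p.value:health::NUMBER    AS party_health
FROM src,
     LATERAL FLATTEN(input => src.v)          o,  -- one row per top-level element
     LATERAL FLATTEN(input => o.value:party)  p;  -- one row per party member
```

Unlike indexing with v[0], this form also covers the case where the top-level array holds more than one element.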