I have been struggling with this for a long time. I have the following table:
postcode | households | wight cable | wight cable fastest down | B4RN | B4RN fastest down |
---|---|---|---|---|---|
X24 888 | 34 | 1 | 108.2 | 0 | 0 |
BT36 7JU | 17 | 0 | 0 | 1 | 274.23 |
What I want to do is output JSON like the following (one row shown here):
{'postcode':"X24 888",
'households':34,
'providers':[{'name':"wight cable",
'fastest_down':108.2,
'present':1},
{'name':"B4RN",
'fastest_down':0,
'present':0}]
}
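There are actually about 50 of these columns. I can get the list with something like SHOW COLUMNS LIKE '%fastest down%' IN TABLE TABLE1,
but I am struggling both to iterate over the table to capture the data from those columns and to produce the specific nested JSON structure.
This is what I have so far. Perhaps only a few small edits are needed at this point to get what I need.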
create or replace function custom_object_assign(o1 VARIANT, o2 VARIANT)
returns VARIANT
language javascript
as 'return Object.assign(O1, O2);';
with t1 AS
(
    SELECT OBJECT_CONSTRUCT(
        '_id', h."ID",
        'providers', array_agg(object_construct(
            'wight cable', h."wight cable",
            'wight cable fastest down', h."wight cable fastest down")
        )) AS dc
    FROM TABLE1 h
    group by "ID"
),
t2 AS
(
    SELECT OBJECT_CONSTRUCT(
        '_id', h."ID",
        'rmpostcode', "rmpostcode"
    ) AS rs
    FROM TABLE1 h
)
SELECT custom_object_assign(dc, rs)
FROM t1
JOIN t2
ON rs:"_id" = dc:"_id"
LIMIT 10;
This returns something like
{ "_id": "786516", "providers": [ { "wight cable": 0 } ], "rmpostcode": "LL65 1SJ" }
which is not what I need yet!
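Answer 0 (score: 2)
Here is a sample that uses a stored procedure to read the source table and add rows to a target table. You can see the constants that define the source and target tables in the code.
Limitations: Right now the code expects the postcode, households, and the subsequent "property" + "property fastest down" columns to be in well-ordered ordinal positions. If that is not the case, there will need to be a loop that reads the column names to find where those columns sit in ordinal position.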
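The column-name loop mentioned in the limitation could look roughly like the sketch below. It is plain JavaScript, so it can be tried outside Snowflake; inside the procedure the names would come from getColumnName. The helper name findProviderColumns is hypothetical, and the sketch assumes every speed column is named "<provider> fastest down", as in the sample table.

```javascript
// Hypothetical helper (not part of the procedure): pair each provider column
// with its "<provider> fastest down" column by name instead of by position.
const SPEED_SUFFIX = " fastest down";

function findProviderColumns(columnNames) {
  // Map lower-cased column name -> 1-based ordinal position.
  const positions = new Map();
  columnNames.forEach((name, i) => positions.set(name.toLowerCase(), i + 1));

  const providers = [];
  for (const name of columnNames) {
    const lower = name.toLowerCase();
    // Speed columns are picked up through their provider column.
    if (lower.endsWith(SPEED_SUFFIX)) continue;
    const speedCol = positions.get(lower + SPEED_SUFFIX);
    // No matching speed column: not a provider (e.g. postcode, households).
    if (speedCol === undefined) continue;
    providers.push({ name: name, presentCol: positions.get(lower), speedCol: speedCol });
  }
  return providers;
}
```

The returned positions are 1-based so they can be passed straight to getColumnValue, whatever order the columns appear in.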
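Usage notes: The JavaScript builds the custom JSON. Since Snowflake does not like single-row inserts, it builds a JSON array and flattens the array to insert 1000 rows at a time. There is a constant that sets the row buffer. If the JSON exceeds 16MB the insert will fail, so the buffer may need to be adjusted down if that happens.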
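To illustrate the buffering idea, here is a small sketch that splits rows into batches by row count and also by an estimate of the serialized size, so a batch is flushed before it approaches the size limit. The names makeBatches, maxRows and maxChars are hypothetical, and the size check counts characters of JSON text, which only approximates bytes.

```javascript
// Hypothetical sketch: group row objects into batches limited by
// row count and by (approximate) serialized JSON size.
function makeBatches(rows, maxRows, maxChars) {
  const batches = [];
  let current = [];
  let currentChars = 2; // the "[" and "]" of the JSON array

  for (const row of rows) {
    const rowChars = JSON.stringify(row).length + 1; // +1 for the separating comma
    const full = current.length >= maxRows || currentChars + rowChars > maxChars;
    if (full && current.length > 0) {
      batches.push(current);
      current = [];
      currentChars = 2;
    }
    // A single row larger than maxChars still ends up alone in its own batch.
    current.push(row);
    currentChars += rowChars;
  }
  if (current.length > 0) batches.push(current); // flush the remainder
  return batches;
}
```

Each batch could then be serialized and inserted in one statement, in the same way the procedure inserts its row buffer.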
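Recommendations: If you want to productionize this, you may want to create a stream on the source table. Using this SP to process only the new rows then becomes much easier. https://snowflake.pavlik.us/index.php/2020/01/12/snowflake-streams-made-simple. The code would need to be modified to ignore the last three metadata columns and then run a DML to nowhere (insert into target_table from stream_table where 1 = 0) to advance the stream.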
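A stream exposes the columns METADATA$ACTION, METADATA$ISUPDATE and METADATA$ROW_ID alongside the table's own columns. Rather than relying on them being the last three, one option is to skip them by name. This is only a sketch; the helper name dataColumnPositions is hypothetical.

```javascript
// Hypothetical helper: 1-based ordinal positions of the columns that are
// not stream metadata (their names start with "METADATA$").
function dataColumnPositions(columnNames) {
  const positions = [];
  columnNames.forEach((name, i) => {
    if (!name.toUpperCase().startsWith("METADATA$")) {
      positions.push(i + 1);
    }
  });
  return positions;
}
```

The provider loop would then run over these positions only, instead of up to getColumnCount().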
Here is the SP code that populates the target table, to get you started.
create or replace table FLAT_VALUES
(
"postcode" string
,"households" int
,"wight cable" int
,"wight cable fastest down" float
,"B4RN" int
,"B4RN fastest down" float
)
;
insert into FLAT_VALUES select 'X24 888', 34, 1, 108.2, 0, 0;
insert into FLAT_VALUES select 'BT36 7JU', 17, 0, 0, 1, 274.23;
create or replace table NESTED_JSON(v variant);
create or replace procedure GET_NESTED_JSON()
returns variant
language javascript
as
$$
    var out = {};

    // Source and target tables, and the number of rows to buffer per insert.
    const INPUT_TABLE = "FLAT_VALUES";
    const OUTPUT_TABLE = "NESTED_JSON";
    const INSERT_BUFFER_ROWS = 1000;

    // Holds a statement together with its result set (or its error message).
    class Query{
        constructor(statement){
            this.statement = statement;
        }
    }

    var selectSQL = `select * from ${INPUT_TABLE}`;
    var json = [];
    var row = {};
    var colArr = [];
    var subRow = {};
    var rowBuffer = 0;
    var rowsInserted = 0;

    var selectQuery = getQuery(selectSQL);
    if (selectQuery.error) {
        throw new Error(selectQuery.error);
    }

    while (selectQuery.resultSet.next()) {
        row = {};
        rowBuffer++;
        // Columns 1 and 2 are postcode and households.
        row[selectQuery.statement.getColumnName(1)] = selectQuery.resultSet.getColumnValue(1);
        row[selectQuery.statement.getColumnName(2)] = selectQuery.resultSet.getColumnValue(2);
        colArr = [];
        // The remaining columns come in pairs: provider, then "<provider> fastest down".
        for (var col = 3; col <= selectQuery.statement.getColumnCount(); col = col + 2) {
            subRow = {};
            subRow["name"] = selectQuery.statement.getColumnName(col);
            subRow["fastest_down"] = selectQuery.resultSet.getColumnValue(selectQuery.statement.getColumnName(col) + " fastest down");
            subRow["present"] = selectQuery.resultSet.getColumnValue(selectQuery.statement.getColumnName(col));
            colArr.push(subRow);
        }
        row["providers"] = colArr;
        json.push(row);
        if (rowBuffer == INSERT_BUFFER_ROWS) {
            insertRows(OUTPUT_TABLE, json);
            // Count the rows before resetting, and empty the buffer so rows are not inserted twice.
            rowsInserted += rowBuffer;
            json = [];
            rowBuffer = 0;
        }
    }
    // Insert whatever is left in the buffer.
    if (rowBuffer > 0) {
        insertRows(OUTPUT_TABLE, json);
        rowsInserted += rowBuffer;
    }
    out["ROWS_INSERTED"] = rowsInserted;
    return out;

    function insertRows(targetTable, json) {
        // Escape backslashes and single quotes so the JSON survives inside a SQL string literal.
        var jsonString = JSON.stringify(json).replace(/\\/g, "\\\\").replace(/'/g, "''");
        var insertSQL = `insert into ${targetTable} select VALUE from table(flatten(parse_json('${jsonString}')))`;
        var insertQuery = getQuery(insertSQL);
        if (insertQuery.error) {
            throw new Error(insertQuery.error);
        }
    }

    // Runs a statement; on failure the message is stored in query.error instead of being thrown.
    function getQuery(sql){
        var cmd = {sqlText: sql};
        var query = new Query(snowflake.createStatement(cmd));
        try {
            query.resultSet = query.statement.execute();
        } catch (e) {
            query.error = e.message;
        }
        return query;
    }
$$;
call get_nested_json();
select * from NESTED_JSON;
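To check the row-shaping logic without a Snowflake session, the loop in the procedure can be mirrored in plain JavaScript. This is only a sketch: buildNestedRow is a hypothetical name, and it assumes the same column order as FLAT_VALUES (postcode, households, then provider pairs).

```javascript
// Hypothetical stand-alone version of the row-shaping loop, for testing outside Snowflake.
// columnNames and values are parallel arrays in table order:
// postcode, households, then (provider, "<provider> fastest down") pairs.
function buildNestedRow(columnNames, values) {
  const row = {};
  row[columnNames[0]] = values[0];
  row[columnNames[1]] = values[1];

  const providers = [];
  for (let col = 2; col + 1 < columnNames.length; col += 2) {
    providers.push({
      name: columnNames[col],
      fastest_down: values[col + 1],
      present: values[col]
    });
  }
  row.providers = providers;
  return row;
}

const columns = ["postcode", "households", "wight cable", "wight cable fastest down", "B4RN", "B4RN fastest down"];
const sample = buildNestedRow(columns, ["X24 888", 34, 1, 108.2, 0, 0]);
console.log(JSON.stringify(sample));
```

For the first sample row this gives the postcode, the households count, and a providers array with one entry per provider, which is the shape asked for in the question.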