I have a JSON record that looks like this:
{"customer_id":"2349uslvn2q3","order_id":"9sufd23rdl40",
"line_item": [{"line":"1","sku":"10","amount":10},
{"line":"2","sku":"20","amount":20}]}
I am trying to load this record into a table with the following schema definition:
"fields": [
{
"mode": "NULLABLE",
"name": "customer_id",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "order_id",
"type": "STRING"
},
{
"mode": "REPEATED",
"name": "line_item",
"type": "STRING"
}
]
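I get the following error message:
JSON parsing error in row starting at position 0 at file: gs://gcs_bucket/file0. JSON object specified for non-record field: line_item
I would like the line_item column to hold more than one entry per row, as an array of JSON strings.
Any suggestions?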
Answer 0 (score: 1)
First, a record in your input JSON should not contain "\n" characters, so you should save it on a single line:
{"customer_id":"2349uslvn2q3","order_id":"9sufd23rdl40", "line_item": [{"line":"1","sku":"10","amount":10}, {"line":"2","sku":"20","amount":20}]}
An example of how the JSON file should look, with one record per line:
{"customer_id":"2349uslvn2q3","order_id":"9sufd23rdl40", "line_item": [{"line":"1","sku":"10","amount":10}, {"line":"2","sku":"20","amount":20}]}
{"customer_id":"2","order_id":"2", "line_item": [{"line":"2","sku":"20","amount":20}, {"line":"2","sku":"20","amount":20}]}
{"customer_id":"3","order_id":"3", "line_item": [{"line":"3","sku":"30","amount":30}, {"line":"3","sku":"30","amount":30}]}
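If the source data is pretty-printed, one way to get it into the one-record-per-line shape shown above is to parse it and serialize it again. The following is only a minimal sketch using Python's standard json module; the helper name to_ndjson is made up for illustration and is not part of any BigQuery tooling.

```python
import json


def to_ndjson(records):
    """Serialize each record onto exactly one line (newline-delimited JSON)."""
    # json.dumps without `indent` emits no raw newlines; a "\n" inside a
    # string value is written as an escape sequence, not a line break.
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"


# The record from the question, spread over several lines.
pretty = """{"customer_id":"2349uslvn2q3","order_id":"9sufd23rdl40",
"line_item": [{"line":"1","sku":"10","amount":10},
{"line":"2","sku":"20","amount":20}]}"""

record = json.loads(pretty)
ndjson = to_ndjson([record])
print(ndjson, end="")
```

The output can then be written to a file and uploaded to the bucket in whatever way you normally use.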
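Your schema is also incorrect: line_item holds JSON objects, so it has to be a RECORD rather than a STRING. It should be: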
[
{
"mode": "NULLABLE",
"name": "customer_id",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "order_id",
"type": "STRING"
},
{
"mode": "REPEATED",
"name": "line_item",
"type": "RECORD",
"fields": [{"name": "line", "type": "STRING"}, {"name": "sku", "type": "STRING"}, {"name": "amount", "type": "INTEGER"}]
}
]
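To see why the original schema is rejected while this one is accepted, here is a rough sketch of the kind of check involved. It is a simplified stand-in written for illustration, not BigQuery's actual validator; the function name check and the message wording are assumptions, and scalar types such as STRING versus INTEGER are deliberately not verified.

```python
def check(record, fields, path=""):
    """Return a list of problems found when matching a record against schema fields."""
    problems = []
    for field in fields:
        name = field["name"]
        value = record.get(name)
        if value is None:
            continue  # missing or null values are not inspected in this sketch
        repeated = field.get("mode") == "REPEATED"
        if repeated and not isinstance(value, list):
            problems.append(f"{path}{name}: expected an array for REPEATED field")
            continue
        for v in (value if repeated else [value]):
            if field["type"] == "RECORD":
                if isinstance(v, dict):
                    # Recurse into the nested fields of the RECORD.
                    problems += check(v, field["fields"], f"{path}{name}.")
                else:
                    problems.append(f"{path}{name}: expected a JSON object for RECORD field")
            elif isinstance(v, (dict, list)):
                problems.append(f"{path}{name}: JSON object specified for non-record field")
    return problems


# line_item as declared in the question (REPEATED STRING).
BAD_SCHEMA = [
    {"mode": "NULLABLE", "name": "customer_id", "type": "STRING"},
    {"mode": "NULLABLE", "name": "order_id", "type": "STRING"},
    {"mode": "REPEATED", "name": "line_item", "type": "STRING"},
]

# line_item as declared in the corrected schema (REPEATED RECORD).
GOOD_SCHEMA = BAD_SCHEMA[:2] + [
    {"mode": "REPEATED", "name": "line_item", "type": "RECORD",
     "fields": [{"name": "line", "type": "STRING"},
                {"name": "sku", "type": "STRING"},
                {"name": "amount", "type": "INTEGER"}]},
]

record = {"customer_id": "2349uslvn2q3", "order_id": "9sufd23rdl40",
          "line_item": [{"line": "1", "sku": "10", "amount": 10},
                        {"line": "2", "sku": "20", "amount": 20}]}

print(check(record, BAD_SCHEMA))
print(check(record, GOOD_SCHEMA))
```

With the STRING schema each element of line_item is flagged, because it is an object where a scalar is expected; with the RECORD schema nothing is flagged.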
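To better understand how schemas work, I tried to write some guidance in this answer. I hope it is of some value.
If your data is saved in a file named, for example, gs://gcs_bucket/file0 and your schema is saved in schema.json, then this command should work for you: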
bq load --source_format=NEWLINE_DELIMITED_JSON dataset.table gs://gcs_bucket/file0 schema.json
(This assumes you are using the CLI tool, which seems to be the case from your question.)