BigQuery cannot load data from Google Cloud Storage

Time: 2017-07-28 21:35:14

Tags: google-bigquery

I have a JSON record that looks like this:

{"customer_id":"2349uslvn2q3","order_id":"9sufd23rdl40",
 "line_item": [{"line":"1","sku":"10","amount":10},
               {"line":"2","sku":"20","amount":20}]}

I am trying to load the record above into a table whose schema is defined as:

"fields": [
  {
    "mode": "NULLABLE",
    "name": "customer_id",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "order_id",
    "type": "STRING"
  },
  {
    "mode": "REPEATED",
    "name": "line_item",
    "type": "STRING"
  }
]

I get the following error message:

JSON parsing error in row starting at position 0 at file: gs://gcs_bucket/file0. JSON object specified for non-record field: line_item

I would like the line_item column to hold an array of JSON strings, so that a single table row can carry more than one line item.

Any suggestions?

1 answer:

Answer 0: (score: 1)

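First, your input JSON should not contain "\n" characters inside a record: each record has to sit on a single line. So you should save it as: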

{"customer_id":"2349uslvn2q3","order_id":"9sufd23rdl40", "line_item": [{"line":"1","sku":"10","amount":10}, {"line":"2","sku":"20","amount":20}]}

An example of how the JSON file should look, with one record per line:

{"customer_id":"2349uslvn2q3","order_id":"9sufd23rdl40", "line_item": [{"line":"1","sku":"10","amount":10}, {"line":"2","sku":"20","amount":20}]}
{"customer_id":"2","order_id":"2", "line_item": [{"line":"2","sku":"20","amount":20}, {"line":"2","sku":"20","amount":20}]}
{"customer_id":"3","order_id":"3", "line_item": [{"line":"3","sku":"30","amount":30}, {"line":"3","sku":"30","amount":30}]}
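
If the source file was pretty-printed or has records that span several lines, a short script can rewrite it into the one-record-per-line layout shown above. The following is a minimal sketch, assuming the input is a sequence of concatenated JSON objects; the function name to_ndjson is hypothetical and not part of any BigQuery tooling.

```python
import json


def to_ndjson(text):
    """Re-serialize concatenated JSON objects as one compact object per line."""
    decoder = json.JSONDecoder()
    lines = []
    pos = 0
    while pos < len(text):
        # Skip whitespace (including newlines) between objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode parses one JSON value and returns where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        lines.append(json.dumps(obj, separators=(",", ":")))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    multi_line = """{"customer_id":"2349uslvn2q3","order_id":"9sufd23rdl40",
 "line_item": [{"line":"1","sku":"10","amount":10},
               {"line":"2","sku":"20","amount":20}]}"""
    print(to_ndjson(multi_line), end="")
```

The output can then be written to a file and uploaded to the bucket in place of the multi-line original.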

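Your schema is also incorrect. Because each element of line_item is a JSON object, the field must be of type RECORD with its own nested fields. It should be: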

[
  {
    "mode": "NULLABLE",
    "name": "customer_id",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "order_id",
    "type": "STRING"
  },
  {
    "mode": "REPEATED",
    "name": "line_item",
    "type": "RECORD",
    "fields": [{"name": "line", "type": "STRING"}, {"name": "sku", "type": "STRING"}, {"name": "amount", "type": "INTEGER"}]
  }
]
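
To see why the original schema is rejected, it can help to compare the shape of a record with the schema before loading. The sketch below is only an illustration and not how BigQuery itself validates data: the function find_type_mismatches is hypothetical, it checks only whether JSON objects line up with RECORD fields (not STRING versus INTEGER), and apart from the "non-record field" wording quoted from the error in the question, the messages are made up for this example.

```python
def find_type_mismatches(record, schema, prefix=""):
    """Return messages for values whose JSON shape disagrees with the schema."""
    problems = []
    for field in schema:
        path = prefix + field["name"]
        value = record.get(field["name"])
        if value is None:
            continue
        if field.get("mode") == "REPEATED":
            # A REPEATED field is expected to hold a JSON array.
            if not isinstance(value, list):
                problems.append("Non-array value for repeated field: " + path)
                continue
            items = value
        else:
            items = [value]
        is_record = field["type"] == "RECORD"
        for item in items:
            if isinstance(item, dict) and not is_record:
                problems.append(
                    "JSON object specified for non-record field: " + path)
                break
            if is_record and not isinstance(item, dict):
                problems.append("Non-object value for record field: " + path)
                break
            if is_record:
                # Descend into the nested fields of the RECORD.
                problems.extend(find_type_mismatches(
                    item, field.get("fields", []), path + "."))
    return problems


if __name__ == "__main__":
    record = {"customer_id": "2349uslvn2q3", "order_id": "9sufd23rdl40",
              "line_item": [{"line": "1", "sku": "10", "amount": 10}]}
    original = [{"mode": "REPEATED", "name": "line_item", "type": "STRING"}]
    print(find_type_mismatches(record, original))
```

Run against the schema from the question, the check reports line_item; run against the corrected schema with the RECORD type, it reports nothing.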

To give a better idea of how schemas work, I tried to write up some guidance in this answer. Hopefully it is of some value.

If your data is saved in a file named, for example, gs://gcs_bucket/file0, and your schema is saved in schema.json, then this command should work for you:

bq load --source_format=NEWLINE_DELIMITED_JSON dataset.table gs://gcs_bucket/file0 schema.json

(This assumes you are using the CLI tool, which seems to be the case from your question.)
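
The question asked for line_item to be stored as an array of JSON strings rather than as nested records. The answer above does not cover that route. One possible approach, sketched here as an assumption rather than a tested BigQuery recipe, is to serialize each element to a string before loading, so that the data matches the original REPEATED STRING schema. The function name stringify_line_items is hypothetical.

```python
import json


def stringify_line_items(record, field="line_item"):
    """Return a copy of record with each element of `field` serialized to a JSON string."""
    out = dict(record)
    out[field] = [json.dumps(item, separators=(",", ":"))
                  for item in record.get(field, [])]
    return out


if __name__ == "__main__":
    record = {"customer_id": "2349uslvn2q3", "order_id": "9sufd23rdl40",
              "line_item": [{"line": "1", "sku": "10", "amount": 10},
                            {"line": "2", "sku": "20", "amount": 20}]}
    # Emit the converted record as a single newline-delimited JSON line.
    print(json.dumps(stringify_line_items(record)))
```

With this layout each line item stays an opaque string in the table, so its inner fields would have to be parsed at query time instead of being addressed as columns.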