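Our data contains JSON fields that include repeated sections and potentially unlimited nesting (the samples I have so far are quite simple). After reading about BQ repeated fields and records, I decided to try restructuring the data into repeated record fields, since our use case is analytics. I then want to test different use cases against the data to see which approach is more efficient in terms of time, cost, and difficulty for the analysis we intend to run. I created a sample JSON record that I want to upload to BQ; it uses all the features I think we will need (I validated it with http://jsonlint.com/):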
{
"aid": "6dQcrgMVS0",
"hour": "2016042723",
"unixTimestamp": "1461814784",
"browserId": "BdHOHp2aL9REz9dXVeKDaxdvefE3Bgn6NHZcDQKeuC67vuQ7PBIXXJda3SOu",
"experienceId": "EXJYULQOXQ05",
"experienceVersion": "1.0",
"pageRule": "V1XJW61TPI99UWR",
"userSegmentRule": "67S3YVMB7EMQ6LP",
"branch": [{
"branchId": "1",
"branchType": "userSegments",
"itemId": "userSegment67S3YVMB7EMQ6LP",
"headerId": "null",
"itemMethod": "null"
}, {
"branchId": "1",
"branchType": "userSegments",
"itemId": "userSegment67S3YVMB7EMQ6LP",
"headerId": "null",
"itemMethod": "null"
}],
"event": [{
"eventId": "546",
"eventName": "testEvent",
"eventDetails": [{
"key": "a",
"value": "1"
}, {
"key": "b",
"value": "2"
}, {
"key": "c",
"value": "3"
}]
}, {
"eventId": "547",
"eventName": "testEvent2",
"eventDetails": [{
"key": "d",
"value": "4"
}, {
"key": "e",
"value": "5"
}, {
"key": "f",
"value": "6"
}]
}]
}
I am using the BQ interface to upload this JSON into a table with the following schema:
[
{
"name": "aid",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "hour",
"type": "INTEGER",
"mode": "NULLABLE"
},
{
"name": "unixTimestamp",
"type": "INTEGER",
"mode": "NULLABLE"
},
{
"name": "browserId",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "experienceId",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "experienceVersion",
"type": "FLOAT",
"mode": "NULLABLE"
},
{
"name": "pageRule",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "userSegmentRule",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "branch",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "branchId",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "branchType",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "itemId",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "headerId",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "itemMethod",
"type": "STRING",
"mode": "NULLABLE"
}
]
},
{
"name": "event",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "evenId",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "eventName",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "eventDetails",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "key",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "value",
"type": "STRING",
"mode": "NULLABLE"
}
]
}
]
}
]
My job fails with:
JSON parsing error in row starting at position 0 in <file_id>. Expected key (error code: invalid)
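It may be that I cannot have multiple levels of nesting in a table, but the error looks more like a problem with parsing the JSON itself. I was able to generate and successfully import JSON with a simple repeated record (see the example below):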
{
"eventId": "546",
"eventName": "testEvent",
"eventDetails": [{
"key": "a",
"value": "1"
}, {
"key": "b",
"value": "2"
}, {
"key": "c",
"value": "3"
}]
}
Any suggestions are appreciated.
Answer 0 (score: 2)
Nothing appears to be wrong with the overall structure of your schema, so BigQuery should be able to load your data with it.
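First, make sure you are uploading newline-delimited JSON to BigQuery. Your sample row has many newline characters in the middle of the JSON record, and the parser is trying to interpret each line as a separate JSON row.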
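One way to flatten a pretty-printed record into the one-object-per-line form is sketched below in Python. This is only an illustration: the helper name `to_ndjson` is hypothetical, and the shortened record stands in for the full sample from the question.

```python
import json

def to_ndjson(records):
    """Serialize each record onto exactly one line, one record per line."""
    # json.dumps without `indent` never emits raw newlines, so every
    # record ends up on a single physical line.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)

# A pretty-printed record spanning several lines (shortened sample).
pretty = """{
  "eventId": "546",
  "eventName": "testEvent",
  "eventDetails": [
    {"key": "a", "value": "1"},
    {"key": "b", "value": "2"}
  ]
}"""

record = json.loads(pretty)    # parse the multi-line JSON text
ndjson = to_ndjson([record])   # re-serialize as newline-delimited JSON
print(ndjson, end="")
```

The resulting text can be written to a file and uploaded in place of the pretty-printed version.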
Second, it looks like your schema has the key "evenId" in the "event" record, but your sample row has the key "eventId".
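A key mismatch like this can be caught before loading by comparing each record's keys with the schema's field names. The following is a rough sketch, assuming the schema is a JSON field list shaped like the one in the question; `unknown_keys` is a made-up helper, not part of any BigQuery client library, and the schema and record are trimmed to the relevant fields.

```python
def unknown_keys(record, schema, path=""):
    """Return dotted paths of record keys that the schema does not declare."""
    fields = {f["name"]: f for f in schema}
    found = set()
    for key, value in record.items():
        full = path + key
        field = fields.get(key)
        if field is None:
            found.add(full)  # key is present in the data but not in the schema
        elif field["type"] == "RECORD":
            # REPEATED records arrive as lists, single records as dicts.
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, dict):
                    found |= unknown_keys(child, field.get("fields", []), full + ".")
    return found

# Trimmed schema, keeping the "evenId" spelling from the question.
schema = [
    {"name": "aid", "type": "STRING", "mode": "NULLABLE"},
    {"name": "event", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "evenId", "type": "STRING", "mode": "NULLABLE"},
        {"name": "eventName", "type": "STRING", "mode": "NULLABLE"},
    ]},
]

# Trimmed record using the "eventId" spelling from the sample row.
record = {
    "aid": "6dQcrgMVS0",
    "event": [
        {"eventId": "546", "eventName": "testEvent"},
        {"eventId": "547", "eventName": "testEvent2"},
    ],
}

print(sorted(unknown_keys(record, schema)))
```

Any path reported here points to a key that should either be renamed in the data or corrected in the schema before the load job is run.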