Question

I'm trying to load records from a JSON file using the BigQuery Python API. However, it fails when the file contains more than one record.

Here is what my JSON data file looks like:

[{"queryID": "newId", "newCol": "newCol"},
 {"queryID": "newId", "newCol": "newCol"}]

Here is the relevant code:

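import simplejson
from googleapiclient.http import MediaFileUpload

# Assumes `bigquery` is a BigQuery v2 service object, e.g. built with
# googleapiclient.discovery.build('bigquery', 'v2', credentials=...),
# and that project_id, dataset_id, table_id and schema_path are defined.
insert_request = bigquery.jobs().insert(
    projectId=project_id,
    body={
        'configuration': {
            'load': {
                # Table schema: a JSON file containing the list of fields
                'schema': {
                    'fields': simplejson.load(open(schema_path, 'r'))
                },
                'destinationTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': table_id
                },
                # Newline-delimited JSON: one JSON object per line
                'sourceFormat': 'NEWLINE_DELIMITED_JSON',
            }
        }
    },
    media_body=MediaFileUpload(
        './test_data.json',
        mimetype='application/octet-stream'))

job = insert_request.execute()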

This fails with the error "JSON parsing error in row starting at position 0 at file: file-00000000. Start of array encountered without start of object." I think this is because it doesn't recognize the data as two rows.

However, if I put only a single record in test_data.json, it loads successfully:

{"queryID": "newId", "newCol": "newCol"}
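Both the error and the single-record success are consistent with the loader parsing each line as a separate JSON object. The sketch below roughly emulates that line-by-line check with the standard json module; it is an illustration only, not BigQuery's actual parser, and the function name first_bad_line is hypothetical. The first line of the array file, [{...},, is not a complete JSON object by itself, while every line of the single-record file is.

```python
import json

def first_bad_line(text):
    """Return the 1-based number of the first non-blank line that is not a
    complete JSON object on its own, or None if every line is one."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            return lineno  # the line is not valid JSON by itself
        if not isinstance(value, dict):
            return lineno  # valid JSON, but not an object (e.g. an array)
    return None

array_file = ('[{"queryID": "newId", "newCol": "newCol"},\n'
              ' {"queryID": "newId", "newCol": "newCol"}]')
single_file = '{"queryID": "newId", "newCol": "newCol"}'
ndjson_file = ('{"queryID": "newId", "newCol": "newCol"}\n'
               '{"queryID": "newId", "newCol": "newCol"}')

print(first_bad_line(array_file))   # → 1
print(first_bad_line(single_file))  # → None
print(first_bad_line(ndjson_file))  # → None
```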

I've been going through the insert docs but can't find an option that lets me insert multiple rows.

Does anybody know how to load multiple records? Any pointers are appreciated. I feel like I'm missing something really stupid. Thanks.

Answer 1

Your JSON data file should look like this:

{"queryID": "newId", "newCol": "newCol"}  
{"queryID": "newId", "newCol": "newCol"}

In other words, it should be newline-delimited JSON, not just JSON.
See the documentation on the supported JSON format for details.
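If the records are produced in Python, one way to write a file in this format is to serialize each record with json.dumps and put one record per line. A minimal sketch using only the standard library; the function name write_ndjson is hypothetical, and the file name test_data.json is taken from the question.

```python
import json

def write_ndjson(records, path):
    """Write an iterable of dicts as newline-delimited JSON (one object per line)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # Without indent=, json.dumps emits no raw newlines (newlines inside
            # strings are escaped), so each record stays on a single line.
            f.write(json.dumps(record) + "\n")

records = [
    {"queryID": "newId", "newCol": "newCol"},
    {"queryID": "newId", "newCol": "newCol"},
]
write_ndjson(records, "test_data.json")
```

The resulting file can then be passed to MediaFileUpload unchanged, with sourceFormat left as NEWLINE_DELIMITED_JSON.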

Answer 2

As a tip, you can convert your JSON file to the expected format with jq:

jq -c '.[]' input.json > new.json

or

curl -s https://local/xx.json | jq -c '.[]'
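If jq is not available, roughly the same conversion as jq -c '.[]' can be done in Python: load the top-level array and write each element compactly on its own line. This is a sketch assuming the input file holds a single JSON array small enough to load into memory; the function name array_to_ndjson is hypothetical.

```python
import json

def array_to_ndjson(src_path, dst_path):
    """Convert a file containing one top-level JSON array into newline-delimited JSON."""
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array")
    with open(dst_path, "w", encoding="utf-8") as dst:
        for element in data:
            # Compact separators and ensure_ascii=False approximate jq -c output
            dst.write(json.dumps(element, separators=(",", ":"),
                                 ensure_ascii=False) + "\n")
```

For example, array_to_ndjson("input.json", "new.json") mirrors the first jq command above.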

Insert multiple records into BigQuery using the POST API
