BigQuery event streaming - Error 400

Time: 2016-03-03 13:41:15

Tags: python google-bigquery

I have set up the following code:

import uuid

# `config` is a module-level dict defined elsewhere in the script.

def send_to_bq(bigquery, events, table_id):
    try:
        data = {}
        data['rows'] = []
        data['skipInvalidRows'] = True # don't drop the entire batch if there's a bad record
        data['ignoreUnknownValues'] = True # ignore unknown fields

        for event in events:
            row = {
                'json': event,
                # Generate a unique id for each row so retries don't accidentally
                # duplicate insert
                'insertId': str(uuid.uuid4()),
            }
            data['rows'].append(row)
        if len(data['rows']) > 0:
            # print("request: " + json.dumps(data))
            return bigquery.tabledata().insertAll(
                projectId=config['FUNTOMIC_PROJECTID'],
                datasetId=config['DATASET_ID'],
                tableId=table_id,
                body=data).execute(num_retries=int(config['CHUNK_RETRIES']))
        else:
            return 'Empty Event'
    except Exception as e:
        # The error is only printed, so the caller receives None for a failed batch.
        print(str(e))

I am tailing a log file and sending the data to BQ. Every few iterations, the following exception is thrown, seemingly at random:

<HttpError 400 when requesting https://www.googleapis.com/bigquery/v2/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID/insertAll?alt=json returned "Parse Error">

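Sometimes it happens a few times a day, sometimes every few seconds. I don't know what is going on, and I couldn't find anything about it in the BQ streaming documentation.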

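I'm trying to understand whether this happens on one of the retries (a server error, which I can safely ignore), or whether it is printed only once the retries have been exhausted (in which case I may be losing events).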
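On the retry question, here is a minimal sketch of how the two cases could be told apart. It assumes, and this should be checked against the installed google-api-python-client version, that `execute(num_retries=...)` only retries server-side failures (5xx and some rate-limit responses), so a 400 surfaces on the first attempt and means the batch was rejected. `handle_insert_failure` and `dead_letter` are hypothetical names; the only library detail relied on is that `HttpError` carries the HTTP status in `error.resp.status`.

```python
def is_client_error(status):
    """True for 4xx statuses, which indicate a rejected request rather than a flaky server."""
    return 400 <= status < 500


def handle_insert_failure(error, events, dead_letter):
    """Park the events of a rejected batch so they are not silently lost.

    `error` is expected to look like googleapiclient.errors.HttpError,
    i.e. to carry the HTTP status in `error.resp.status`.
    Returns True if the events were appended to `dead_letter`.
    """
    resp = getattr(error, "resp", None)
    status = getattr(resp, "status", None)
    if status is not None and is_client_error(int(status)):
        # Assumption: a 4xx was not retried, so the rows never reached the table.
        dead_letter.extend(events)
        return True
    # Anything else (5xx after retries ran out, network errors) is left to the caller.
    return False
```

Calling this from the `except` branch instead of only printing the error would keep the rejected events around for inspection or a later resend.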

Thanks!

Edit 1: I changed my chunk size to 1 and printed the offending event. The event is valid JSON. I verified in BQ that it was not inserted.
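The chunk-size-of-1 approach can be sketched roughly as follows. `find_rejected_events` is a hypothetical helper, and `send` stands for any callable that takes a list of events and raises on failure (for example a variant of `send_to_bq` that re-raises instead of printing).

```python
def find_rejected_events(events, send):
    """Send events one at a time and collect the ones that fail.

    Returns a list of (event, error_message) pairs so the offending
    payloads can be inspected individually.
    """
    rejected = []
    for event in events:
        try:
            send([event])  # a batch of exactly one event
        except Exception as exc:
            rejected.append((event, str(exc)))
    return rejected
```

Sending one row per request is slow, so this is meant for debugging a failing batch rather than for the normal streaming path.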

{"is_synced": "False", "domain": "kiziland", "server_time": "1457116902", "event_type": "creature_bought", "ip": "151.62.108.127", "partial_data": "True", "agent": "Mozilla/5.0 (Android; U; it-IT) AppleWebKit/533.19.4 (KHTML, like Gecko) AdobeAIR/19.0", "currency": "coins", "elapsed_play_time": "43536", "received_at": "1457116902893", "is_converted": "False", "city": "Trento", "uuid": "tZkUiABW6J5t", "coins_left": 266057442087380840000, "platform": "Android", "is_in_kizi_app": "False", "advertising_id": "f3cb67f6-c631-4a63-bd20-824ca8317eda", "creature_level": "39", "game_version": "1.1.11", "is_in_kizi_mobile_web": "False", "index": "mobile_games", "price": "1.15280492432e+21", "stars_left": "315", "current_max_creature": "49", "event_stream_time": 1457257312.162035, "day": "2016-03-04", "sourcetype": "mobile_events", "original_version": "1.1.11", "is_native": "True", "country": "IT", "install_date": "1453487933", "session_id": "FoKr8DwFAtikNc0X2X0P", "_time": "1457116902", "game_ops_version": "0.7.5", "host_type": "android_native_app", "is_in_kizi_web": "False"}

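Edit 2 - Solution: Apparently, when converting my Python dictionary into a JSON event (some types need special handling), I was not handling the "long" type. Those values were the cause of the exception.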
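One possible way to handle such values, shown as a sketch rather than the code actually used here: the `coins_left` value in the event above (266057442087380840000) does not fit in a signed 64-bit integer, which is the range of BigQuery's INTEGER type. The hypothetical `sanitize_value` helper below converts out-of-range integers to strings before the event is serialized. It is written for Python 3, where `int` covers what Python 2 called `long`, and it assumes a string is acceptable for the destination column.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def sanitize_value(value):
    """Recursively replace integers outside the signed 64-bit range with strings."""
    if isinstance(value, int) and not (INT64_MIN <= value <= INT64_MAX):
        # Too large for a 64-bit integer column: keep the digits, as text.
        return str(value)
    if isinstance(value, dict):
        return {key: sanitize_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize_value(item) for item in value]
    return value
```

Converting to a float would be an alternative if the column is FLOAT, at the cost of losing precision in the low digits.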

1 Answer:

Answer 0: (score: 0)

It looks like BigQuery cannot parse the request body because of an invalid character in your payload.

If you try to validate it [1], you will see that it is invalid JSON. The problem appears to be an invalid character in the "agent" field.

Here is the agent as provided:

"agent":"Mozilla/5.0 (Android; U; it-IT) AppleWebKit/533.19.4
(KHTML, like Gecko) AdobeAIR/19.0"

As you can see, there is a line break after AppleWebKit/533.19.4, which you need to either remove or escape, as follows:

"agent":"Mozilla/5.0 (Android; U; it-IT) AppleWebKit/533.19.4\n(KHTML, like Gecko) AdobeAIR/19.0"
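To illustrate the point about line breaks, here is a small sketch using only Python's standard `json` module. It shows that a raw newline inside a JSON string is rejected by the parser, while `json.dumps` emits the escaped `\n` form. The shortened agent string and the `is_valid_json` helper are illustrative only.

```python
import json


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


agent = "AppleWebKit/533.19.4\n(KHTML, like Gecko)"  # contains a real newline

# Building the JSON text by hand leaves a raw newline inside the string.
raw = '{"agent": "' + agent + '"}'

# Letting the json module serialize the value escapes the newline as \n.
escaped = json.dumps({"agent": agent})

print(is_valid_json(raw))      # raw control characters are not allowed in JSON strings
print(is_valid_json(escaped))
```

Serializing with `json.dumps` instead of concatenating strings avoids having to escape control characters manually.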

[1] http://jsonlint.com/