Truncated bytes field in the BigQuery Python API

Time: 2017-10-03 16:26:35

Tags: python-2.7 google-bigquery google-cloud-platform google-python-api

I am trying to query a table in BigQuery with Python, using the sample code provided in the documentation:

query_job = bq_client.run_async_query(str(uuid.uuid4()), sql_query)
query_job.use_legacy_sql = False
query_job.begin()
query_job.result()
destination_table = query_job.destination
destination_table.reload()
query_result = destination_table.fetch_data()
for row in query_result:
    print row

The results I get look like this:

('?S\x9e\xbe\x9b\xadB\xd8\x92**5\xcck\xee]', 15, 28, 28, datetime.date(2017, 9, 5))
('%SwP\xe6xMK\x99T\xa1\x7f\xbbk>\xff', 15, 45, 19, datetime.date(2017, 9, 21))
('\xd7\x99>\x05(\x94M\xd8\x92\x0e\xe77\x8b\xcc\x08\xf0', 15, 18, 18, datetime.date(2017, 9, 4))
('\x0f+\xe7\xe0\xba5Ei\xa3\xb0\xfd\xd8\x1a\xa2wy', 15, 16, 16, datetime.date(2017, 9, 21))
('\xa3C\x0b\xdb\xebqM\xa1\xb3f\r\x8f#\x85\x93<', 15, 20, 16, datetime.date(2017, 9, 21))
(':\x1cZ?\x13\xf1A\xf5\x8e\xba\xfeYL.|v', 15, 15, 15, datetime.date(2017, 9, 6))
('\x1bP\x88\xd7\x1a\xfbGC\xb5$\x10\x97gx<\xb2', 15, 15, 15, datetime.date(2017, 9, 19))
('(B\xbc\xb7\xe9\xc0D\x89\xb0\x82jfW;,\x1e', 15, 18, 14, datetime.date(2017, 9, 19))
('\xd8\xbaw\x88\x89<Oh\x81]v\xa8!-\x83\x7f', 15, 17, 13, datetime.date(2017, 9, 6))
("\x94\x1f'\xf1\xd1$C\x9b\xb4o\x81H\x17\xf4\xa5S", 10, 14, 12, datetime.date(2017, 9, 5))
('\x949\x17\xbf\x90\xd7L\x04\x98\xe9+5\x9d\x1a\xb4\xe4', 15, 12, 12, datetime.date(2017, 9, 21))

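The first field is of type BYTES and should be a base64-encoded UUID. The actual results from the table, as shown by the UI, the Java wrapper, and the command line (with the same query), are: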

P1OevputQtiSKio1zGvuXQ==    15  28  28  2017-09-05
JVN3UOZ4TUuZVKF/u2s+/w==    15  45  19  2017-09-21
15k+BSiUTdiSDuc3i8wI8A==    15  18  18  2017-09-04
o0ML2+txTaGzZg2PI4WTPA==    15  20  16  2017-09-21
Dyvn4Lo1RWmjsP3YGqJ3eQ==    15  16  16  2017-09-21
G1CI1xr7R0O1JBCXZ3g8sg==    15  15  15  2017-09-19
OhxaPxPxQfWOuv5ZTC58dg==    15  15  15  2017-09-06
KEK8t+nARImwgmpmVzssHg==    15  18  14  2017-09-19
2Lp3iIk8T2iBXXaoIS2Dfw==    15  17  13  2017-09-06
lB8n8dEkQ5u0b4FIF/SlUw==    10  14  12  2017-09-05
FJZXBYCAR8mQQwEeuuKKhQ==    15  12  12  2017-09-19
APVxsNTHSrCSU0z6QWdXSw==    15  12  12  2017-09-05
lDkXv5DXTASY6Ss1nRq05A==    15  12  12  2017-09-21

It seems that I am not getting the complete byte array in the Python response, whereas I do get it with every other wrapper. Here is the query:

SELECT *
FROM (SELECT c.rich_log.user_id,
             w.topic_id,
             count(*) as event_count,
             count(CASE WHEN c.header.timestamp > (UNIX_MILLIS(CURRENT_TIMESTAMP()) - 2506000000)
                        then 1 else null end) as event_count_in_lookback_window,
             date(TIMESTAMP_MILLIS(max(c.header.timestamp))) as last_event
      FROM `table1_*` as c
      join `table2` as w on w.id = c.rich_log.website_id
      where w.topic_id in {}
      and c.header.timestamp > UNIX_MILLIS(CURRENT_TIMESTAMP()) - 3456000000
      group by c.rich_log.user_id, w.topic_id)
WHERE event_count_in_lookback_window > 0
ORDER BY event_count_in_lookback_window DESC
LIMIT 100

Does anyone know why this happens? My bytes field seems to be truncated.

Thanks,

Anton

1 answer:

Answer 0 (score: 0)

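Nothing is being truncated here. Python is showing the raw binary data, whereas the other interfaces base64-encode the data for you. The two displays are equivalent: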

$ printf '?S\x9e\xbe\x9b\xadB\xd8\x92**5\xcck\xee]' | base64
P1OevputQtiSKio1zGvuXQ==

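This happens because your UUIDs are stored in a BYTES field. That is the most efficient and correct way to store this kind of data, but it does mean that different systems will convert it to a string in different ways by default.

In Python, the base64 module will convert the bytes to a base64 string if you want that.

Python also has a uuid module that can turn the bytes directly into a UUID object: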
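
As a minimal sketch of that conversion, assuming the first field arrives as a raw byte string like the ones printed in the question (the variable names raw and encoded are only illustrative):

```python
import base64

# First field of the first row, copied from the Python output in the question.
raw = b'?S\x9e\xbe\x9b\xadB\xd8\x92**5\xcck\xee]'

# b64encode returns a byte string; decode it to get text for display.
encoded = base64.b64encode(raw).decode('ascii')
print(encoded)  # P1OevputQtiSKio1zGvuXQ==

# Decoding goes back to the same 16 raw bytes, so nothing was lost.
assert base64.b64decode(encoded) == raw
```

The printed value matches the first row shown by the UI, which is the same check as the printf and base64 command above.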

>>> import uuid
>>> print uuid.UUID(bytes=b'?S\x9e\xbe\x9b\xadB\xd8\x92**5\xcck\xee]')
3f539ebe-9bad-42d8-922a-2a35cc6bee5d
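
Going the other way can help when comparing a value copied from the UI or the command line with what Python returns. This is only a sketch; shown_in_ui and user_id are illustrative names, and the input is the first row's value as displayed in the question:

```python
import base64
import uuid

# Value as displayed by the other interfaces for the first row in the question.
shown_in_ui = 'P1OevputQtiSKio1zGvuXQ=='

# Decode the base64 text back to the 16 raw bytes, then build a UUID from them.
raw = base64.b64decode(shown_in_ui)
user_id = uuid.UUID(bytes=raw)
print(user_id)  # 3f539ebe-9bad-42d8-922a-2a35cc6bee5d
```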