I am trying to query a table in BigQuery from Python, using the sample code provided in the documentation:
query_job = bq_client.run_async_query(str(uuid.uuid4()), sql_query)
query_job.use_legacy_sql = False
query_job.begin()
query_job.result()
destination_table = query_job.destination
destination_table.reload()
query_result = destination_table.fetch_data()
for row in query_result:
    print row
The results I get look like this:
('?S\x9e\xbe\x9b\xadB\xd8\x92**5\xcck\xee]', 15, 28, 28, datetime.date(2017, 9, 5))
('%SwP\xe6xMK\x99T\xa1\x7f\xbbk>\xff', 15, 45, 19, datetime.date(2017, 9, 21))
('\xd7\x99>\x05(\x94M\xd8\x92\x0e\xe77\x8b\xcc\x08\xf0', 15, 18, 18, datetime.date(2017, 9, 4))
('\x0f+\xe7\xe0\xba5Ei\xa3\xb0\xfd\xd8\x1a\xa2wy', 15, 16, 16, datetime.date(2017, 9, 21))
('\xa3C\x0b\xdb\xebqM\xa1\xb3f\r\x8f#\x85\x93<', 15, 20, 16, datetime.date(2017, 9, 21))
(':\x1cZ?\x13\xf1A\xf5\x8e\xba\xfeYL.|v', 15, 15, 15, datetime.date(2017, 9, 6))
('\x1bP\x88\xd7\x1a\xfbGC\xb5$\x10\x97gx<\xb2', 15, 15, 15, datetime.date(2017, 9, 19))
('(B\xbc\xb7\xe9\xc0D\x89\xb0\x82jfW;,\x1e', 15, 18, 14, datetime.date(2017, 9, 19))
('\xd8\xbaw\x88\x89<Oh\x81]v\xa8!-\x83\x7f', 15, 17, 13, datetime.date(2017, 9, 6))
("\x94\x1f'\xf1\xd1$C\x9b\xb4o\x81H\x17\xf4\xa5S", 10, 14, 12, datetime.date(2017, 9, 5))
('\x949\x17\xbf\x90\xd7L\x04\x98\xe9+5\x9d\x1a\xb4\xe4', 15, 12, 12, datetime.date(2017, 9, 21))
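The first field is of type BYTES and should be a base64-encoded UUID. The actual results in the table, as shown by the UI, the Java wrapper, and the command line (with the same query), are: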
P1OevputQtiSKio1zGvuXQ== 15 28 28 2017-09-05
JVN3UOZ4TUuZVKF/u2s+/w== 15 45 19 2017-09-21
15k+BSiUTdiSDuc3i8wI8A== 15 18 18 2017-09-04
o0ML2+txTaGzZg2PI4WTPA== 15 20 16 2017-09-21
Dyvn4Lo1RWmjsP3YGqJ3eQ== 15 16 16 2017-09-21
G1CI1xr7R0O1JBCXZ3g8sg== 15 15 15 2017-09-19
OhxaPxPxQfWOuv5ZTC58dg== 15 15 15 2017-09-06
KEK8t+nARImwgmpmVzssHg== 15 18 14 2017-09-19
2Lp3iIk8T2iBXXaoIS2Dfw== 15 17 13 2017-09-06
lB8n8dEkQ5u0b4FIF/SlUw== 10 14 12 2017-09-05
FJZXBYCAR8mQQwEeuuKKhQ== 15 12 12 2017-09-19
APVxsNTHSrCSU0z6QWdXSw== 15 12 12 2017-09-05
lDkXv5DXTASY6Ss1nRq05A== 15 12 12 2017-09-21
It seems I am not getting the complete byte array in the Python response, whereas every other wrapper returns it. Here is the query:
SELECT *
FROM (SELECT c.rich_log.user_id,
w.topic_id, count(*) as event_count,
count(CASE WHEN c.header.timestamp > (UNIX_MILLIS(CURRENT_TIMESTAMP()) - 2506000000)
then 1 else null end) as event_count_in_lookback_window,
date(TIMESTAMP_MILLIS(max(c.header.timestamp))) as last_event
FROM `table1_*` as c
join `table2` as w on w.id = c.rich_log.website_id
where w.topic_id in {}
and c.header.timestamp > UNIX_MILLIS(CURRENT_TIMESTAMP()) - 3456000000
group by c.rich_log.user_id, w.topic_id)
WHERE event_count_in_lookback_window > 0
ORDER BY event_count_in_lookback_window DESC
LIMIT 100
Does anyone know why this happens? My bytes field seems to be truncated.
Thanks,
Anton
Answer 0 (score: 0)
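Nothing is truncated here. Python is displaying the raw binary data, whereas the other interfaces base64-encode the data for you. The two displayed formats are equivalent: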
$ printf '?S\x9e\xbe\x9b\xadB\xd8\x92**5\xcck\xee]' | base64
P1OevputQtiSKio1zGvuXQ==
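This happens because your UUIDs are stored in a BYTES field. That is the most efficient and most correct way to store this kind of data, but it does mean that different systems will convert it to a string in different ways by default.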
In Python, the base64 module will convert the bytes to a base64 string if you want that.
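As a rough sketch of that conversion, the snippet below formats one of the rows from the question the way the UI prints it. The helper name format_row is made up for illustration, and it assumes the BYTES column is always the first element of the row tuple, as in the output shown in the question.

```python
import base64
import datetime


def format_row(row):
    """Render one result row as text, base64-encoding the first column.

    Hypothetical helper: assumes row[0] is the BYTES user_id column.
    """
    user_id, rest = row[0], row[1:]
    # b64encode returns bytes; decode to get plain text for printing.
    fields = [base64.b64encode(user_id).decode('ascii')]
    # str() on a datetime.date gives ISO format, e.g. 2017-09-05.
    fields.extend(str(value) for value in rest)
    return ' '.join(fields)


# First row from the question, exactly as Python printed it.
row = (b'?S\x9e\xbe\x9b\xadB\xd8\x92**5\xcck\xee]', 15, 28, 28,
       datetime.date(2017, 9, 5))
print(format_row(row))  # P1OevputQtiSKio1zGvuXQ== 15 28 28 2017-09-05
```

The output matches the first line of the UI results, which shows that all 16 bytes arrived intact.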
Python also has a uuid module that can turn the bytes directly into a UUID object:
>>> import uuid
>>> print uuid.UUID(bytes=b'?S\x9e\xbe\x9b\xadB\xd8\x92**5\xcck\xee]')
3f539ebe-9bad-42d8-922a-2a35cc6bee5d
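If you need to match values coming from the other tools, a small pair of helpers can go back and forth between the base64 text and a UUID object. This is only a sketch: the function names b64_to_uuid and uuid_to_b64 are invented here, and it assumes every value is exactly 16 bytes, as a UUID requires.

```python
import base64
import uuid


def b64_to_uuid(text):
    """Decode base64 text (as shown in the UI) into a uuid.UUID."""
    # uuid.UUID(bytes=...) raises ValueError unless given exactly 16 bytes.
    return uuid.UUID(bytes=base64.b64decode(text))


def uuid_to_b64(value):
    """Encode a uuid.UUID back into the base64 text the UI shows."""
    return base64.b64encode(value.bytes).decode('ascii')


user_id = b64_to_uuid('P1OevputQtiSKio1zGvuXQ==')
print(user_id)               # 3f539ebe-9bad-42d8-922a-2a35cc6bee5d
print(uuid_to_b64(user_id))  # P1OevputQtiSKio1zGvuXQ==
```

The bytes returned by the Python client, the base64 string in the UI, and the hyphenated UUID are three renderings of the same 16-byte value.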