Using Python 2.7, I can't figure out how to pass a query from BigQuery to ML Predict, which requires a specific request format.
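First: is there an easier way to go directly from a BigQuery query to correctly formatted JSON that can be passed to requests.post(), instead of going through pandas (as far as I understand, pandas is still not GCP standard)?
Second: is there a way to structure the query so that it converts directly to JSON, and then modify that JSON to match the ML Predict JSON requirements?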
Currently my code looks like this:
#I used the bigquery to dataframe option here to view the output.
#I would like to not use pandas in the end code.
logs = log_data.execute(output_options=bq.QueryOutput.dataframe()).result()
data = logs.to_json(orient='index')
print data
'{"0":{"end_time":"2018-04-19","device":"iPad","device_os":"iOS","device_os_version":"5.1.1","latency":0.150959,"megacycles":140.0,"cost":"1.3075e-08","device_brand":"Apple","device_family":"iPad","browser_version":"5.1","app":"567","ua_parse":"0"}}'
#The JSON needs to be in this format according to google documentation.
#data = {
# 'instances': [
# {
# 'key':'',
# 'end_time': '2018-04-19',
# 'device': 'iPad',
# 'device_os': 'iOS',
# 'device_os_version': '5.1.1',
# 'latency': 0.150959,
# 'megacycles':140.0,
# 'cost':'1.3075e-08',
# 'device_brand':'Apple',
# 'device_family':'iPad',
# 'browser_version':'5.1',
# 'app':'567',
# 'ua_parse':'40.9.8'
# }
# ]
#}
So all I need to change is the leading key '0' to 'instances', and then I should be all set to pass it into `requests.post()`.
Is there a way to achieve this?
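A minimal sketch of that reshaping, assuming the input is the orient='index' string shown in the output above and that the DataFrame has the default integer index (`index_json_to_instances` is a hypothetical helper name, not part of pandas or any Google library): parse the string, drop the row labels, and wrap the row objects in an `instances` list with an empty `key` field.

```python
import json


def index_json_to_instances(index_json):
    """Turn DataFrame.to_json(orient='index') output into {'instances': [...]}."""
    rows = json.loads(index_json)
    instances = []
    # Top-level keys are row labels ('0', '1', ...); assuming a default integer
    # index, sort them numerically so the original row order is kept.
    for label in sorted(rows, key=int):
        instance = dict(rows[label])
        instance['key'] = ''  # empty key field, as in the target format
        instances.append(instance)
    return {'instances': instances}


if __name__ == '__main__':
    data = '{"0":{"end_time":"2018-04-19","device":"iPad","latency":0.150959}}'
    body = index_json_to_instances(data)
    print(json.dumps(body, sort_keys=True))
```

If you stay with pandas, `logs.to_json(orient='records')` already produces a JSON list of row objects, so `{'instances': json.loads(logs.to_json(orient='records'))}` avoids the relabelling step (add the `key` field afterwards if your model needs it).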
Edit - adding the BigQuery query:
%%bq query --n log_data
WITH `my.table` AS (
SELECT ARRAY<STRUCT<end_time STRING, device STRING, device_os STRING, device_os_version STRING, latency FLOAT64, megacycles FLOAT64,
cost STRING, device_brand STRING, device_family STRING, browser_version STRING, app STRING, ua_parse STRING>>[] instances
)
SELECT TO_JSON_STRING(t)
FROM `my.table` AS t
WHERE end_time >='2018-04-19'
LIMIT 1
data = log_data.execute().result()
Thanks to @MikhailBerlyant, I adjusted my query and code to look like this:
%%bq query --n log_data
SELECT [TO_JSON_STRING(t)] AS instance
FROM `yourproject.yourdataset.yourtable` AS t
WHERE end_time >='2018-04-19'
LIMIT 1
But when I run logs = log_data.execute().result() and pass the result to requests.post, I get this:
TypeError: QueryResultsTable job_zfVEiPdf2W6msBlT6bBLgMusF49E is not JSON serializable
Is there a way to have execute() return only the JSON?
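The TypeError means the results-table object itself was handed to requests, and the json module (which requests uses for `json=`) only serializes plain dicts, lists, strings, numbers, booleans and None. A minimal sketch of the missing step, assuming the rows have already been pulled out of the results object as plain dicts keyed by column name (for the query above, each row's `instance` column is a list holding one JSON string); `build_predict_body` is a hypothetical helper:

```python
import json


def build_predict_body(rows, column='instance'):
    """Build a JSON-serializable ML Predict body from query rows.

    Assumes each row is a plain dict whose `column` value is a list of JSON
    strings, which is what SELECT [TO_JSON_STRING(t)] AS instance returns.
    """
    instances = []
    for row in rows:
        for json_string in row[column]:
            # Each element is one table row serialized by TO_JSON_STRING.
            instances.append(json.loads(json_string))
    return {'instances': instances}


if __name__ == '__main__':
    rows = [{'instance': ['{"end_time":"2018-04-19","device":"iPad"}']}]
    body = build_predict_body(rows)
    # Unlike the results-table object, a plain dict serializes cleanly, so
    # requests.post(url, json=body) could send it (url is hypothetical here).
    print(json.dumps(body))
```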
Answer 0 (score: 1)
> First: is there an easier way to go directly from a BigQuery query to correctly formatted JSON

See the example below:
#standardSQL
WITH yourTable AS (
SELECT ARRAY<STRUCT<id INT64, type STRING>>[(1, 'abc'), (2, 'xyz')] instances
)
SELECT TO_JSON_STRING(t)
FROM yourTable t
The result is in the format you asked for:
{"instances":[{"id":1,"type":"abc"},{"id":2,"type":"xyz"}]}
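On the Python side, that single string only needs json.loads to become a dict that requests can serialize. A minimal sketch using the example result above (the variable names are illustrative):

```python
import json

# The single value returned by the example query above.
result = '{"instances":[{"id":1,"type":"abc"},{"id":2,"type":"xyz"}]}'

body = json.loads(result)  # plain dict with an 'instances' list
print(body['instances'][1]['type'])  # → xyz
```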
The above demonstrates the query and how it works. In your real case, you should use something like the following:
SELECT TO_JSON_STRING(t)
FROM `yourproject.yourdataset.yourtable` AS t
WHERE end_time >='2018-04-19'
LIMIT 1
Hope this helps :o)
Update based on comments:
SELECT [TO_JSON_STRING(t)] AS instance
FROM `yourproject.yourdataset.yourtable` t
WHERE end_time >='2018-04-19'
LIMIT 1
Answer 1 (score: 0)
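I wanted to add this in case anyone runs into the same problem, or at least has trouble figuring out where to go once they have the query.
I was able to write a function that formats the query results the way Google ML Predict expects them to be passed to requests.post(). This is quite possibly a terrible way to do it, but I couldn't find a direct way to get from BigQuery to ML Predict in the correct format.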
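from google.cloud import bigquery as gcb

def logs(query):
    client = gcb.Client()
    query_job = client.query(query)
    # Column names, in the same order as the query's SELECT list
    CSV_COLUMNS = 'end_time,device,device_os,device_os_version,latency,megacycles,cost,device_brand,device_family,browser_version,app,ua_parse'.split(',')
    instances = []
    for row in query_job.result():
        var = list(row)
        l1 = dict(zip(CSV_COLUMNS, var))  # column name -> value
        l1.update({'key': ''})            # add the empty 'key' field
        instances.append(l1)              # keep every row, not just the last one
    # Wrap all rows in the top-level 'instances' list
    return {'instances': instances}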
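The row-to-dict part of logs() does not depend on BigQuery, so it can be checked with plain tuples before wiring in the client. A minimal sketch, assuming the same column order as above; `rows_to_instances` is a hypothetical name for that extracted logic:

```python
# Same column list as in logs() (assumed to match the SELECT order).
CSV_COLUMNS = ('end_time,device,device_os,device_os_version,latency,megacycles,'
               'cost,device_brand,device_family,browser_version,app,ua_parse').split(',')


def rows_to_instances(rows, columns=CSV_COLUMNS):
    """Hypothetical pure version of logs(): rows are any sequences of values."""
    instances = []
    for row in rows:
        instance = dict(zip(columns, list(row)))  # column name -> value
        instance['key'] = ''  # empty key field expected by the request format
        instances.append(instance)
    return {'instances': instances}


if __name__ == '__main__':
    fake_row = ('2018-04-19', 'iPad', 'iOS', '5.1.1', 0.150959, 140.0,
                '1.3075e-08', 'Apple', 'iPad', '5.1', '567', '40.9.8')
    body = rows_to_instances([fake_row])
    print(body['instances'][0]['ua_parse'])  # → 40.9.8
```

Note that zip stops at the shorter sequence, so if CSV_COLUMNS and the SELECT list ever disagree in length, fields are dropped silently; checking len(row) == len(columns) is a cheap guard.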