I need to flatten the probabilities column in my result so that only the entry with the highest probability is kept:
original  predicted  probabilities
                     label   prob
<=50K     >50K       >50K    0.5377828170971353
                     <=50K   0.46221718290286473
<=50K     <=50K      >50K    0.05434716579642335
                     <=50K   0.9456528342035766
I want to flatten the result, but with the query below I only get the table above, and with the BigQuery Python client I get: [object Object],[object Object]
SELECT
  original,
  predicted,
  probabilities
FROM
  ML.PREDICT(MODEL `my_dataset.my_model`,
    (
    SELECT
      *
    FROM
      `bigquery-public-data.ml_datasets.census_adult_income`
    ))
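Answer 0 (score: 1)
Your probabilities field is a repeated record, i.e. an array of structs. You can use a subquery to iterate over the array and select the entry with the highest probability, like this: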
SELECT
  original,
  predicted,
  (SELECT p
   -- Iterate over the array
   FROM UNNEST(probabilities) AS p
   -- Order by probability and get the first result
   ORDER BY p.prob DESC
   LIMIT 1) AS probabilities
FROM
  ML.PREDICT(MODEL `my_dataset.my_model`,
    (
    SELECT
      *
    FROM
      `bigquery-public-data.ml_datasets.census_adult_income`
    ))
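To make the logic of the subquery concrete, here is a minimal pure-Python sketch of the same selection. It does not call BigQuery; the `probabilities` list is a hand-written copy of the first row from the question's table, and `top_probability` is a hypothetical helper name used only for illustration.

```python
# Hand-written copy of one row's `probabilities` array (values taken from
# the first row of the table in the question). Not fetched from BigQuery.
probabilities = [
    {"label": ">50K", "prob": 0.5377828170971353},
    {"label": "<=50K", "prob": 0.46221718290286473},
]


def top_probability(probs):
    """Return the entry with the highest `prob`, or None for an empty list.

    This mirrors `ORDER BY p.prob DESC LIMIT 1` over UNNEST(probabilities):
    the scalar subquery yields NULL when the array has no elements.
    """
    return max(probs, key=lambda p: p["prob"], default=None)


best = top_probability(probabilities)
print(best)
```

Doing the selection in SQL, as in the query above, is usually preferable because only one struct per row is sent back to the client; the sketch is meant only to show what the subquery computes.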
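Each row of the result then carries a single probabilities struct instead of the full array.
The Python result you got looks more like a JavaScript representation of an object. Here is how I did it in Python: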
from google.cloud import bigquery

client = bigquery.Client()

# Perform a query.
sql = ''' SELECT ... '''  # Your query
query_job = client.query(sql)
rows = query_job.result()  # Waits for query to finish

# Each row is a bigquery Row; values() returns its column values as a tuple.
for row in rows:
    print(row.values())
Output:
(' >50K', ' >50K', {'label': ' >50K', 'prob': 0.5218586871072727})
(' >50K', ' >50K', {'label': ' >50K', 'prob': 0.5907989087876587})
(' >50K', ' >50K', {'label': ' >50K', 'prob': 0.734145221825564})
Note that probabilities is a STRUCT data type in BigQuery SQL, so it is mapped to a Python dictionary.
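If you want fully flat columns on the Python side, the dictionary can be unpacked into scalar fields. The sketch below assumes each row's values arrive as the 3-tuple shown in the output above; `flatten_row` is a hypothetical helper, and the output keys `label` and `prob` are simply the struct's own field names.

```python
def flatten_row(values):
    """Turn (original, predicted, {'label': ..., 'prob': ...}) into a flat dict.

    Assumes the tuple layout printed by row.values() in the output above.
    """
    original, predicted, best = values
    return {
        "original": original,
        "predicted": predicted,
        "label": best["label"],
        "prob": best["prob"],
    }


# Sample tuple copied from the first line of the output above.
sample = (' >50K', ' >50K', {'label': ' >50K', 'prob': 0.5218586871072727})
print(flatten_row(sample))
```

In the loop from the answer, this would be applied as `flatten_row(row.values())` for each row.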
See the BigQuery quickstart for more information about the client library.