After some processing, I end up with the following array:
users = array([u'5451709866311680', u'4660301072957440', u'6370791394377728',
u'5121933955825664', u'4778500988862464', u'5841867648270336',
u'4751430816628736', u'4869137213947904', u'5152642703556608',
u'6531810976595968', u'4824167228637184', u'6058117842337792',
u'5969360933879808', u'4764494160986112', u'5443041280131072',
u'4846257587617792', u'5409371420884992', u'6197117949313024',
u'6643644022915072', u'5060273861820416'], dtype=object)
I then want to query these users in another BigQuery table, but I've run into a problem.
query = """
SELECT *
FROM games
WHERE user_id IN %users
"""
segment = pd.io.gbq.read_gbq(query, project_id='shared', dialect='standard')
Does anyone know how to proceed?
Thanks
Answer 0 (score: 0)
The problem is probably in your query rather than in pandas. For this query to work, you have to write something like the following:
query = """
SELECT *
FROM crozzles.games
WHERE user_id IN UNNEST(['user1', 'user2', 'user3'])
"""
If you don't UNNEST your array, BigQuery cannot look up its inner values.
One thing you can do is:
query = """
SELECT *
FROM crozzles.games
WHERE user_id IN UNNEST(%s)
""" % [str(u) for u in users]  # a list (not a map object), so this also works on Python 3
This should result in:
query = """SELECT *
FROM crozzles.games
WHERE user_id IN UNNEST(['5451709866311680', '4660301072957440', '6370791394377728', '5121933955825664', '4778500988862464', '5841867648270336', '4751430816628736', '4869137213947904', '5152642703556608', '6531810976595968', '4824167228637184', '6058117842337792', '5969360933879808', '4764494160986112', '5443041280131072', '4846257587617792', '5409371420884992', '6197117949313024', '6643644022915072', '5060273861820416'])
"""
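Interpolating values straight into SQL is fragile if an ID ever contains a quote. As a minimal sketch of the same string-building idea with a guard in front of it, the helper below renders the IDs as an array literal only after checking that each one is purely numeric. The name `to_bq_string_array` is hypothetical (it is not part of pandas or BigQuery), and the digits-only rule is an assumption based on the IDs shown in the question.

```python
def to_bq_string_array(values):
    """Render an iterable of IDs as a BigQuery standard SQL array literal.

    Hypothetical helper: assumes every ID consists of digits only, so no
    quote escaping is needed inside the generated literal.
    """
    items = [str(v) for v in values]
    for item in items:
        if not item.isdigit():
            raise ValueError("unexpected user id: %r" % item)
    return "[" + ", ".join("'%s'" % item for item in items) + "]"


# A shortened copy of the array from the question.
users = ['5451709866311680', '4660301072957440', '6370791394377728']

query = """
SELECT *
FROM crozzles.games
WHERE user_id IN UNNEST(%s)
""" % to_bq_string_array(users)

print(query)
```

The resulting `query` string can be passed to `read_gbq` with `dialect='standard'`, exactly as in the question.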
Answer 1 (score: 0)
Here is one possibility, using the open dataset bigquery-public-data.github_repos:
from numpy import array
import pandas as pd

PROJECT_ID = 'choose-your-project-id'
input_array = array(['JavaScript', 'Python', 'R'], dtype=object)

# @lang_names is a named query parameter; its value is supplied in query_config.
query = """
SELECT lang.name, COUNT(*) AS count
FROM `bigquery-public-data.github_repos.languages`, UNNEST(language) AS lang
WHERE lang.name IN UNNEST(@lang_names)
GROUP BY 1
ORDER BY 2 DESC;
"""

# Declare lang_names as an ARRAY<STRING> parameter and fill it from input_array.
query_config = {
    'query': {
        'parameterMode': 'NAMED',
        'queryParameters': [
            {
                'name': 'lang_names',
                'parameterType': {'type': 'ARRAY',
                                  'arrayType': {'type': 'STRING'}},
                'parameterValue': {'arrayValues': [{'value': i} for i in input_array]}
            }
        ]
    }
}

result = pd.io.gbq.read_gbq(query, project_id=PROJECT_ID, dialect='standard',
                            configuration=query_config)
print(result.to_string())
The result is now:
         name    count
0  JavaScript  1109499
1      Python   551257
2           R    29572
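The same parameter structure can be applied to the user IDs from the question. The sketch below wraps it in a small function so the configuration is built from any list of strings. `array_param_config` and the parameter name `user_ids` are hypothetical names chosen for illustration. The table `crozzles.games` and the project `'shared'` are taken from the posts above, and the `read_gbq` call is left commented out because it needs BigQuery credentials.

```python
def array_param_config(name, values):
    """Build a read_gbq `configuration` dict with one ARRAY<STRING> parameter.

    Hypothetical helper: mirrors the query_config structure shown above.
    """
    return {
        'query': {
            'parameterMode': 'NAMED',
            'queryParameters': [
                {
                    'name': name,
                    'parameterType': {'type': 'ARRAY',
                                      'arrayType': {'type': 'STRING'}},
                    'parameterValue': {
                        'arrayValues': [{'value': str(v)} for v in values]
                    },
                }
            ],
        }
    }


# A shortened copy of the array from the question.
users = ['5451709866311680', '4660301072957440', '6370791394377728']

query = """
SELECT *
FROM crozzles.games
WHERE user_id IN UNNEST(@user_ids)
"""

config = array_param_config('user_ids', users)
print(config['query']['queryParameters'][0]['parameterValue'])

# With credentials in place, the call would look like this:
# import pandas as pd
# segment = pd.io.gbq.read_gbq(query, project_id='shared', dialect='standard',
#                              configuration=config)
```

Because the IDs travel as parameter values rather than as SQL text, no quoting or escaping of the individual strings is required.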