Passing an array of values to a BigQuery query using pandas

Time: 2017-07-11 09:35:04

Tags: python python-2.7 pandas google-bigquery

After some processing, I end up with the following array:

users = array([u'5451709866311680', u'4660301072957440', u'6370791394377728',
   u'5121933955825664', u'4778500988862464', u'5841867648270336',
   u'4751430816628736', u'4869137213947904', u'5152642703556608',
   u'6531810976595968', u'4824167228637184', u'6058117842337792',
   u'5969360933879808', u'4764494160986112', u'5443041280131072',
   u'4846257587617792', u'5409371420884992', u'6197117949313024',
   u'6643644022915072', u'5060273861820416'], dtype=object)

I then want to look up these users in another table in BigQuery, but I am running into problems.

query = """
SELECT  *
FROM games
WHERE user_id IN %users
"""
segment = pd.io.gbq.read_gbq(query, project_id='shared', dialect='standard')

Does anyone know how to proceed?

Thanks

2 Answers:

Answer 0: (score: 0)

The problem is probably in your query rather than in pandas. To make this query work, you need to do something like this:

query = """
SELECT  *
FROM crozzles.games
WHERE user_id IN UNNEST(['user1', 'user2', 'user3'])
"""

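If you do not UNNEST your array, BigQuery cannot look up its inner values: in standard SQL, IN expects a parenthesized list of values, a subquery, or UNNEST(array), not a bare array.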

One thing you can do is:

query = """
SELECT  *
FROM crozzles.games
WHERE user_id IN UNNEST(%s)
""" % list(map(str, users))  # list(...) so this also works on Python 3, where map() is lazy

This should result in:

query = """
SELECT  *
FROM crozzles.games
WHERE user_id IN UNNEST(['5451709866311680', '4660301072957440', '6370791394377728', '5121933955825664', '4778500988862464', '5841867648270336', '4751430816628736', '4869137213947904', '5152642703556608', '6531810976595968', '4824167228637184', '6058117842337792', '5969360933879808', '4764494160986112', '5443041280131072', '4846257587617792', '5409371420884992', '6197117949313024', '6643644022915072', '5060273861820416'])
"""
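The formatting step above relies on Python's list repr happening to look like a BigQuery array literal. A minimal sketch of building that literal explicitly is shown here; the helper name build_unnest_query and its table argument are hypothetical, and the quote escaping is a simplistic assumption rather than a complete defence against malformed input, so prefer query parameters for untrusted values.

```python
def build_unnest_query(user_ids, table="crozzles.games"):
    """Return a standard-SQL query filtering user_id against user_ids.

    Hypothetical helper for illustration; not part of pandas or BigQuery.
    """
    literals = []
    for user_id in user_ids:
        # Escape backslashes and single quotes so each ID stays one string literal.
        text = str(user_id).replace("\\", "\\\\").replace("'", "\\'")
        literals.append("'%s'" % text)
    # Join explicitly instead of depending on how Python prints a list.
    array_literal = "[%s]" % ", ".join(literals)
    return (
        "SELECT *\n"
        "FROM %s\n"
        "WHERE user_id IN UNNEST(%s)" % (table, array_literal)
    )


users = [u'5451709866311680', u'4660301072957440']
query = build_unnest_query(users)
print(query)
```

The resulting string can then be passed to read_gbq with dialect='standard', exactly as in the question.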

Answer 1: (score: 0)

Here is one possibility, using the open dataset bigquery-public-data.github_repos:

from numpy import array
import pandas as pd

PROJECT_ID = 'choose-your-project-id'

input_array = array(['JavaScript', 'Python', 'R'], dtype=object)

# @lang_names is a named query parameter; its value is supplied in query_config below.
query = """
SELECT lang.name, COUNT(*) AS count
FROM `bigquery-public-data.github_repos.languages`, UNNEST(language) AS lang
WHERE lang.name IN UNNEST(@lang_names)
GROUP BY 1
ORDER BY 2 DESC;
"""

# Job configuration that declares lang_names as an ARRAY of STRING values.
query_config = {
    'query': {
        'parameterMode': 'NAMED',
        'queryParameters': [
            {
                'name': 'lang_names',
                'parameterType': {'type': 'ARRAY',
                                  'arrayType': {'type': 'STRING'}},
                'parameterValue': {'arrayValues': [{'value': i} for i in input_array]}
            }
        ]
    }
}

result = pd.io.gbq.read_gbq(query, project_id=PROJECT_ID, dialect='standard',
                            configuration=query_config)
print(result.to_string())

This now returns:

         name    count
0  JavaScript  1109499
1      Python   551257
2           R    29572
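The same configuration structure could be applied to the user IDs from the question. The following is only a sketch: the helper name array_param_config and the parameter name user_ids are hypothetical, the table crozzles.games is taken from the other answer, and the read_gbq call is left commented out because it needs real credentials and a real project.

```python
def array_param_config(name, values, element_type="STRING"):
    """Build a `configuration` dict declaring one named ARRAY query parameter.

    Hypothetical helper that mirrors the dictionary layout shown above.
    """
    return {
        "query": {
            "parameterMode": "NAMED",
            "queryParameters": [
                {
                    "name": name,
                    "parameterType": {
                        "type": "ARRAY",
                        "arrayType": {"type": element_type},
                    },
                    # Each array element is wrapped as {'value': ...}.
                    "parameterValue": {
                        "arrayValues": [{"value": str(v)} for v in values]
                    },
                }
            ],
        }
    }


users = [u'5451709866311680', u'4660301072957440', u'6370791394377728']
query_config = array_param_config("user_ids", users)

query = """
SELECT *
FROM crozzles.games
WHERE user_id IN UNNEST(@user_ids)
"""

# Requires credentials, so it is not executed here:
# segment = pd.io.gbq.read_gbq(query, project_id='shared', dialect='standard',
#                              configuration=query_config)
```

Because the values travel separately from the SQL text, no quoting or escaping of the IDs is needed in the query string itself.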

References:

  1. https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#QueryRequest
  2. https://cloud.google.com/bigquery/docs/reference/rest/v2/QueryParameter