SELECT in BigQuery does not retrieve all data beyond 53,000 rows

Asked: 2018-11-07 02:38:49

Tags: google-bigquery

A table named student_master in BigQuery has 70,000 rows, and I want to retrieve all of them with the query below. When I run it I get no errors, yet it retrieves only 52,226 rows (that is, not all of them). I also tried using ROW_NUMBER() with PARTITION BY, as in the code below, but I still do not get all of the data. What should I do?

I also had the idea of using two queries with ORDER BY id_student LIMIT 35000, one ascending (query 1) and one descending (query 2), as sketched below, but that stops working if the data grows (say, to 200,000 rows).
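
For illustration, a minimal sketch of that two-query workaround (legacy SQL, matching the table syntax in the question); the hard-coded 35,000-row split is exactly why it cannot keep up with a growing table:

# Hypothetical sketch of the two-query workaround: each half is capped at
# 35,000 rows, so rows are silently missed once the table exceeds 70,000.
sql_first_half = ("SELECT id_student, class, name "
                  "FROM [project_name.dataset.student_master] "
                  "ORDER BY id_student ASC LIMIT 35000")
sql_second_half = ("SELECT id_student, class, name "
                   "FROM [project_name.dataset.student_master] "
                   "ORDER BY id_student DESC LIMIT 35000")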

data = []
# Build the query string; note the explicit spaces between fragments and the
# quoting of the class value (assumed to be a string) so the concatenated
# SQL stays valid.
sql = ("SELECT id_student, class, name, "
       "  ROW_NUMBER() OVER (PARTITION BY class ORDER BY class ASC) AS row_num "
       "FROM [project_name.dataset.student_master] "
       "WHERE NOT class = '" + element['class'] + "'")
query = client.run_sync_query(sql)
query.timeout_ms = 20000
query.run()
for row in query.rows:    # rows returned in the initial response
    data.append(row)
return data

2 Answers:

Answer 0 (score: 1)

Generally, for large exports you should run an export job, which writes your data to files in GCS.
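
As a minimal sketch of such an export job, assuming the newer google-cloud-bigquery client (the bucket name and dataset reference are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder bucket; the wildcard lets BigQuery shard a large export
# across multiple CSV files in GCS.
destination_uri = "gs://your-bucket/student_master-*.csv"
table_ref = client.dataset("dataset").table("student_master")

extract_job = client.extract_table(table_ref, destination_uri)  # API request
extract_job.result()  # block until the export job completes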

But in this case you probably just need to page through more pages of results. Quoting the client library documentation:

If the rows returned by the query do not fit into the initial response, then we need to fetch the remaining rows via fetch_data():

query = client.run_sync_query(LIMITED)
query.timeout_ms = TIMEOUT_MS
query.max_results = PAGE_SIZE
query.run()                     # API request

assert query.complete
assert query.page_token is not None
assert len(query.rows) == PAGE_SIZE
assert [field.name for field in query.schema] == ['name']

iterator = query.fetch_data()   # API request(s) during iteration
for row in iterator:
    do_something_with(row) 
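
With newer releases of the client library (0.28+, the API used in the next answer), paging is handled for you, since iterating the job returned by client.query() fetches additional pages automatically; a minimal sketch:

# Assumes the newer client API: iterating the QueryJob transparently
# issues further API requests until all rows have been consumed.
query_job = client.query(sql)
for row in query_job:
    do_something_with(row)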

Answer 1 (score: 1)

I was able to collect more than 200,000 rows by querying a public dataset, and I verified the count with a counter variable:

from google.cloud import bigquery

client = bigquery.Client()

query_job = client.query("""
    SELECT ROW_NUMBER() OVER (PARTITION BY token_address ORDER BY token_address ASC) AS row_number,
           token_address
    FROM `bigquery-public-data.ethereum_blockchain.token_transfers`
    WHERE token_address = '0x001575786dfa7b9d9d1324ec308785738f80a951'
    ORDER BY 1
    """)

contador = 0                # row counter ("contador" is Spanish for counter)
for row in query_job:       # iterating the job pages through all results
    contador += 1
    print(contador, row)