I have a table named student_master in BigQuery with 70,000 rows, and I want to retrieve all of them with the query below. It runs without any error, but it only retrieves 52,226 rows (that is, not all of them). I tried using ROW_NUMBER() with PARTITION BY, as in the code below, but I still don't get all of the data. What should I do?
I also considered using two queries with ORDER BY id_student and LIMIT 35000, one ASC (query1) and one DESC (query2), but that idea breaks down if the data grows (say, to 200,000 rows).
data = []
# Legacy google-cloud-bigquery client; "client" and "element" are defined
# elsewhere in the surrounding code.
sql = (
    "SELECT id_student, class, name, "
    "ROW_NUMBER() OVER (PARTITION BY class ORDER BY class ASC) row_num "
    "FROM [project_name.dataset.student_master] "
    "WHERE NOT class = " + str(element['class'])  # quote the value if class is a string column
)
query = client.run_sync_query(sql)
query.timeout_ms = 20000
query.run()
for row in query.rows:
    data.append(row)
return data
Answer 0 (score: 1)
In general, for large exports you should run an export job, which will put your data into files in GCS.
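A minimal sketch of such an export job with the current google-cloud-bigquery client (the table path and bucket name are placeholders, not from the question):
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder source table and GCS destination; substitute your own.
table_ref = bigquery.TableReference.from_string("project_name.dataset.student_master")
destination_uri = "gs://your-bucket/student_master-*.csv"  # wildcard lets BigQuery shard the output

extract_job = client.extract_table(table_ref, destination_uri)  # starts the export job
extract_job.result()  # block until the export finishes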
But in this case, you probably just need to page through more results:
If the rows returned by the query do not fit into the initial response, we need to fetch the remaining rows via fetch_data():
query = client.run_sync_query(LIMITED)
query.timeout_ms = TIMEOUT_MS
query.max_results = PAGE_SIZE
query.run() # API request
assert query.complete
assert query.page_token is not None
assert len(query.rows) == PAGE_SIZE
assert [field.name for field in query.schema] == ['name']
iterator = query.fetch_data() # API request(s) during iteration
for row in iterator:
    do_something_with(row)
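On newer versions of the library, where run_sync_query() and fetch_data() no longer exist, the same paging happens through the query job's result() iterator; a rough equivalent sketch (reusing the table path from the question):
from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query(
    "SELECT id_student, class, name FROM `project_name.dataset.student_master`"
)

# result() returns a RowIterator that requests further pages as you consume it,
# so this loop sees every row, not just the first response page.
data = [row for row in query_job.result(page_size=10000)]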
Answer 1 (score: 1)
I was able to retrieve more than 200,000 rows by querying a public dataset, and verified it by using a counter variable:
from google.cloud import bigquery

client = bigquery.Client()

query_job = client.query("""
    SELECT ROW_NUMBER() OVER (PARTITION BY token_address ORDER BY token_address ASC) AS row_number,
           token_address
    FROM `bigquery-public-data.ethereum_blockchain.token_transfers`
    WHERE token_address = '0x001575786dfa7b9d9d1324ec308785738f80a951'
    ORDER BY 1
""")

contador = 0  # row counter
for row in query_job:
    contador += 1
    print(contador, row)
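This works because the iterator returned by client.query() fetches additional result pages automatically as you loop over it, so no explicit paging call is needed to get past the first response.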