Question

I have some tables in BigQuery with names like counts_20171220, containing rows in this format (there is only one row per contentId per date):

| contentId | views |
+-----------+-------+
| cb32edc0  | 728324|
| 52cbb1ff  | 643220|
...

I would like to generate a list of view counts for a given contentId over a given time range, without gaps, for example:

|       date |  views |
+------------+--------+
| 2017-12-01 |   NULL | -- or 0
| 2017-12-02 |   NULL | -- or 0
| 2017-12-03 | 728314 |
| 2017-12-04 | 328774 |
| 2017-12-05 |  28242 |
...
| 2017-12-20 |   NULL | -- or 0

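To do this, I think I need to use * and _table_suffix, but I can't figure out how to include the dates that have no entry for the contentId. The closest I have got is this query:

    #standardSQL
    SELECT
      _table_suffix AS date,
      ARRAY_AGG(views) AS views
    FROM `test.counts_*`
    WHERE _table_suffix BETWEEN '20171201' AND '20171220'
      AND contentId = 'cb32edc0'
    GROUP BY _table_suffix, contentId
    ORDER BY date

The problems with this query are:

It does not include rows for all dates, only for those where the table contains 'cb32edc0'
Because of the way it is structured, I need to use a (useless) aggregate function to extract views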

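How should I structure such a query? I would be happy with specific help for this query, as well as general pointers on how to query date-partitioned tables like these.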

Answer 1

This should work, assuming there is at least one row for each date:

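A minimal sketch of such a query, reconstructed from the description that follows rather than copied from the answer. The choice of MAX as the aggregate is an assumption (any aggregate that ignores NULLs would do, since there is only one row per contentId per date); the table name and contentId are taken from the question.

```sql
#standardSQL
SELECT
  _table_suffix AS date,
  -- IF(...) yields NULL for rows of other contentIds, and MAX ignores NULLs.
  -- IFNULL turns an all-NULL group (no matching row that day) into 0.
  IFNULL(MAX(IF(contentId = 'cb32edc0', views, NULL)), 0) AS views
FROM `test.counts_*`
WHERE _table_suffix BETWEEN '20171201' AND '20171220'
GROUP BY _table_suffix
ORDER BY date
```

Because the filter on contentId sits inside the aggregate instead of the WHERE clause, every date whose table has at least one row still produces a group, and therefore a row in the output.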

Instead of explicitly filtering out the rows that do not match the desired contentId, it uses a condition inside the aggregate function to exclude them from the result. If a group contains no rows with the desired contentId, IFNULL ensures that the expression returns 0 rather than NULL.
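This answer relies on every date having at least one row. If some dates might have no rows, or no table at all, one possible alternative (not part of the original answer) is to build the calendar with GENERATE_DATE_ARRAY and LEFT JOIN the counts onto it. The sketch below assumes the same `test.counts_*` tables and date range as the question; the names calendar, counts and d are illustrative only.

```sql
#standardSQL
WITH calendar AS (
  -- One row per day in the requested range, whether or not a table exists for it.
  SELECT d
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2017-12-01', DATE '2017-12-20')) AS d
),
counts AS (
  -- The table suffix is a string such as '20171203'; convert it to a DATE for the join.
  SELECT PARSE_DATE('%Y%m%d', _table_suffix) AS d, views
  FROM `test.counts_*`
  WHERE _table_suffix BETWEEN '20171201' AND '20171220'
    AND contentId = 'cb32edc0'
)
SELECT
  calendar.d AS date,
  IFNULL(counts.views, 0) AS views  -- days without a matching row become 0
FROM calendar
LEFT JOIN counts
  ON calendar.d = counts.d
ORDER BY date
```

Here the contentId filter can stay in the WHERE clause, because the gap-filling is done by the join against the generated calendar and not by the grouping.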
