Question

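Thanks to help from the answerer of an earlier question, I can now successfully query a single word and get a list of its most popular follow-on words. For example, using the word "great", I can get a list of 10 words in the following format: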

SELECT second, SUM(cell.page_count) total 
FROM [publicdata:samples.trigrams] 
WHERE first = "great"
group by 1
order by 2 desc
limit 10

Output:

second     total     
------------------
deal       3048832   
and        1689911   
,          1576341   
a          1019511   
number     984993    
many       875974    
importance 805215    
part       739409    
.          700694    
as         628978

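What I can't figure out is how to automate the query for multiple words (rather than running the query on one word at a time), so that I could get output such as: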

"great"     total     "new_word_1"           new_total_1 ... "new_word_N"     new_total_N
-----------------------------------------------------------------------------------------
deal       3048832    "new_follow_on_word1"  123456      ... "follow_on_N1"   234567
and        1689911    "new_follow_on_word2"  12345       ... "follow_on_N2"   123456

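Basically, I would like to pass N words in a single query (for example, new_word_1 is a completely different word such as "baseball", with no relation to "great") and get the total counts associated with each word in separate columns.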

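Also, after learning about BigQuery's pricing, I'm having trouble figuring out how to limit the total data a query processes. I can think of using only the most recent data (say, 2010 onward) and 2-letter alphanumeric outputs per word, but I'm probably missing more obvious limiters.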

Any help is much appreciated - thank you!

Answer 1

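You can put multiple first words in the same query, but you need to compute the top 10 follow-on words for each one separately and then join the results together. Here is an example for "great" and "baseball":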

SELECT word1, total1, word2, total2 FROM
(SELECT ROW_NUMBER() OVER() rowid1, word1, total1 FROM (
SELECT second as word1, SUM(cell.page_count) total1 
FROM [publicdata:samples.trigrams] 
WHERE first = "great"
group by 1
order by 2 desc
limit 10)) a1
JOIN
(SELECT ROW_NUMBER() OVER() rowid2, word2, total2 FROM (
SELECT second as word2, SUM(cell.page_count) total2 
FROM [publicdata:samples.trigrams] 
WHERE first = "baseball"
group by 1
order by 2 desc
limit 10)) a2
ON a1.rowid1 = a2.rowid2
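
For N words, the same pattern can be generated instead of written by hand. The following is only a sketch: `build_followers_query` is a hypothetical helper (not part of any BigQuery client library) that repeats the subquery above once per word and joins each one back to the first on its row number. Chaining more than two joins this way is an extrapolation of the query above and has not been run against BigQuery here.

```python
def build_followers_query(words, limit=10):
    """Build a legacy-SQL query string with one (wordN, totalN) column pair per input word.

    Hypothetical helper: it only assembles text following the two-word
    query above; it does not send anything to BigQuery.
    """
    if not words:
        raise ValueError("at least one word is required")
    limit = int(limit)

    parts = []
    for i, word in enumerate(words, start=1):
        # Words are pasted into a string literal, so reject characters
        # that would break out of it.
        if '"' in word or "\\" in word:
            raise ValueError(f"unsupported character in word: {word!r}")
        subquery = (
            f"(SELECT ROW_NUMBER() OVER() rowid{i}, word{i}, total{i} FROM (\n"
            f"SELECT second as word{i}, SUM(cell.page_count) total{i}\n"
            f"FROM [publicdata:samples.trigrams]\n"
            f'WHERE first = "{word}"\n'
            f"group by 1\n"
            f"order by 2 desc\n"
            f"limit {limit})) a{i}"
        )
        if i == 1:
            parts.append(subquery)
        else:
            # Every later subquery is joined to the first one by row number.
            parts.append(f"JOIN\n{subquery}\nON a1.rowid1 = a{i}.rowid{i}")

    columns = ", ".join(f"word{i}, total{i}" for i in range(1, len(words) + 1))
    return f"SELECT {columns} FROM\n" + "\n".join(parts)


if __name__ == "__main__":
    print(build_followers_query(["great", "baseball"]))
```

Because the subqueries are combined with an inner join on the row number, a word with fewer than `limit` follow-on words would shorten the whole result to that word's row count.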

Constructing a BigQuery query over trigrams with multiple inputs
