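I'm trying to write a function with a BigQuery UDF that compares one list of strings against other lists of strings. Basically, I want to know how many new users we get each week, and how many of those new users visit our site again in the following weeks. To do this, I created a query that gives me one string of all emails per week (using group_concat) and saved the result as a table. Now I need to know how to compare each email against the other sets of emails. In the end, I'd like to have a table like this: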
+-------+--------+--------+--------+--------+-----+
|       | week 1 | week 2 | week 3 | week 4 | ... |
+-------+--------+--------+--------+--------+-----+
| week1 | 17     | 7      | 5      | 9      | ... |
+-------+--------+--------+--------+--------+-----+
| week2 |        | 19     | 13     | 8      | ... |
+-------+--------+--------+--------+--------+-----+
| week3 |        |        | 24     | 15     | ... |
+-------+--------+--------+--------+--------+-----+
Answer 0 (score: 2)
Just wanted to give you an idea to play with:
SELECT
  -- Outer select (BigQuery legacy SQL): pivot the prev|next|count pairs
  -- into one column per week; the week numbers are hard-coded.
  CONCAT('week', STRING(prev)) AS WEEK,
  SUM(IF(next=19, authors, 0)) AS week19,
  SUM(IF(next=20, authors, 0)) AS week20,
  SUM(IF(next=21, authors, 0)) AS week21,
  SUM(IF(next=22, authors, 0)) AS week22,
  SUM(IF(next=23, authors, 0)) AS week23
FROM (
  -- Inner select: number of authors seen in both week prev and week next
  SELECT prev, next, COUNT(author) AS authors
  FROM (
    SELECT
      prev_week.week_created AS prev,
      next_week.week_created AS next,
      prev_week.author AS author
    FROM (
      -- One row per (week, author)
      SELECT
        WEEK(SEC_TO_TIMESTAMP(created_utc)) AS week_created,
        author
      FROM [fh-bigquery:reddit_posts.2016_05]
      GROUP BY 1,2
    ) next_week
    LEFT JOIN (
      -- Same (week, author) list, joined to itself on author
      SELECT
        WEEK(SEC_TO_TIMESTAMP(created_utc)) AS week_created,
        author
      FROM [fh-bigquery:reddit_posts.2016_05]
      GROUP BY 1,2
    ) AS prev_week
    ON prev_week.author = next_week.author
    -- Keep only pairs where the earlier week comes first
    HAVING prev <= next
  )
  GROUP BY 1,2
)
GROUP BY 1
ORDER BY 1
This is the closest thing to what you asked for that I can think of.
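Also, please note: BigQuery is meant for data crunching rather than report design. So I don't think building the matrix/pivot inside BigQuery (the outer select) is the best fit; that part can be done in your reporting tool. But calculating all the prev|next|count pairs (the inner select) is definitely a good fit for BigQuery.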
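To illustrate doing the pivot outside BigQuery, here is a minimal Python sketch. It assumes the inner select has already been run and its prev|next|count rows exported as plain tuples. The function name `pivot_pairs` and the sample rows are made up for illustration (the numbers are simply the ones from the table in the question), so treat it as one possible approach rather than the way to do it.

```python
def pivot_pairs(rows):
    """Pivot (prev, next, count) rows into a matrix: one row per prev week,
    one column per next week. Missing combinations are left blank."""
    prev_weeks = sorted({prev for prev, _, _ in rows})
    next_weeks = sorted({nxt for _, nxt, _ in rows})

    # Sum counts per (prev, next) pair in case a pair appears more than once.
    counts = {}
    for prev, nxt, count in rows:
        counts[(prev, nxt)] = counts.get((prev, nxt), 0) + count

    header = [""] + [f"week{w}" for w in next_weeks]
    body = [
        [f"week{p}"] + [counts.get((p, n), "") for n in next_weeks]
        for p in prev_weeks
    ]
    return [header] + body


# Hypothetical sample rows (prev, next, count), taken from the question's table.
sample_rows = [
    (1, 1, 17), (1, 2, 7), (1, 3, 5), (1, 4, 9),
    (2, 2, 19), (2, 3, 13), (2, 4, 8),
    (3, 3, 24), (3, 4, 15),
]

if __name__ == "__main__":
    for line in pivot_pairs(sample_rows):
        print("\t".join(str(cell) for cell in line))
```

Done this way, the week columns come from whatever weeks appear in the data, so nothing like week19..week23 has to be hard-coded in the query.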