Writing a string comparison function in BigQuery

Time: 2016-08-01 09:33:42

Tags: google-bigquery

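I am trying to use a BigQuery UDF to write a function that compares one list of strings against other lists of strings. Basically, I want to know how many new users we get each week, and how many of those new users visit our website again in the following weeks. To do this I created a query that gives me one string of all emails per week (using group_concat) and saved the result as a table. Now I need to know how to compare each email against the other sets of emails. In the end, I would like to have a table like this: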

+--------+--------+--------+--------+--------+-----+
|        | week 1 | week 2 | week 3 | week 4 | ... |
+--------+--------+--------+--------+--------+-----+
| week 1 |   17   |    7   |    5   |    9   | ... |
+--------+--------+--------+--------+--------+-----+
| week 2 |        |   19   |   13   |    8   | ... |
+--------+--------+--------+--------+--------+-----+
| week 3 |        |        |   24   |   15   | ... |
+--------+--------+--------+--------+--------+-----+

1 answer:

Answer 0: (score: 2)

Just to give you an idea to play with:
SELECT 
  CONCAT('week', STRING(prev)) AS WEEK,
  SUM(IF(next=19, authors, 0)) AS week19,
  SUM(IF(next=20, authors, 0)) AS week20,
  SUM(IF(next=21, authors, 0)) AS week21,
  SUM(IF(next=22, authors, 0)) AS week22,
  SUM(IF(next=23, authors, 0)) AS week23
FROM (
  SELECT prev, next, COUNT(author) AS authors
  FROM (
    SELECT
      prev_week.week_created AS prev,
      next_week.week_created AS next,
      prev_week.author AS author
    FROM (
      SELECT  
        WEEK(SEC_TO_TIMESTAMP(created_utc)) AS week_created,
        author
      FROM [fh-bigquery:reddit_posts.2016_05] 
      GROUP BY 1,2
    ) next_week
    LEFT JOIN (
      SELECT  
        WEEK(SEC_TO_TIMESTAMP(created_utc)) AS week_created,
        author
      FROM [fh-bigquery:reddit_posts.2016_05] 
      GROUP BY 1,2
    ) AS prev_week
    ON prev_week.author = next_week.author
    HAVING prev <= next
  )
  GROUP BY 1,2
)
GROUP BY 1
ORDER BY 1

The result is a matrix with one row per earlier week (the WEEK column) and one column per later week (week19 through week23).

This is the closest I could come up with to what you asked for.
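To make the logic of the two inner selects easier to follow, here is a minimal Python sketch of the same idea: deduplicate (week, author) rows, join each author with themselves, keep only pairs where prev <= next, and count authors per pair. The sample rows and the function name `count_week_pairs` are made up for illustration; this is not BigQuery code and only mirrors what the query is meant to compute.

```python
from collections import Counter, defaultdict

# Hypothetical (week, author) activity rows; duplicates are allowed.
rows = [
    (19, "alice"), (19, "bob"), (19, "alice"),
    (20, "alice"), (20, "carol"),
    (21, "bob"), (21, "carol"),
]

def count_week_pairs(rows):
    """Count authors seen in both week `prev` and week `nxt`, for prev <= nxt."""
    # Same effect as GROUP BY 1,2 in the query: one entry per (week, author).
    weeks_by_author = defaultdict(set)
    for week, author in rows:
        weeks_by_author[author].add(week)

    pairs = Counter()
    for weeks in weeks_by_author.values():
        # Same effect as the self-join on author filtered by prev <= next.
        for prev in weeks:
            for nxt in weeks:
                if prev <= nxt:
                    pairs[(prev, nxt)] += 1
    return pairs

pairs = count_week_pairs(rows)
for (prev, nxt), authors in sorted(pairs.items()):
    print(prev, nxt, authors)
```

Each diagonal pair (prev equal to next) is simply the number of distinct authors in that week, and each off-diagonal pair is the number of those authors who showed up again in the later week.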

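Meanwhile, please note that BigQuery is meant for data crunching rather than for report design. So I don't think building the matrix/pivot inside BigQuery (the outer select) is the best fit; that part can be done in your reporting tool. Calculating all the prev|next|count pairs (the inner selects), on the other hand, is definitely a good fit for BigQuery.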
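As a rough illustration of doing the pivot outside BigQuery, the sketch below takes prev|next|count rows and lays them out as a matrix in plain Python. The function name `pivot_pairs` and the sample counts are assumptions made for this example; a real reporting tool would have its own pivot feature, so treat this only as a picture of the step.

```python
def pivot_pairs(pair_counts):
    """Turn (prev, next, count) rows into a list-of-lists matrix.

    Cells below the diagonal (next < prev) are left blank, and missing
    pairs on or above the diagonal are shown as 0.
    """
    weeks = sorted({w for prev, nxt, _ in pair_counts for w in (prev, nxt)})
    cell = {(prev, nxt): n for prev, nxt, n in pair_counts}

    table = [["week"] + [f"week{w}" for w in weeks]]
    for prev in weeks:
        row = [f"week{prev}"]
        for nxt in weeks:
            if nxt < prev:
                row.append("")  # a later cohort cannot appear in an earlier week
            else:
                row.append(cell.get((prev, nxt), 0))
        table.append(row)
    return table

# Hypothetical output of the inner selects: (prev, next, authors).
sample = [(19, 19, 17), (19, 20, 7), (20, 20, 19)]
table = pivot_pairs(sample)
for row in table:
    print("\t".join(str(value) for value in row))
```

Keeping the pivot on the client side also means the set of week columns does not have to be hard-coded, as it is in the outer select of the query.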