Window functions with DISTINCT in BigQuery

Date: 2018-01-12 12:00:08

Tags: sql google-bigquery

I am trying to do something like this in BigQuery:

    COUNT(DISTINCT user_id) OVER (PARTITION BY DATE_TRUNC(date, month), sample, app_id ORDER BY DATE RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as ACTIVE_USERS

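In other words, I have a table with Date, UserId, Sample and Application ID columns. For each day, I need the cumulative number of unique active users from the start of the month up to and including that day.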

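The expression works without DISTINCT, but then it returns the total number of user rows in the window (a user active on several days is counted again each day), not the number of unique users I need.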

I tried some tricks with DENSE_RANK, but that doesn't work here either.

Is there a way to count the number of distinct users with a window function?
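To pin down the expected result, here is a small reference model in Python (not BigQuery) of what ACTIVE_USERS should contain: for each (month, sample, app_id) group and each date, the number of distinct user IDs seen from the first day of the month through that date. The plain running count, which is what the window returns without DISTINCT, is computed alongside for contrast. The sample rows and the function name are hypothetical, made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (date, user_id, sample, app_id)
ROWS = [
    (date(2018, 1, 1), "u1", "A", "app1"),
    (date(2018, 1, 1), "u2", "A", "app1"),
    (date(2018, 1, 2), "u1", "A", "app1"),  # u1 is active again
    (date(2018, 1, 3), "u3", "A", "app1"),
    (date(2018, 2, 1), "u1", "A", "app1"),  # new month: counts restart
]


def month_to_date_counts(rows):
    """Map (month, sample, app_id, date) -> (running_count, running_distinct).

    running_count models COUNT(user_id) OVER (... RANGE BETWEEN UNBOUNDED
    PRECEDING AND CURRENT ROW); running_distinct is the number the
    question asks for.
    """
    # Partition key mirrors DATE_TRUNC(date, MONTH), sample, app_id
    partitions = defaultdict(list)
    for d, user, sample, app in rows:
        partitions[(d.replace(day=1), sample, app)].append((d, user))

    result = {}
    for (month, sample, app), events in partitions.items():
        count, seen = 0, set()
        # RANGE frame: all rows sharing a date are peers and get the same value
        for day in sorted({d for d, _ in events}):
            todays_users = [u for d, u in events if d == day]
            count += len(todays_users)
            seen.update(todays_users)
            result[(month, sample, app, day)] = (count, len(seen))
    return result


counts = month_to_date_counts(ROWS)
print(counts[(date(2018, 1, 1), "A", "app1", date(2018, 1, 2))])  # (3, 2)
```

On 2018-01-02 the running count is already 3 while the distinct count stays at 2, because u1 is active on both days; that gap is exactly the difference between COUNT and COUNT(DISTINCT) in the window.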

---------- EDIT ---------- Here is the full query, so you can better see what I need:

    with mtd1 as (
      select
        'MonthToDate' as TIMELINE
        ,fd.date DATE
        ,td.SAMPLE as SAMPLE
        ,td.APPNAME as APP_ID
        ,sum(fd.revenue) as REVENUE
        ,td.user_id ACTIVE_USERS
      from DWH.DailyUser fd
      join DWH.Depositors td using (userid)
      group by 1,2,3,4,6
    ),
    mtd as (
      select TIMELINE
        ,DATE
        ,SAMPLE
        ,APP_ID
        ,sum(revenue) over (partition by date_trunc(date, month), sample, app_id order by date range BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as REVENUE
        ,COUNT(distinct active_users) over (partition by date_trunc(date, month), sample, app_id order by date range BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as ACTIVE_USERS
      from mtd1
    )
    select * from mtd
    where extract(day from date) = extract(day from current_date)
    group by 1,2,3,4,5,6

3 Answers:

Answer 0 (score: 2)

You can use ARRAY_AGG and then count the distinct elements in each array. Note that the query will run out of memory if the arrays end up too large.

    with mtd1 as (
      select
        'MonthToDate' as TIMELINE
        ,fd.date DATE
        ,td.SAMPLE as SAMPLE
        ,td.APPNAME as APP_ID
        ,sum(fd.revenue) as REVENUE
        ,td.user_id ACTIVE_USERS
      from DWH.DailyUser fd
      join DWH.Depositors td using (userid)
      group by 1,2,3,4,6
    ),
    -- window step: month-to-date revenue plus the array of user IDs
    mtd2 as (
      select TIMELINE
        ,DATE
        ,SAMPLE
        ,APP_ID
        ,sum(revenue) over (partition by date_trunc(date, month), sample, app_id order by date range BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as REVENUE
        -- every user ID seen so far this month, duplicates included
        ,ARRAY_AGG(active_users) over (partition by date_trunc(date, month), sample, app_id order by date range BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as ACTIVE_USERS
      from mtd1
    ), mtd AS (
      -- swap each array for the number of distinct IDs it contains
      SELECT * EXCEPT(ACTIVE_USERS),
        (SELECT COUNT(DISTINCT u) FROM UNNEST(ACTIVE_USERS) AS u) AS ACTIVE_USERS
      FROM mtd2
    )
    select * from mtd
    where extract(day from date) = extract(day from current_date)
    group by 1,2,3,4,5,6
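The ARRAY_AGG approach can be modeled outside BigQuery roughly as follows. This is a Python sketch over made-up data, not the BigQuery engine: for each date, the frame collects every user ID seen so far in the month (duplicates included), and the distinct count is taken afterwards, mirroring the `SELECT COUNT(DISTINCT u) FROM UNNEST(...)` subquery. The partition data and function name are hypothetical.

```python
from datetime import date

# Hypothetical rows of one (month, sample, app_id) partition: (date, user_id)
PARTITION = [
    (date(2018, 1, 1), "u1"),
    (date(2018, 1, 1), "u2"),
    (date(2018, 1, 2), "u1"),
    (date(2018, 1, 3), "u3"),
]


def array_agg_then_distinct(partition):
    """Return (date, array_length, distinct_count) per date.

    array_length models ARRAY_AGG(user_id) OVER (ORDER BY date RANGE BETWEEN
    UNBOUNDED PRECEDING AND CURRENT ROW); distinct_count models
    (SELECT COUNT(DISTINCT u) FROM UNNEST(array) AS u).
    """
    out = []
    for day in sorted({d for d, _ in partition}):
        # The frame holds every row up to and including this date, duplicates too
        frame = [u for d, u in partition if d <= day]
        out.append((day, len(frame), len(set(frame))))
    return out


for row in array_agg_then_distinct(PARTITION):
    print(row)
```

The array length keeps growing with every row in the month (2, 3, 4 here) while the distinct count is 2, 2, 3. Because each row carries its own copy of that growing array, a busy month multiplies the memory needed, which is the risk mentioned in the answer.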

Answer 1 (score: 1)

  

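"Window functions with DISTINCT in BigQuery - Is there a way to count the number of distinct users with a window function?"

This particular question is a duplicate and has already been answered here.

"...Here is the full query..."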

As for how to apply the above to your specific query, see below (not tested, and based entirely on your code):

    #standardSQL
    WITH mtd1 AS (
      SELECT
        'MonthToDate' AS TIMELINE
        ,fd.date DATE
        ,td.SAMPLE AS SAMPLE
        ,td.APPNAME AS APP_ID
        ,SUM(fd.revenue) AS REVENUE
        ,td.user_id ACTIVE_USERS
      FROM `DWH.DailyUser` fd
      JOIN `DWH.Depositors` td USING (userid)
      GROUP BY 1,2,3,4,6
    ), mtd2 AS (
      SELECT
        TIMELINE
        ,DATE
        ,SAMPLE
        ,APP_ID
        ,SUM(REVENUE) OVER (PARTITION BY DATE_TRUNC(DATE, MONTH), SAMPLE, APP_ID ORDER BY DATE RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS REVENUE
        -- all user IDs from the start of the month through the current date
        ,ARRAY_AGG(ACTIVE_USERS) OVER (PARTITION BY DATE_TRUNC(DATE, MONTH), SAMPLE, APP_ID ORDER BY DATE RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ACTIVE_USERS
      FROM mtd1
    ), mtd AS (
      -- REPLACE keeps the column position but swaps the array for its distinct count
      SELECT * REPLACE((SELECT COUNT(DISTINCT u) FROM UNNEST(ACTIVE_USERS) AS u) AS ACTIVE_USERS)
      FROM mtd2
    )
    SELECT * FROM mtd
    WHERE EXTRACT(DAY FROM DATE) = EXTRACT(DAY FROM CURRENT_DATE)
    GROUP BY 1,2,3,4,5,6

Answer 2 (score: 0)

One way to implement count(distinct) is to use row_number() and then count the "1"s:

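    SELECT t.date, t.sample, t.app_id,
           -- ORDER BY without a frame clause defaults to RANGE UNBOUNDED PRECEDING .. CURRENT ROW
           SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) OVER (PARTITION BY DATE_TRUNC(date, MONTH), sample, app_id ORDER BY date) AS Active_Users
    FROM (SELECT t.*,
                 -- seqnum = 1 marks each user's first row within the month
                 ROW_NUMBER() OVER (PARTITION BY DATE_TRUNC(date, MONTH), sample, app_id, user_id ORDER BY date) AS seqnum
          FROM t  -- t stands for your source table
         ) t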
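Why the ROW_NUMBER() trick gives a distinct count: a user contributes a 1 only on their first row in the month, so the running sum of those flags equals the number of distinct users seen so far. The Python sketch below models both steps over hypothetical rows (the data and function name are made up; it is not BigQuery code). Processing rows in date order lets both steps run in a single pass.

```python
from datetime import date

# Hypothetical rows standing in for table t: (date, user_id, sample, app_id)
ROWS = [
    (date(2018, 1, 1), "u1", "A", "app1"),
    (date(2018, 1, 1), "u2", "A", "app1"),
    (date(2018, 1, 2), "u1", "A", "app1"),
    (date(2018, 1, 3), "u3", "A", "app1"),
    (date(2018, 2, 1), "u1", "A", "app1"),
]


def row_number_trick(rows):
    """Model the two-step query:
    1. seqnum = ROW_NUMBER() OVER (PARTITION BY month, sample, app_id, user_id ORDER BY date)
    2. SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)
           OVER (PARTITION BY month, sample, app_id ORDER BY date)
    Returns {((month, sample, app_id), date): active_users}.
    """
    seen = set()   # (partition, user_id) pairs that already got seqnum = 1
    running = {}   # running sum of first-row flags per partition
    totals = {}
    for d, user, sample, app in sorted(rows):  # tuples sort by date first
        part = (d.replace(day=1), sample, app)
        # Step 1: seqnum == 1 only for the user's first row in this partition
        is_first = (part, user) not in seen
        seen.add((part, user))
        # Step 2: running sum of the flags
        running[part] = running.get(part, 0) + (1 if is_first else 0)
        # Same-date rows are RANGE peers; the last write holds the total
        # through the end of that date
        totals[(part, d)] = running[part]
    return totals


result = row_number_trick(ROWS)
print(result[((date(2018, 1, 1), "A", "app1"), date(2018, 1, 2))])  # 2
```

Unlike the ARRAY_AGG version, this approach never builds a per-row array, so it should not run into the array-size memory concern raised in the first answer.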