Rolling 30 days of data from BigQuery

Time: 2018-09-06 08:29:00

Tags: google-bigquery

Suppose I have the following query:

SELECT
  ga_channelGrouping,
  ga_sourceMedium,
  ga_campaign,
  SUM(ga_sessions) AS sessions,
  SUM(ga_sessionDuration)/SUM(ga_sessions) AS avg_sessionDuration,
  SUM(ga_users) AS Users,
  SUM(ga_newUsers) AS New_Users,
  SUM(ga_bounces)/SUM(ga_sessions) AS ga_bounceRate,
  SUM(ga_pageviews)/SUM(ga_sessions) AS pageViews_per_sessions,
  SUM(ga_transactions)/SUM(ga_sessions) AS ga_conversionRate
FROM db.table
GROUP BY ga_channelGrouping, ga_sourceMedium, ga_campaign

How can I get rolling 30-day data in BigQuery? My DATE column values have the following format: 2018-06-19 11:00:00 UTC

2 answers:

Answer 0 (score: 2)

You can shift DATE values with the DATE_ADD or DATE_SUB functions, and TIMESTAMP values with TIMESTAMP_ADD or TIMESTAMP_SUB.

So you can try:

SELECT
  ga_channelGrouping,
  ga_sourceMedium,
  ga_campaign,
  SUM(ga_sessions) AS sessions,
  SUM(ga_sessionDuration)/SUM(ga_sessions) AS avg_sessionDuration,
  SUM(ga_users) AS Users,
  SUM(ga_newUsers) AS New_Users,
  SUM(ga_bounces)/SUM(ga_sessions) AS ga_bounceRate,
  SUM(ga_pageviews)/SUM(ga_sessions) AS pageViews_per_sessions,
  SUM(ga_transactions)/SUM(ga_sessions) AS ga_conversionRate
FROM db.table
-- keep only rows from the last 30 days (24*30 = 720 hours)
WHERE your_date_column >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24*30 HOUR)
GROUP BY ga_channelGrouping, ga_sourceMedium, ga_campaign

TIMESTAMP_SUB did not accept a DAY interval when this was written, so 24*30 HOUR is used instead to go back 30 days.
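As a quick sanity check of that interval arithmetic, here is a small Python sketch (run locally, not in BigQuery). The fixed `now` value is a hypothetical stand-in for CURRENT_TIMESTAMP(), and UTC is assumed throughout.

```python
from datetime import datetime, timedelta, timezone

# 24*30 HOUR, as used in the TIMESTAMP_SUB call above
interval = timedelta(hours=24 * 30)

# Hypothetical stand-in for CURRENT_TIMESTAMP() (BigQuery timestamps are UTC)
now = datetime(2018, 9, 6, 8, 29, 0, tzinfo=timezone.utc)

# Mirrors TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24*30 HOUR)
cutoff = now - interval

print(interval == timedelta(days=30))  # True
print(cutoff.isoformat())              # 2018-08-07T08:29:00+00:00
```

Note that the cutoff keeps the current time of day (08:29), so part of the first day in the window is excluded.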


Edit: if you want to go back 30 days to the start of that day, whatever the current time of day is, you can do:

WHERE your_date_column >= TIMESTAMP_TRUNC(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24*30 HOUR), DAY)

OR

WHERE CAST(your_date_column AS DATE) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
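The three WHERE variants differ only on the boundary day. The Python sketch below (a local imitation, not BigQuery) applies each filter to a few hypothetical timestamps near the boundary, using a hypothetical fixed `now` and assuming everything is in UTC, which is BigQuery's default time zone for TIMESTAMP_TRUNC, CURRENT_DATE() and casting a TIMESTAMP to DATE.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
now = datetime(2018, 9, 6, 8, 29, tzinfo=UTC)  # hypothetical CURRENT_TIMESTAMP()

# Hypothetical rows around the 30-day boundary
rows = [
    datetime(2018, 8, 6, 23, 0, tzinfo=UTC),
    datetime(2018, 8, 7, 3, 0, tzinfo=UTC),
    datetime(2018, 8, 7, 11, 0, tzinfo=UTC),
]

# 1) your_date_column >= TIMESTAMP_SUB(now, INTERVAL 24*30 HOUR)
cutoff = now - timedelta(hours=24 * 30)
# 2) your_date_column >= TIMESTAMP_TRUNC(<that cutoff>, DAY): moved back to midnight UTC
day_start = cutoff.replace(hour=0, minute=0, second=0, microsecond=0)
# 3) CAST(your_date_column AS DATE) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
date_cutoff = now.date() - timedelta(days=30)

v1 = [r for r in rows if r >= cutoff]
v2 = [r for r in rows if r >= day_start]
v3 = [r for r in rows if r.date() >= date_cutoff]

for name, kept in (("exact", v1), ("truncated", v2), ("date", v3)):
    print(name, [r.strftime("%m-%d %H:%M") for r in kept])
# exact ['08-07 11:00']
# truncated ['08-07 03:00', '08-07 11:00']
# date ['08-07 03:00', '08-07 11:00']
```

Variants 2 and 3 both keep the whole boundary day, while variant 1 keeps only the part of it after the current time of day.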

Answer 1 (score: 1)

  

"How can I get rolling 30-day data in BigQuery? My DATE column values are in this format: 2018-06-19 11:00:00 UTC"

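First, I want to point out that aggregating the last 30 days and rolling 30 days are quite different things, so the answer below is for rolling 30 days, as opposed to just the last 30 days.

The following is for BigQuery Standard SQL and assumes your date column is named your_date_column and is of the TIMESTAMP data type.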

#standardSQL
SELECT
  your_date_column, -- data type of TIMESTAMP with value like 2018-06-19 11:00:00 UTC
  ga_channelGrouping,
  ga_sourceMedium,
  ga_campaign,
  SUM(ga_sessions) OVER(win) AS sessions,
  (SUM(ga_sessionDuration) OVER(win))/(SUM(ga_sessions) OVER(win)) AS avg_sessionDuration,
  SUM(ga_users) OVER(win) AS Users,
  SUM(ga_newUsers) OVER(win) AS New_Users,
  (SUM(ga_bounces) OVER(win))/(SUM(ga_sessions) OVER(win)) AS ga_bounceRate,
  (SUM(ga_pageviews) OVER(win))/(SUM(ga_sessions) OVER(win)) AS pageViews_per_sessions,
  (SUM(ga_transactions) OVER(win))/(SUM(ga_sessions) OVER(win)) AS ga_conversionRate
FROM `project.dataset.table`
WINDOW win AS (
  PARTITION BY ga_channelGrouping, ga_sourceMedium, ga_campaign
  ORDER BY UNIX_DATE(DATE(your_date_column))
  RANGE BETWEEN 29 PRECEDING AND CURRENT ROW
)

To give you an idea of how it works, try the dummy example below (for simplicity, it rolls over 3 days):

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 value, TIMESTAMP '2018-06-19 11:00:00 UTC' your_date_column UNION ALL
  SELECT 2, '2018-06-20 11:00:00 UTC' UNION ALL
  SELECT 3, '2018-06-21 11:00:00 UTC' UNION ALL
  SELECT 4, '2018-06-22 11:00:00 UTC' UNION ALL
  SELECT 5, '2018-06-23 11:00:00 UTC' UNION ALL
  SELECT 6, '2018-06-24 11:00:00 UTC' UNION ALL
  SELECT 7, '2018-06-25 11:00:00 UTC' UNION ALL
  SELECT 8, '2018-06-26 11:00:00 UTC' UNION ALL
  SELECT 9, '2018-06-27 11:00:00 UTC' UNION ALL
  SELECT 10, '2018-06-28 11:00:00 UTC' 
)
SELECT 
  your_date_column, 
  value, 
  SUM(value) OVER(win) rolling_value
FROM `project.dataset.table`
WINDOW win AS (ORDER BY UNIX_DATE(DATE(your_date_column)) RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY your_date_column
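To show what RANGE BETWEEN 2 PRECEDING AND CURRENT ROW over UNIX_DATE(DATE(your_date_column)) computes, here is a minimal Python re-implementation of that frame (a local sketch, not BigQuery itself) applied to the dummy rows above. The helper names `unix_date` and `rolling_sum` are hypothetical. Because the frame is a RANGE over day numbers, it always spans the current day plus the 2 calendar days before it, and missing days simply contribute nothing; a ROWS frame would instead reach back 2 rows regardless of dates. Using 29 instead of 2 gives the 30-day window of the main query.

```python
from datetime import date, timedelta

def unix_date(d: date) -> int:
    """Days since 1970-01-01, like BigQuery's UNIX_DATE."""
    return (d - date(1970, 1, 1)).days

def rolling_sum(rows, preceding):
    """For each (day, value) row, sum the values of all rows whose day number
    lies in [current_day - preceding, current_day]: a RANGE frame ordered by
    day number, so rows on the same day are all included."""
    keyed = [(unix_date(d), v) for d, v in rows]
    return [
        sum(v for other, v in keyed if day - preceding <= other <= day)
        for day, _ in keyed
    ]

# The dummy rows: values 1..10 on 2018-06-19 .. 2018-06-28
start = date(2018, 6, 19)
rows = [(start + timedelta(days=i), i + 1) for i in range(10)]

print(rolling_sum(rows, preceding=2))
# [1, 3, 6, 9, 12, 15, 18, 21, 24, 27]

# With 2018-06-21 missing, the frame still covers calendar days, not rows
gappy = [rows[0], rows[1], rows[3], rows[4]]
print(rolling_sum(gappy, preceding=2))
# [1, 3, 6, 9]

# By contrast, "just the last 3 days" is one total (2018-06-26 .. 2018-06-28)
latest = max(unix_date(d) for d, _ in rows)
print(sum(v for d, v in rows if unix_date(d) >= latest - 2))  # 27
```

In the gappy case a ROWS BETWEEN 2 PRECEDING AND CURRENT ROW frame would give 11 for 2018-06-23 by reaching back to 2018-06-20, which is why the query orders by UNIX_DATE and uses RANGE.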