In Standard SQL

Time: 2018-05-09 10:17:51

Tags: sql google-bigquery

I am building a log of user interactions with a website. So far I have one row per visit, showing the referral channel and the timestamp:

[Screenshot: one row per visit with referral channel and timestamp]

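I want to rank each visit_ref by date, so that the most recent visit ranks highest and the oldest ranks lowest, within the date range I am querying.

This is my code so far, with the channel removed for readability: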

SELECT
  TIMESTAMP_SECONDS(visitStartTime) AS stamp,
  customDimension.value AS UserID,
  CONCAT(CAST(fullVisitorId AS STRING), CAST(visitId AS STRING)) AS visit_ref,
  COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS STRING), CAST(visitId AS STRING))) OVER (PARTITION BY customDimension.value) AS total_visits_in_cycle,
  RANK() OVER (PARTITION BY CONCAT(CAST(fullVisitorId AS STRING), CAST(visitId AS STRING)), TIMESTAMP_SECONDS(visitStartTime) ORDER BY TIMESTAMP_SECONDS(visitStartTime)) AS visitrank,
  COUNT(DISTINCT transaction.transactionid) AS orders
FROM `xxx.xxx.ga_sessions_20*` AS t
  CROSS JOIN UNNEST(hits) AS hits
  CROSS JOIN UNNEST(t.customdimensions) AS customDimension
WHERE PARSE_DATE('%y%m%d', _table_suffix) BETWEEN
    DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY) AND
    DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  AND customDimension.index = 2
GROUP BY 1, 2, 3, fullVisitorId, visitid, visitStartTime
ORDER BY UserID
LIMIT 500

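In this example, as the screenshot shows, the rank is always 1. How can I rank the unique visit_ref values by timestamp?

My desired output is below: visitrank shows 1 for the earliest visit and 3 for the most recent visit for this user.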

Row  stamp                        UserID                                visit_ref                      total_visits_in_cycle  visitrank  channel                orders
2    2018-05-07 08:02:30.000 UTC  00008736-01f0-4e0e-8e3b-4dc398e5b6f8  74664051693279955771525680150  3                      2          Email - CRM Campaigns  0
3    2018-05-06 21:59:20.000 UTC  00008736-01f0-4e0e-8e3b-4dc398e5b6f8  74664051693279955771525643960  3                      1          Email - CRM Campaigns  0
4    2018-05-07 05:39:15.000 UTC  00008736-01f0-4e0e-8e3b-4dc398e5b6f8  74664051693279955771525671555  3                      3          Email - CRM Campaigns  0


I am using Google BigQuery Standard SQL.

1 answer:

Answer 0 (score: 1)

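The partition window defines the subset of records that are ranked against each other. By including TIMESTAMP_SECONDS(visitStartTime) in the partition, you make every partition contain exactly 1 record (although real data could occasionally contain more), so the only rank you ever see is 1.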
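To see the effect in isolation, here is a minimal sketch on inline data. It uses the UserID and the three timestamps from the desired output in the question. The `visits` CTE and the column name `user_id` are made up for illustration and are not part of the GA export schema.

```sql
-- Illustrative only: `visits` and `user_id` are hypothetical names.
WITH visits AS (
  SELECT * FROM UNNEST([
    STRUCT('00008736-01f0-4e0e-8e3b-4dc398e5b6f8' AS user_id,
           TIMESTAMP '2018-05-06 21:59:20 UTC' AS stamp),
    STRUCT('00008736-01f0-4e0e-8e3b-4dc398e5b6f8', TIMESTAMP '2018-05-07 05:39:15 UTC'),
    STRUCT('00008736-01f0-4e0e-8e3b-4dc398e5b6f8', TIMESTAMP '2018-05-07 08:02:30 UTC')
  ])
)
SELECT
  stamp,
  -- The timestamp is part of the partition key, so every row sits alone in its partition.
  RANK() OVER (PARTITION BY user_id, stamp ORDER BY stamp) AS rank_partitioned_by_stamp,
  -- Only the user is in the partition key, so the three rows are ranked against each other.
  RANK() OVER (PARTITION BY user_id ORDER BY stamp) AS rank_partitioned_by_user
FROM visits
ORDER BY stamp
```

The first rank column is 1 on every row, which is the behavior described in the question. The second column runs 1, 2, 3 in timestamp order.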

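It is also not clear to me why you need the concat / cast in the partition definition, although the cast may be doing some significant conversion. I would use this: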

rank() over (partition by fullVisitorId order by timestamp_seconds(visitStartTime) desc)
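The sort direction decides which end of the range gets rank 1. As a rough sketch on inline data, the query below compares both directions. The `visits` CTE and the `'visitor_a'` id are placeholders, and the epoch seconds correspond to the three timestamps in the question's desired output.

```sql
-- Illustrative only: the `visits` CTE and the 'visitor_a' id are placeholders.
WITH visits AS (
  SELECT * FROM UNNEST([
    STRUCT('visitor_a' AS fullVisitorId, 1525643960 AS visitStartTime),  -- 2018-05-06 21:59:20 UTC
    STRUCT('visitor_a', 1525671555),                                     -- 2018-05-07 05:39:15 UTC
    STRUCT('visitor_a', 1525680150)                                      -- 2018-05-07 08:02:30 UTC
  ])
)
SELECT
  TIMESTAMP_SECONDS(visitStartTime) AS stamp,
  -- Latest visit gets rank 1.
  RANK() OVER (PARTITION BY fullVisitorId ORDER BY visitStartTime DESC) AS rank_latest_first,
  -- Earliest visit gets rank 1.
  RANK() OVER (PARTITION BY fullVisitorId ORDER BY visitStartTime ASC) AS rank_earliest_first
FROM visits
ORDER BY stamp
```

With `DESC` the most recent visit gets rank 1. With `ASC` the earliest visit gets rank 1, which is the numbering described for the desired output. If the ranking should be per UserID rather than per fullVisitorId, the same pattern should apply with customDimension.value as the partition key.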