BigQuery - Custom lag offset when using the LAG function

Time: 2015-10-27 21:29:19

Tags: google-bigquery partitioning lag

I have a BigQuery table that looks like this:

date  hits_eventInfo_Category  hits_eventInfo_Action  session_id  user_id  hits_time  hits_eventInfo_Label

20151021  Air  Search  1445001  A232  1952  City1
20151021  Air  Select  1445001  A232  2300  Vendor1
20151021  Air  Search  1445001  A111  1000  City2
20151021  Air  Search  1445001  A111  1900  City3
20151021  Air  Select  1445001  A111  7380  Vendor2
20151021  Air  Search  1445001  A580  1000  City4
20151021  Air  Search  1445001  A580  1900  City5
20151021  Air  Search  1445001  A580  1900  City6
20151021  Air  Select  1445001  A580  7380  Vendor3

The table shows the activity of 3 users (A232, A111 and A580), as follows:

i) A232 - Made 1 search at 'City1' and chose 'Vendor1' from 'City1'.
ii) A111 - Made the 1st search at 'City2' and did not choose any vendor from there. Made a 2nd search at 'City3' and ultimately chose 'Vendor2' from there.
iii) A580 - 1st search at 'City4', no vendor chosen. 2nd search at 'City5', no vendor chosen. 3rd search at 'City6', 'Vendor3' chosen from 'City6'.

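I am interested in retrieving only the cities where the user actually chose a vendor. In other words, I am not interested in the earlier searches from which the user did not choose a vendor.

Required output table: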

date  hits_eventInfo_Category  hits_eventInfo_Action  session_id  user_id  hits_time  city  vendor

20151021  Air  Search  1445001  A232  1952  City1  Vendor1
20151021  Air  Search  1445001  A111  1900  City3  Vendor2
20151021  Air  Search  1445001  A580  1900  City6  Vendor3

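After partitioning on user_id and ordering by hits_time, I have been trying to do this with the LAG function on the hits_eventInfo_eventLabel field, i.e. LAG(hits_eventInfo_eventLabel,1) OVER( PARTITION BY user_id ORDER BY hits_time)

However, because I am using a lag offset of 1, the expression above gives me the required output only for user A232 (he made just 1 search, so the record immediately before the vendor selection is guaranteed to be a search record).

Is there a way to make this lag expression more dynamic, so that it retrieves only the location searched immediately before the selection, no matter how many searches were made before it?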

OR

Is there some other function or approach I could use to achieve this?

1 Answer:

Answer 0: (score: 1)

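-- Assuming the row right before each 'Select' (per user, by hits_time) is a
-- search, an offset of 1 is enough however many searches came earlier.
select
  date,
  hits_eventInfo_Category,
  hits_eventInfo_Action,
  session_id,
  user_id,
  hits_time,
  prev as city,
  hits_eventInfo_Label as vendor
from (
  select *,
    -- label of the previous row for the same user, ordered by hits_time
    lag(hits_eventInfo_Label, 1) over(partition by user_id order by hits_time) as prev
  from dataset.table
)
-- keep only the vendor selections; prev then holds the last city searched
where hits_eventInfo_Action = 'Select'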
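
To check the idea against the sample data from the question, here is a minimal sketch that runs a variant of the same LAG approach in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer). It is not BigQuery syntax, and several things in it are assumptions made for illustration: the table name hits, the helper last_search_before_select, and an added seq column that records the order in which the rows are listed. The seq column is used as a tiebreaker because the two A580 searches share hits_time 1900, so ordering by hits_time alone does not say which of them came last. The variant also lags hits_eventInfo_Action and hits_time, so that the result carries the search row's values, as in the required output table.

```python
import sqlite3

# Sample rows copied from the question. The leading `seq` number is an
# assumption added for this sketch: it records the listed row order so that
# the two A580 searches sharing hits_time 1900 sort deterministically.
ROWS = [
    (1, "20151021", "Air", "Search", 1445001, "A232", 1952, "City1"),
    (2, "20151021", "Air", "Select", 1445001, "A232", 2300, "Vendor1"),
    (3, "20151021", "Air", "Search", 1445001, "A111", 1000, "City2"),
    (4, "20151021", "Air", "Search", 1445001, "A111", 1900, "City3"),
    (5, "20151021", "Air", "Select", 1445001, "A111", 7380, "Vendor2"),
    (6, "20151021", "Air", "Search", 1445001, "A580", 1000, "City4"),
    (7, "20151021", "Air", "Search", 1445001, "A580", 1900, "City5"),
    (8, "20151021", "Air", "Search", 1445001, "A580", 1900, "City6"),
    (9, "20151021", "Air", "Select", 1445001, "A580", 7380, "Vendor3"),
]

CREATE_TABLE = """
CREATE TABLE hits (
  seq INTEGER,
  date TEXT,
  hits_eventInfo_Category TEXT,
  hits_eventInfo_Action TEXT,
  session_id INTEGER,
  user_id TEXT,
  hits_time INTEGER,
  hits_eventInfo_Label TEXT
)
"""

# SQLite adaptation of the LAG approach; `hits` is a hypothetical table name.
QUERY = """
SELECT
  date,
  hits_eventInfo_Category,
  prev_action AS hits_eventInfo_Action,
  session_id,
  user_id,
  prev_time AS hits_time,
  prev_label AS city,
  hits_eventInfo_Label AS vendor
FROM (
  SELECT *,
    LAG(hits_eventInfo_Action, 1)
      OVER (PARTITION BY user_id ORDER BY hits_time, seq) AS prev_action,
    LAG(hits_time, 1)
      OVER (PARTITION BY user_id ORDER BY hits_time, seq) AS prev_time,
    LAG(hits_eventInfo_Label, 1)
      OVER (PARTITION BY user_id ORDER BY hits_time, seq) AS prev_label
  FROM hits
)
WHERE hits_eventInfo_Action = 'Select'  -- keep only vendor selections
  AND prev_action = 'Search'            -- whose previous row is a search
ORDER BY seq
"""


def last_search_before_select(rows=ROWS):
    """Return one row per 'Select', paired with the search just before it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(CREATE_TABLE)
        conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in last_search_before_select():
        print(row)
```

If the real table has no column that breaks ties on hits_time, the result for rows like the two A580 searches may depend on how the engine happens to order them, so it is worth checking whether a hit sequence number or similar field is available.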