BigQuery: running last value and table join

Time: 2016-01-27 16:27:20

Tags: google-bigquery

Table_1 is my sales table:

Time | item | ...
-----------------
 1   |  X   | ...
 1   |  Y   | ...
 2   |  X   | ...
 4   |  X   | ...
 6   |  X   | ...
 6   |  Y   | ...

Table_2 is my cost table:

Time | item | Cost
-----------------
 1   |  X   | a
 1   |  Y   | b
 3   |  X   | c
 4   |  X   | d
 4   |  Y   | e
 5   |  X   | f

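What I want to achieve:
For each row in Table_1, get the most recent cost value from Table_2 for the same item (i.e. the cost with the latest time that is at or before the Table_1 row's time).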

The result should look like this:

Time | item | ... | Cost
------------------------
 1   |  X   | ... | a
 1   |  Y   | ... | b
 2   |  X   | ... | a
 4   |  X   | ... | d
 6   |  X   | ... | f
 6   |  Y   | ... | e

(I know this is straightforward in traditional SQL, using either a subquery in the SELECT clause or a non-equi join, but BigQuery does not allow either.)

1 answer:

Answer 0: (score: 1)

Try the following:

SELECT sales.time AS [time], sales.item AS item, cost 
FROM (
  SELECT sales.item, sales.time, cost, 
         cost.time - sales.time AS delta,
         ROW_NUMBER() OVER(PARTITION BY sales.item, sales.time ORDER BY delta DESC) AS win
  FROM Table_1 as sales
  LEFT JOIN Table_2 as cost
  ON sales.item = cost.item
  WHERE cost.time - sales.time <= 0
) 
WHERE win = 1
ORDER BY 1, 2

This should give you exactly the result you expect:

time    item    cost     
   1       x       a     
   1       y       b     
   2       x       a     
   4       x       d     
   6       x       f     
   6       y       e
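
To cross-check the output outside BigQuery, the rule "latest cost at or before the sale time, per item" can be sketched in plain Python. This is only an illustrative sketch, not part of the query above: the function name latest_cost_per_sale and the in-memory lists are hypothetical, and the rows are copied from the sample tables in the question.

```python
from bisect import bisect_right
from collections import defaultdict

# Sample rows copied from the question (hypothetical in-memory stand-ins
# for Table_1 and Table_2).
sales = [(1, "X"), (1, "Y"), (2, "X"), (4, "X"), (6, "X"), (6, "Y")]
costs = [(1, "X", "a"), (1, "Y", "b"), (3, "X", "c"),
         (4, "X", "d"), (4, "Y", "e"), (5, "X", "f")]


def latest_cost_per_sale(sales, costs):
    """For each sale, pick the cost whose time is the greatest one <= sale time.

    Returns (time, item, cost) tuples; cost is None when the item has no
    cost row at or before the sale time.
    """
    # Group cost rows by item, each group ordered by time.
    by_item = defaultdict(list)
    for time, item, cost in sorted(costs):
        by_item[item].append((time, cost))

    result = []
    for time, item in sales:
        history = by_item.get(item, [])
        times = [t for t, _ in history]
        # Number of cost rows with time <= sale time.
        idx = bisect_right(times, time)
        cost = history[idx - 1][1] if idx else None
        result.append((time, item, cost))
    return result


for row in latest_cost_per_sale(sales, costs):
    print(row)
```

One difference worth keeping in mind: this sketch keeps a sale that has no earlier cost and reports None for it, whereas the query's WHERE cost.time - sales.time <= 0 filter discards such rows, because the comparison is not true when the cost side is NULL.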