BigQuery: extra rows appear when doing a cross join and adding another field to partition by

Time: 2018-10-22 12:51:52

Tags: sql google-bigquery

This question is related to my earlier one, Fill in missing values for joined tables in BigQuery.

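Basically I have two tables: one with stock transactions, the other with stock prices. The goal is a table in which the value is calculated for every day that has a price. That was answered in the earlier question (by doing a CROSS JOIN and using ARRAY_AGG to fill in the missing amounts for dates where a stock price exists but no transaction took place).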

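Now, if I want to add another value (e.g. a "running amount / balance"), the result breaks: the extra field (running_amount) is computed per transaction row and then grouped by, so basically every row gets doubled (per date and stock symbol). This is where my understanding of SQL ends :), so I would appreciate any help. The goal is to have exactly one row per date and stock symbol.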
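The doubling can be seen in isolation with a minimal sketch. The toy data below (two TX transactions and a single price row) is made up for illustration and is not the data from the question. Because each transaction row carries its own running_amount, grouping by that column keeps one group per transaction instead of one per date and symbol.

```sql
#standardSQL
-- Toy data (hypothetical): two transactions for TX, one price row for TX.
WITH trans AS (
  SELECT DATE '2018-10-02' AS trans_date, 10.0 AS stock_amount, 'TX' AS stock_symbol UNION ALL
  SELECT DATE '2018-10-03', 5.0, 'TX'
), prices AS (
  SELECT DATE '2018-10-04' AS price_date, 4.0 AS price, 'TX' AS symbol
), tx AS (
  -- Same running sum as in the question: one distinct value per transaction row.
  SELECT
    trans_date,
    stock_symbol,
    SUM(stock_amount) OVER (
      PARTITION BY stock_symbol
      ORDER BY trans_date
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_amount
  FROM trans
)
SELECT
  p.price_date,
  tx.stock_symbol,
  tx.running_amount,
  COUNT(*) AS joined_rows
FROM tx
CROSS JOIN prices AS p
-- running_amount in GROUP BY splits the single (date, symbol) pair into two groups.
GROUP BY p.price_date, tx.stock_symbol, tx.running_amount
ORDER BY tx.running_amount
```

With this toy data the query should return two rows for 2018-10-04 / TX, one with running_amount 10.0 and one with 15.0, which is the same effect as in the full query.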

Here is the full sample query:

WITH `trans` AS (
SELECT DATE '2018-10-02' trans_date, 10.0 stock_amount, 'TX' stock_symbol UNION ALL
SELECT DATE '2018-10-03', 5.0, 'TX' UNION ALL
SELECT DATE '2018-10-05', 11.0, 'AX' UNION ALL
SELECT DATE '2018-10-10', 10.0, 'AX' 

),
`prices` AS (
 SELECT DATE '2018-10-01' price_date, 1.0 price, 'TX' symbol UNION ALL
  SELECT DATE '2018-10-02', 2.0, 'TX' UNION ALL
  SELECT DATE '2018-10-03', 3.0, 'TX' UNION ALL
  SELECT DATE '2018-10-04', 4.0, 'TX' UNION ALL
  SELECT DATE '2018-10-05', 5.0, 'TX' UNION ALL
  SELECT DATE '2018-10-06', 6.0, 'TX' UNION ALL
  SELECT DATE '2018-10-07', 7.0, 'TX' UNION ALL
  SELECT DATE '2018-10-08', 8.0, 'TX' UNION ALL
  SELECT DATE '2018-10-08', 8.0, 'AX' UNION ALL
  SELECT DATE '2018-10-09', 9.0, 'TX' UNION ALL
  SELECT DATE '2018-10-09', 9.0, 'AX' UNION ALL
  SELECT DATE '2018-10-10', 10.0, 'TX' UNION ALL
  SELECT DATE '2018-10-10', 10.0, 'AX' UNION ALL
  SELECT DATE '2018-10-11', 11.0, 'TX' UNION ALL
  SELECT DATE '2018-10-11', 11.0, 'AX' UNION ALL
  SELECT DATE '2018-10-12', 11.0, 'AX' UNION ALL
  SELECT DATE '2018-10-12', 12.0, 'TX' 
)

SELECT
  price_date, 
  tx.stock_symbol AS token_symbol,
  IFNULL(
    ARRAY_AGG(
      IF(p.price_date >= tx.trans_date AND p.symbol = tx.stock_symbol, stock_amount, NULL) 
      IGNORE NULLS ORDER BY trans_date DESC LIMIT 1
      )[OFFSET(0)],
  -1234567890) stock_amount,
  running_amount,    
  price

FROM (
   SELECT
       trans_date,
       stock_symbol,
       stock_amount,
       SUM(stock_amount) OVER (PARTITION BY stock_symbol ORDER BY trans_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_amount
   FROM `trans`
   ORDER BY stock_symbol, trans_date
)
AS tx
CROSS JOIN `prices` as p
GROUP BY price_date, price, token_symbol
,running_amount
HAVING stock_amount != -1234567890
ORDER BY stock_symbol, price_date

Expected result:

Row price_date  token_symbol    stock_amount    running_amount  price
1   2018-10-08  AX  11  11  8
2   2018-10-09  AX  11  11  9
3   2018-10-10  AX  10  21  10
4   2018-10-11  AX  10  21  11
5   2018-10-12  AX  10  21  11
6   2018-10-02  TX  10  10  2
7   2018-10-03  TX  5   15  3
8   2018-10-04  TX  5   15  4
9   2018-10-05  TX  5   15  5
10  2018-10-06  TX  5   15  6
11  2018-10-07  TX  5   15  7
12  2018-10-08  TX  5   15  8
13  2018-10-09  TX  5   15  9
14  2018-10-10  TX  5   15  10
15  2018-10-11  TX  5   15  11
16  2018-10-12  TX  5   15  12

1 answer:

Answer 0 (score: 1)

For BigQuery Standard SQL:

#standardSQL
WITH `trans` AS (
  SELECT DATE '2018-10-02' trans_date, 10.0 stock_amount, 'TX' stock_symbol UNION ALL
  SELECT DATE '2018-10-03', 5.0, 'TX' UNION ALL
  SELECT DATE '2018-10-05', 11.0, 'AX' UNION ALL
  SELECT DATE '2018-10-10', 10.0, 'AX' 

), `prices` AS (
  SELECT DATE '2018-10-01' price_date, 1.0 price, 'TX' symbol UNION ALL
  SELECT DATE '2018-10-02', 2.0, 'TX' UNION ALL
  SELECT DATE '2018-10-03', 3.0, 'TX' UNION ALL
  SELECT DATE '2018-10-04', 4.0, 'TX' UNION ALL
  SELECT DATE '2018-10-05', 5.0, 'TX' UNION ALL
  SELECT DATE '2018-10-06', 6.0, 'TX' UNION ALL
  SELECT DATE '2018-10-07', 7.0, 'TX' UNION ALL
  SELECT DATE '2018-10-08', 8.0, 'TX' UNION ALL
  SELECT DATE '2018-10-08', 8.0, 'AX' UNION ALL
  SELECT DATE '2018-10-09', 9.0, 'TX' UNION ALL
  SELECT DATE '2018-10-09', 9.0, 'AX' UNION ALL
  SELECT DATE '2018-10-10', 10.0, 'TX' UNION ALL
  SELECT DATE '2018-10-10', 10.0, 'AX' UNION ALL
  SELECT DATE '2018-10-11', 11.0, 'TX' UNION ALL
  SELECT DATE '2018-10-11', 11.0, 'AX' UNION ALL
  SELECT DATE '2018-10-12', 11.0, 'AX' UNION ALL
  SELECT DATE '2018-10-12', 12.0, 'TX' 
)
SELECT
  price_date, 
  tx.stock_symbol AS token_symbol,
  IFNULL(
    ARRAY_AGG(
      IF(p.price_date >= tx.trans_date AND p.symbol = tx.stock_symbol, stock_amount, NULL) 
      IGNORE NULLS ORDER BY trans_date DESC LIMIT 1
      )[OFFSET(0)],
  -1234567890) stock_amount,
  SUM(
    IF(p.price_date >= tx.trans_date AND p.symbol = tx.stock_symbol, stock_amount, 0) 
  ) running_amount,
  price
FROM `trans` AS tx
CROSS JOIN `prices` AS p
WHERE stock_symbol = symbol
GROUP BY price_date, price, token_symbol
HAVING stock_amount != -1234567890
-- ORDER BY stock_symbol, price_date   

with the result:

Row price_date  token_symbol    stock_amount    running_amount  price    
1   2018-10-08  AX              11.0            11.0            8.0  
2   2018-10-09  AX              11.0            11.0            9.0  
3   2018-10-10  AX              10.0            21.0            10.0     
4   2018-10-11  AX              10.0            21.0            11.0     
5   2018-10-12  AX              10.0            21.0            11.0     
6   2018-10-02  TX              10.0            10.0            2.0  
7   2018-10-03  TX              5.0             15.0            3.0  
8   2018-10-04  TX              5.0             15.0            4.0  
9   2018-10-05  TX              5.0             15.0            5.0  
10  2018-10-06  TX              5.0             15.0            6.0  
11  2018-10-07  TX              5.0             15.0            7.0  
12  2018-10-08  TX              5.0             15.0            8.0  
13  2018-10-09  TX              5.0             15.0            9.0  
14  2018-10-10  TX              5.0             15.0            10.0     
15  2018-10-11  TX              5.0             15.0            11.0     
16  2018-10-12  TX              5.0             15.0            12.0
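
As a reading aid, the key change in the answer's query appears to be that the running amount is no longer precomputed per transaction with a window function. It is instead computed as a conditional SUM inside the same aggregation over the cross join, so running_amount does not have to appear in GROUP BY. The sketch below isolates that idea on made-up toy data (two TX transactions, one price row); it is an illustration under those assumptions, not a replacement for the full query.

```sql
#standardSQL
-- Toy data (hypothetical): two transactions for TX, one price row for TX.
WITH trans AS (
  SELECT DATE '2018-10-02' AS trans_date, 10.0 AS stock_amount, 'TX' AS stock_symbol UNION ALL
  SELECT DATE '2018-10-03', 5.0, 'TX'
), prices AS (
  SELECT DATE '2018-10-04' AS price_date, 4.0 AS price, 'TX' AS symbol
)
SELECT
  p.price_date,
  tx.stock_symbol AS token_symbol,
  -- Add up every transaction dated on or before the price date.
  SUM(IF(tx.trans_date <= p.price_date, tx.stock_amount, 0)) AS running_amount,
  p.price
FROM trans AS tx
CROSS JOIN prices AS p
WHERE tx.stock_symbol = p.symbol
-- Only the price row's keys are grouped, so each (date, symbol) yields one row.
GROUP BY p.price_date, token_symbol, p.price
```

With this toy data the query should return a single row for 2018-10-04 / TX with running_amount 15.0. Note that this stripped-down version would report a running_amount of 0 for price dates that precede every transaction; the answer's full query drops those dates through the sentinel value in its HAVING clause.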