Weighted sum of a column vector and a derived bit vector

Time: 2019-06-24 18:18:56

Tags: google-bigquery standard-sql

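We have a table of bid prices and bid sizes for two buyers. A bid price p with size s means the buyer is willing to buy s units of the product at price p. The table has four columns: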

  • the bid prices of the two buyers, pA and pB;
  • the bid sizes, sA and sB.

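Our task is to add a new best-size column (bS) to the table that returns the size available at the best price. If both buyers bid the same price, bS equals sA + sB; otherwise, we take the bid size of the buyer offering the higher price.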

A sample table with the desired output was provided as an image.

A simple solution to this problem:

SELECT pA, pB, sA, sB,
  CASE
    WHEN pA = pB THEN sA + sB
    WHEN pA > pB THEN sA
    ELSE sB
  END AS bS
FROM t
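To check the CASE logic outside BigQuery, here is a small Python sketch of the same two-buyer rule. The function name best_size_two is hypothetical and not from the post.

```python
def best_size_two(pA: int, pB: int, sA: int, sB: int) -> int:
    """Mirror of the CASE expression: total size offered at the best bid price."""
    if pA == pB:
        return sA + sB  # tie: both buyers' sizes count
    if pA > pB:
        return sA       # buyer A bids higher
    return sB           # buyer B bids higher

print(best_size_two(3, 3, 2, 5))  # 7
```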

Now let's generalize the problem to four buyers. A standard SQL solution is:

WITH t_ext AS (
SELECT *, GREATEST(pA, pB, pC, pD) as bestPrice
FROM `t` 
)
SELECT *, (sA * CAST(pA = bestPrice AS INT64) + 
           sB * CAST(pB = bestPrice AS INT64) + 
           sC * CAST(pC = bestPrice AS INT64) +
           sD * CAST(pD = bestPrice AS INT64)) 
AS bS FROM t_ext
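The query above is a dot product of the size vector with a derived bit vector that holds 1 where a buyer's price equals the best price and 0 otherwise. A minimal Python sketch of that idea for any number of buyers, assuming prices and sizes are passed as equal-length lists (best_size is a hypothetical helper name):

```python
from typing import Sequence

def best_size(prices: Sequence[int], sizes: Sequence[int]) -> int:
    """Weighted sum of sizes, using a 0/1 mask that marks best-price buyers."""
    best_price = max(prices)                         # GREATEST(pA, pB, pC, pD)
    mask = [int(p == best_price) for p in prices]    # CAST(p = bestPrice AS INT64)
    return sum(s * m for s, m in zip(sizes, mask))   # sA*bitA + sB*bitB + ...

print(best_size([1, 4, 2, 4], [1, 6, 1, 5]))  # 11
```

With two buyers and a tie, both bits are 1, so the sum reduces to sA + sB, matching the CASE version.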

Question 1)

Is there a simpler query that

  • uses the SUM function instead of manually adding the four terms, and
  • avoids the repeated casting?

Question 2)

Is there a way within the Google BigQuery ecosystem to reuse this query on another table whose columns are named, for example, priceA and priceB instead of pA and pB?

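By the way, I wrote a blog post about this problem that focuses on solutions in Python and Q, and I would like to know what the best solution in standard SQL looks like.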

1 answer:

Answer 0 (score: 1)

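Below is a generic solution for BigQuery Standard SQL that depends neither on the number of buyers nor on the names of the price and size fields. The only expectation is that all prices come first, followed by all the corresponding sizes, as in your example. I also assume that all numbers are integers (as in the example above), but this can be adjusted to handle FLOAT.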

#standardSQL
WITH t_ext AS (
  SELECT * EXCEPT(arr), 
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < 4) AS prices,
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET >= 4) AS sizes,
    (SELECT MAX(CAST(val AS INT64)) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < 4) AS bestPrice
  FROM (
    SELECT *, REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d+)') AS arr
    FROM `project.dataset.table` t
  )
)
SELECT * EXCEPT(prices, sizes), 
  (SELECT SUM(size)
    FROM UNNEST(prices) price WITH OFFSET
    JOIN UNNEST(sizes) size WITH OFFSET
    USING(OFFSET) 
    WHERE price = bestPrice
  ) AS bS
FROM t_ext  
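The key step is that TO_JSON_STRING(t) serializes the whole row, so the regex collects the values in column order without ever looking at the column names, which is what makes the query reusable for Question 2. A rough Python emulation of that step, assuming json.dumps with compact separators approximates how BigQuery renders a flat row of integer columns (extract_numbers is a hypothetical name):

```python
import json
import re

def extract_numbers(row: dict) -> list:
    # Serialize the row compactly, e.g. {"priceA":4,"priceB":4,...}; this only
    # approximates TO_JSON_STRING for a flat row of small integer values.
    as_json = json.dumps(row, separators=(",", ":"))
    # Same pattern as the query: one or more digits directly after a colon.
    return [int(v) for v in re.findall(r":(\d+)", as_json)]

print(extract_numbers({"priceA": 4, "priceB": 4, "sizeA": 7, "sizeB": 1}))  # [4, 4, 7, 1]
print(extract_numbers({"p": 2.5}))  # [2]
```

The column names never reach the result, but column order does. The second call shows why the integer assumption matters: for a value like 2.5 the pattern captures only the integer part, and a negative value is skipped entirely, so FLOAT columns need a different pattern.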

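The only thing you need to change in the query above is the number of buyers, which appears in the following expressions (alternatively, 4 can be replaced with ARRAY_LENGTH(arr) / 2):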

WHERE OFFSET < 4
WHERE OFFSET >= 4
WHERE OFFSET < 4
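Replacing the literal 4 with ARRAY_LENGTH(arr) / 2 makes the split depend only on the array length. A Python sketch of that split plus the OFFSET join, assuming the flattened array already holds all prices followed by all sizes (best_size_from_array is a hypothetical helper):

```python
from typing import List

def best_size_from_array(arr: List[int]) -> int:
    # Number of buyers = ARRAY_LENGTH(arr) / 2 (all prices first, then all sizes)
    n = len(arr) // 2
    prices, sizes = arr[:n], arr[n:]   # WHERE OFFSET < n  /  WHERE OFFSET >= n
    best_price = max(prices)           # MAX(...) over the price slice
    # zip pairs each price with the size at the same position,
    # like JOIN UNNEST(sizes) ... USING(OFFSET)
    return sum(size for price, size in zip(prices, sizes) if price == best_price)

print(best_size_from_array([1, 4, 2, 4, 1, 6, 1, 5]))  # 11
```

In BigQuery, `/` on INT64 operands returns FLOAT64, which still compares correctly with OFFSET; DIV(ARRAY_LENGTH(arr), 2) is an integer-valued alternative.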

For example, with the following dummy data (4 buyers):

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 pA, 2 pB, 3 pC, 4 pD, 1 sA, 1 sB, 1 sC, 5 sD UNION ALL
  SELECT 1, 4, 2, 4, 1, 6, 1, 5 UNION ALL
  SELECT 4, 4, 2, 1, 7, 1, 1, 1
), t_ext AS (
  SELECT * EXCEPT(arr), 
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < 4) AS prices,
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET >= 4) AS sizes,
    (SELECT MAX(CAST(val AS INT64)) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < 4) AS bestPrice
  FROM (
    SELECT *, REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d+)') AS arr
    FROM `project.dataset.table` t
  )
)
SELECT * EXCEPT(prices, sizes), 
  (SELECT SUM(size)
    FROM UNNEST(prices) price WITH OFFSET
    JOIN UNNEST(sizes) size WITH OFFSET
    USING(OFFSET) 
    WHERE price = bestPrice
  ) AS bS
FROM t_ext

the result is:

Row  pA  pB  pC  pD  sA  sB  sC  sD  bestPrice  bS
1    1   2   3   4   1   1   1   5   4          5
2    1   4   2   4   1   6   1   5   4          11
3    4   4   2   1   7   1   1   1   4          8