Avoiding repeated computation - BigQuery

Time: 2018-02-15 21:12:20

Tags: google-bigquery

I have two tables. For each region in A, I want to find the closest region in B.

Table A:

------------------------
ID | Start | End | Color 
------------------------
 1 |  400  | 500 | White
------------------------
 1 |  10   | 20  | Red 
------------------------
 2 |   2   |  10 | Blue 
------------------------
 4 |   88  |  90 | Color 
------------------------

Table B:

-------------------------------
ID | Start | End | Name | Name2 
-------------------------------
 1 |  1    | 2   | XYZ1 | EWQ
-------------------------------
 1 |  50   | 60  | XYZ4 | EWY
-------------------------------
 2 |  150  | 160 | ABC1 | TRE
-------------------------------
 2 |  50   | 60  | ABC2 | YUE
-------------------------------
 4 |  100  | 120 | EFG  | MMN
-------------------------------

Here is the expected result table:

-------------------------------------------------------
ID | Start | End | Color | Closest Name | Closest Name2
-------------------------------------------------------
 1 |  400  | 500 | White |   XYZ4       |   EWY
-------------------------------------------------------
 1 |  10   | 20  | Red   |   XYZ1       |  EWQ
-------------------------------------------------------
 2 |   2   |  10 | Blue  |   ABC2       |  YUE
-------------------------------------------------------
 4 |   88  |  90 | Color |   EFG        |  MMN
-------------------------------------------------------

Here is my current solution:

#standardSQL
SELECT
  A.ID,
  A.Start,
  A.END,
  ARRAY_AGG(B.name
  ORDER BY
   (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END)) THEN ABS(A.Start-B.END) ELSE ABS(A.END-B.Start) END)  
  LIMIT
    1)[SAFE_OFFSET(0)] name,
  ARRAY_AGG(B.name2
  ORDER BY
   (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END)) THEN ABS(A.Start-B.END) ELSE ABS(A.END-B.Start) END)  
  LIMIT
    1)[SAFE_OFFSET(0)] name2

FROM
     A
JOIN
     B
ON
  A.ID = B.ID

  WHERE  (A.start>B.End) OR (B.Start> A.END)
GROUP BY
  A.ID,
  A.start,
  A.END

In this case there are only two fields (name and name2). If B had N fields, is there a way to avoid repeating the ordering computation for each one?

Thanks!

3 answers:

Answer 0 (score: 2)

The following should give you an idea:

#standardSQL
WITH A AS (
  SELECT 1 a_id, 400 a_start, 500 a_end, 'White' color UNION ALL
  SELECT 1, 10,  20  , 'Red' UNION ALL
  SELECT 2, 2,   10, 'Blue' UNION ALL
  SELECT 4, 88,  90, 'Color'
), B AS (
  SELECT 1 b_id, 1 b_start, 2 b_end, 'XYZ1' name, 'EWQ' name2 UNION ALL
  SELECT 1, 50, 60,  'XYZ4', 'EWY' UNION ALL
  SELECT 2, 150, 160,'ABC1', 'TRE' UNION ALL
  SELECT 2, 50, 60,  'ABC2', 'YUE' UNION ALL
  SELECT 4, 100, 120,'EFG', 'MMN'
)
SELECT 
  a_id, a_start, a_end, color, names.name, names.name2
FROM (
  SELECT a_id, a_start, a_end, color,  
    ARRAY_AGG(STRUCT(name, name2) ORDER BY POW(ABS(a_start - b_start), 2) + POW(ABS(a_end - b_end), 2) LIMIT 1)[SAFE_OFFSET(0)] names
  FROM A JOIN B ON a_id = b_id
  GROUP BY a_id, a_start, a_end, color
)
ORDER BY a_id  

The result is:

Row a_id    a_start a_end   color   name    name2    
1   1       400     500     White   XYZ4    EWY  
2   1       10      20      Red     XYZ1    EWQ  
3   2       2       10      Blue    ABC2    YUE  
4   4       88      90      Color   EFG     MMN  
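Note that the query above orders by the sum of squared differences between the starts and the ends, which is not the same distance as in the question (the smaller of the two gaps between the intervals, restricted to non-overlapping pairs). On the sample data both orderings pick the same rows. As a rough sketch only, the same STRUCT approach could be combined with the question's distance as shown below. The column names (a_id, b_start, and so on) are borrowed from the sample CTEs above, the alias closest is made up for illustration, and LEAST is swapped in as a shorter stand-in for the question's CASE expression (equivalent here as long as the values are not NULL).

```sql
#standardSQL
WITH A AS (
  SELECT 1 a_id, 400 a_start, 500 a_end, 'White' color UNION ALL
  SELECT 1, 10,  20, 'Red' UNION ALL
  SELECT 2, 2,   10, 'Blue' UNION ALL
  SELECT 4, 88,  90, 'Color'
), B AS (
  SELECT 1 b_id, 1 b_start, 2 b_end, 'XYZ1' name, 'EWQ' name2 UNION ALL
  SELECT 1, 50,  60,  'XYZ4', 'EWY' UNION ALL
  SELECT 2, 150, 160, 'ABC1', 'TRE' UNION ALL
  SELECT 2, 50,  60,  'ABC2', 'YUE' UNION ALL
  SELECT 4, 100, 120, 'EFG',  'MMN'
)
SELECT
  a_id, a_start, a_end, color,
  closest.name,   -- fields of the single STRUCT picked per A row
  closest.name2
FROM (
  SELECT
    a_id, a_start, a_end, color,
    -- The distance is written once; every wanted B field rides along in the STRUCT.
    ARRAY_AGG(
      STRUCT(name, name2)
      ORDER BY LEAST(ABS(a_end - b_start), ABS(a_start - b_end))
      LIMIT 1
    )[SAFE_OFFSET(0)] AS closest
  FROM A
  JOIN B ON a_id = b_id
  -- Same filter as in the question: only consider non-overlapping regions.
  WHERE a_start > b_end OR b_start > a_end
  GROUP BY a_id, a_start, a_end, color
)
ORDER BY a_id, a_start
```

Adding more fields from B then only means extending the STRUCT and the outer select list; the ORDER BY expression is not repeated.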

Answer 1 (score: 1)

You should be able to use ARRAY_AGG with a STRUCT instead. Here are some example expressions:

ARRAY_AGG(
  B
  ORDER BY
   (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END))
      THEN ABS(A.Start-B.END)
      ELSE ABS(A.END-B.Start)
    END)  
  LIMIT
    1)[SAFE_OFFSET(0)].*

This returns all fields from the first row of B according to the ordering.

ARRAY_AGG(
  (SELECT AS STRUCT B.* EXCEPT(foo, bar))
  ORDER BY
   (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END))
      THEN ABS(A.Start-B.END)
      ELSE ABS(A.END-B.Start)
    END)  
  LIMIT
    1)[SAFE_OFFSET(0)].*

This returns all fields from B except foo and bar (replace these names with whatever you want to exclude).

ARRAY_AGG(
  STRUCT(B.name, B.name2, B.foo, B.bar)
  ORDER BY
   (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END))
      THEN ABS(A.Start-B.END)
      ELSE ABS(A.END-B.Start)
    END)  
  LIMIT
    1)[SAFE_OFFSET(0)].*

This returns only the named fields from B. You can list whichever ones you want.
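To show where such an expression might sit in a complete query, here is a minimal sketch of the first variant (aggregating the whole B row). It is an illustration under assumptions, not a tested solution: the sample data and the column names (a_id, b_start, and so on) come from the CTEs in answer 0, and the alias closest is invented here. The struct is built in a subquery and expanded with .* one level up.

```sql
#standardSQL
WITH A AS (
  SELECT 1 a_id, 400 a_start, 500 a_end, 'White' color UNION ALL
  SELECT 1, 10,  20, 'Red' UNION ALL
  SELECT 2, 2,   10, 'Blue' UNION ALL
  SELECT 4, 88,  90, 'Color'
), B AS (
  SELECT 1 b_id, 1 b_start, 2 b_end, 'XYZ1' name, 'EWQ' name2 UNION ALL
  SELECT 1, 50,  60,  'XYZ4', 'EWY' UNION ALL
  SELECT 2, 150, 160, 'ABC1', 'TRE' UNION ALL
  SELECT 2, 50,  60,  'ABC2', 'YUE' UNION ALL
  SELECT 4, 100, 120, 'EFG',  'MMN'
)
SELECT
  a_id, a_start, a_end, color,
  closest.*  -- expands to every column of the chosen B row
FROM (
  SELECT
    a_id, a_start, a_end, color,
    -- Referring to the table alias B yields the whole row as a STRUCT.
    ARRAY_AGG(
      B
      ORDER BY
        (CASE WHEN ABS(a_end - b_start) >= ABS(a_start - b_end)
           THEN ABS(a_start - b_end)
           ELSE ABS(a_end - b_start)
         END)
      LIMIT 1
    )[SAFE_OFFSET(0)] AS closest
  FROM A
  JOIN B ON a_id = b_id
  WHERE a_start > b_end OR b_start > a_end
  GROUP BY a_id, a_start, a_end, color
)
ORDER BY a_id, a_start
```

One caveat: if A and B share column names, as ID, Start and End do in the question's tables, expanding the whole row is likely to produce duplicate output column names, which BigQuery may reject in a top-level result. In that case something like closest.* EXCEPT (...) or the SELECT AS STRUCT B.* EXCEPT(...) variant above is one way to drop the overlapping columns.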

Answer 2 (score: 1)

Make a struct:

ARRAY_AGG((B.name, B.name2)
  ORDER BY
   (CASE WHEN (ABS(A.END-B.Start) >= ABS(A.Start - B.END)) THEN ABS(A.Start-B.END) ELSE ABS(A.END-B.Start) END)  
  LIMIT
    1)[SAFE_OFFSET(0)] names,