Joining two tables with the same schema in BigQuery

Time: 2019-03-07 11:40:50

Tags: google-bigquery

I have two input tables with the same schema:

+---------+--------+----------------------+
|  value  |  city  |   timestamp          |
+---------+--------+----------------------+
| 50      |  LA    |  2019-02-6 03:05pm   |
| 163     |  NYC   |  2019-02-5 03:06pm   |
| 681     |  SF    |  2019-02-4 06:41pm   |
| 35      |  LA    |  2019-02-3 05:12pm   |
+---------+--------+----------------------+

The first table contains regular payments, and the second contains fees. I want to join the tables and group them as follows:

+------------+--------+----------+--------------+
|  regular   |  fees  |   city   |  timestamp   |
+------------+--------+----------+--------------+
| 50         | 20     | LA       |  2019-02-6   |
| 163        | NULL   | NYC      |  2019-02-5   |
| 681        | ..     | SF       |  2019-02-4   |
| 35         | ..     | LA       |  2019-02-3   |
+------------+--------+----------+--------------+

There may be days with no fees. Here is what I have tried:

SELECT t1.city, regular, fees, t1.day
FROM
(
  SELECT city, SUM(value) AS regular, FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP(timestamp)) as day
  FROM `payments`
  GROUP BY day, city
) t1
FULL JOIN (
  SELECT city, SUM(value) AS fees, FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP(timestamp)) as day
  FROM `fees`
  GROUP BY day, city
) t2
ON t1.day = t2.day
ORDER BY t1.day DESC

This produces the correct output schema, but the fees are not calculated correctly:

+------------+--------+----------+--------------+
|  regular   |  fees  |   city   |  timestamp   |
+------------+--------+----------+--------------+
| 26500      | 6300   | LA       |  2019-02-6   |
| 26500      | 8500   | LA       |  2019-02-6   |
| 26500      | 1000   | LA       |  2019-02-6   |
+------------+--------+----------+--------------+

As you can see, I get several different fee values for the same day and city. Any idea what I am doing wrong here?

1 Answer:

Answer 0: (score: 2)

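The problem is only in your ON clause: you are joining on day alone, so each (day, city) row in one table matches every city's row for that day in the other. You should join on both day and city, as in the snippet below: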

ON t1.day = t2.day
AND t1.city = t2.city
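
For completeness, here is a sketch of the whole query with that ON clause applied. The two subqueries are copied unchanged from the question. The COALESCE calls are an addition that is not part of the answer above: with a FULL JOIN, a day and city that appear only in `fees` would otherwise come back with NULL in t1.city and t1.day, so the key is taken from whichever side has the row.

```sql
SELECT
  COALESCE(t1.city, t2.city) AS city,  -- key from whichever side has a row
  regular,                             -- NULL when only fees exist for that day and city
  fees,                                -- NULL when no fees exist for that day and city
  COALESCE(t1.day, t2.day) AS day
FROM
(
  SELECT city, SUM(value) AS regular, FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP(timestamp)) AS day
  FROM `payments`
  GROUP BY day, city
) t1
FULL JOIN (
  SELECT city, SUM(value) AS fees, FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP(timestamp)) AS day
  FROM `fees`
  GROUP BY day, city
) t2
ON t1.day = t2.day
AND t1.city = t2.city  -- match on city as well as day
ORDER BY COALESCE(t1.day, t2.day) DESC
```

If every day and city with fees is guaranteed to also have regular payments, a LEFT JOIN from t1 would give the same result and the COALESCE calls would be unnecessary.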