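I need to join a geo region table to a user table in Hive.
A geo region can be at the country, state, or city level.
When the geo region is at the country level, I need to select all listings in that country. My Hive version does not allow OR in the join condition.
What is the most efficient way to write this query?
For example: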
Region table
region_id, city, state, country
1, Rome, NULL, IT
2, NULL, NULL, BM
3, VANCOUVER, BC, CA
User table
user_id, city, state, country
103, VANCOUVER, BC, CA
105, HAMILTON, NULL, BM
106, NULL, NULL, BM
Result table
region_id, user_id, city, state, country
3, 103, VANCOUVER, BC, CA
2, 105, HAMILTON, NULL, BM
2, 106, NULL, NULL, BM
Answer 0 (score: 1)
Well, it may not be as efficient as you'd like, but this should work:
SELECT DISTINCT
    -- Prefer the city match, then the state match, then the country match
    coalesce(cty.region_id, sta.region_id, cou.region_id) AS region_id,
    u.*
FROM users u
LEFT JOIN regions cty ON u.city = cty.city
LEFT JOIN regions sta ON u.state = sta.state
LEFT JOIN regions cou ON u.country = cou.country
Another option is:
SELECT
    r.region_id,
    u.*
FROM users u
INNER JOIN (
    -- Users matched to a region by city
    SELECT regions.region_id, users.user_id
    FROM users
    INNER JOIN regions ON users.city = regions.city
    UNION
    -- Users matched to a region by state
    SELECT regions.region_id, users.user_id
    FROM users
    INNER JOIN regions ON users.state = regions.state
    UNION
    -- Users matched to a region by country
    SELECT regions.region_id, users.user_id
    FROM users
    INNER JOIN regions ON users.country = regions.country
) r ON u.user_id = r.user_id
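One caveat with joining on a single column at a time: a country-only join can pair a user with a city-level region that merely shares the country, and a NULL city or state never compares equal to anything. A possible variant, offered only as a sketch, splits the regions by level first and then uses plain equality joins, so no OR is needed. It assumes the `regions` and `users` tables from the question, that a NULL `city` marks a state- or country-level region, and that a NULL `city` and `state` together mark a country-level region. The aliases `r`, `u`, and `m` are illustrative.

```sql
SELECT m.region_id, m.user_id, m.city, m.state, m.country
FROM (
    -- City-level regions: match on city and country.
    -- State is left out because it may be NULL (as for Rome).
    SELECT r.region_id, u.user_id, u.city, u.state, u.country
    FROM users u
    JOIN (SELECT region_id, city, country
          FROM regions
          WHERE city IS NOT NULL) r
      ON u.city = r.city AND u.country = r.country

    UNION ALL

    -- State-level regions: no city, but a state; match on state and country.
    SELECT r.region_id, u.user_id, u.city, u.state, u.country
    FROM users u
    JOIN (SELECT region_id, state, country
          FROM regions
          WHERE city IS NULL AND state IS NOT NULL) r
      ON u.state = r.state AND u.country = r.country

    UNION ALL

    -- Country-level regions: neither city nor state; match on country only.
    SELECT r.region_id, u.user_id, u.city, u.state, u.country
    FROM users u
    JOIN (SELECT region_id, country
          FROM regions
          WHERE city IS NULL AND state IS NULL) r
      ON u.country = r.country
) m
```

On the sample data this pairs user 103 with region 3 and users 105 and 106 with region 2. If regions overlap (for example, a country-level region for CA alongside the VANCOUVER region), a user will appear once per matching region, which may or may not be what you want.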