Efficient join in Hive without an OR condition

Time: 2018-11-19 00:36:18

Tags: sql hadoop hive hiveql

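I need to join a geographic region table to a user table in Hive. A region can be defined at the country, state, or city level. When a region is defined at the country level, I need to select all users in that country. My version of Hive does not allow OR in a join condition.

What is the most efficient way to write this query?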

For example,

Regions table

region_id, city,      state, country
1,         Rome,      NULL,  IT
2,         NULL,      NULL,  BM
3,         VANCOUVER, BC,    CA

Users table

user_id, city,      state, country
103,     VANCOUVER, BC,    CA
105,     HAMILTON,  NULL,  BM
106,     NULL,      NULL,  BM

Result table

region_id, user_id, city,      state, country
3,         103,     VANCOUVER, BC,    CA
2,         105,     HAMILTON,  NULL,  BM
2,         106,     NULL,      NULL,  BM

1 Answer:

Answer 0 (score: 1)

Well, it may not be as efficient as you would like, but this should work:

SELECT DISTINCT
    -- take the first non-NULL match: city, then state, then country
    COALESCE(cty.region_id, sta.region_id, cou.region_id) AS region_id,
    u.*
FROM users u
LEFT JOIN regions cty ON u.city = cty.city
LEFT JOIN regions sta ON u.state = sta.state
LEFT JOIN regions cou ON u.country = cou.country
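
Each join above compares a single column, so the country join can also match a city-level row such as (Rome, NULL, IT) for any user in IT. Here is a minimal sketch of one way to tighten this. It assumes that a NULL city or state marks a coarser region, and that at most one region exists per city, per state, and per country. The filtered subqueries are my addition, not part of the original query:

```sql
SELECT
    -- most specific region wins: city, then state, then country
    COALESCE(cty.region_id, sta.region_id, cou.region_id) AS region_id,
    u.*
FROM users u
-- city-level regions (city is populated)
LEFT JOIN (
    SELECT region_id, city, country
    FROM regions
    WHERE city IS NOT NULL
) cty ON u.city = cty.city AND u.country = cty.country
-- state-level regions (state populated, no city)
LEFT JOIN (
    SELECT region_id, state, country
    FROM regions
    WHERE city IS NULL AND state IS NOT NULL
) sta ON u.state = sta.state AND u.country = sta.country
-- country-level regions (neither city nor state)
LEFT JOIN (
    SELECT region_id, country
    FROM regions
    WHERE city IS NULL AND state IS NULL
) cou ON u.country = cou.country
```

Filtering inside the subqueries keeps every ON clause a plain conjunction of equalities, so no OR is needed. Under the uniqueness assumption each user yields exactly one row, which makes DISTINCT unnecessary. Users that match no region are kept with a NULL region_id. To drop them, add a WHERE clause that requires the COALESCE expression to be non-NULL.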

Another option is:

SELECT
    r.region_id
  , u.*
FROM users u
INNER JOIN (
        -- users matched to a region by city
        SELECT
            regions.region_id, users.user_id
        FROM users
        INNER JOIN regions ON users.city = regions.city
        UNION
        -- users matched to a region by state
        SELECT
            regions.region_id, users.user_id
        FROM users
        INNER JOIN regions ON users.state = regions.state
        UNION
        -- users matched to a region by country
        SELECT
            regions.region_id, users.user_id
        FROM users
        INNER JOIN regions ON users.country = regions.country
    ) r ON u.user_id = r.user_id
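
A variant of this second approach, offered as a sketch rather than a tested solution: each branch is restricted to one region level, so a region row can appear in only one branch and UNION ALL returns no duplicate pairs. That avoids the de-duplication step of UNION. To my knowledge, plain UNION is also unavailable before Hive 1.2.0, which matters on an older installation. As above, the sketch assumes that a NULL city or state marks a coarser region. The alias `m` and the explicit column list are illustrative:

```sql
SELECT m.region_id, m.user_id, m.city, m.state, m.country
FROM (
    -- city-level regions: the equality on city already excludes NULL cities
    SELECT r.region_id, u.user_id, u.city, u.state, u.country
    FROM users u
    INNER JOIN regions r ON u.city = r.city AND u.country = r.country

    UNION ALL

    -- state-level regions: state set, city empty
    SELECT r.region_id, u.user_id, u.city, u.state, u.country
    FROM users u
    INNER JOIN regions r ON u.state = r.state AND u.country = r.country
    WHERE r.city IS NULL

    UNION ALL

    -- country-level regions: city and state both empty
    SELECT r.region_id, u.user_id, u.city, u.state, u.country
    FROM users u
    INNER JOIN regions r ON u.country = r.country
    WHERE r.city IS NULL AND r.state IS NULL
) m
```

Unlike a COALESCE over left joins, this returns one row per matching region. A user in a city that has its own region, inside a country that also has a country-level region, would appear twice. Which behaviour is wanted depends on whether regions may overlap.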