regexp_replace _ with - in Hive

Date: 2017-03-07 10:09:30

Tags: sql regex hive hiveql

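I have string values where the country part is sometimes a single two-letter abbreviation and sometimes two abbreviations joined by an underscore, like this: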

Cusco_DE_campaign_Million
Manzan_ES_CA_order_stra
Tijuan_FR_sitc_Mill

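I want to replace the underscore with a hyphen only when the country/region part consists of two pairs of uppercase letters (so CA_FR, ES_CA, etc.).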

So the output should look like this:

Cusco_DE_campaign_Million
Manzan_ES-CA_order_stra
Tijuan_FR_sitc_Mill

How can I write this in Hive SQL using regexp_replace?

Thanks!

1 answer:

Answer 0 (score: 1)

Replace an _ only when it is preceded by two uppercase letters that themselves follow an _ or the start of the string, and followed by two uppercase letters that are themselves followed by an _ or the end of the string:
with t as
(
    select  explode
            (
                array
                (
                    'Cusco_DE_campaign_Million'
                   ,'Manzan_ES_CA_order_stra'
                   ,'Tijuan_FR_sitc_Mill'
                )
            ) as (val)
)
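-- (?<=(^|_)[A-Z]{2}) : lookbehind, the _ must follow (start or _) + 2 uppercase letters
-- (?=[A-Z]{2}(_|$))   : lookahead, the _ must precede 2 uppercase letters + (_ or end)
-- Only the _ itself is matched, so only it is replaced with -
select  regexp_replace (val,'(?<=(^|_)[A-Z]{2})_(?=[A-Z]{2}(_|$))','-')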
from    t
;
+---------------------------+
| Cusco_DE_campaign_Million |
+---------------------------+
| Manzan_ES-CA_order_stra   |
+---------------------------+
| Tijuan_FR_sitc_Mill       |
+---------------------------+
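Hive's regexp_replace interprets its pattern as a Java regular expression, so the same pattern can be sanity-checked outside Hive. Below is a minimal Java sketch, assuming Java 9 or later (for List.of). The class name CountryCodeHyphen and the method hyphenateCountryPair are hypothetical names chosen for illustration. The sketch applies the answer's pattern with java.util.regex to the three sample values.

```java
import java.util.List;
import java.util.regex.Pattern;

public class CountryCodeHyphen {
    // Same pattern as the Hive query: an underscore preceded by
    // (start of string or "_") + two uppercase letters, and followed by
    // two uppercase letters + ("_" or end of string).
    static final Pattern PAIR_SEPARATOR =
        Pattern.compile("(?<=(^|_)[A-Z]{2})_(?=[A-Z]{2}(_|$))");

    // Hypothetical helper: hyphenates the separator inside a country-code pair.
    static String hyphenateCountryPair(String val) {
        // replaceAll evaluates the lookarounds against the original input,
        // so one replacement cannot affect whether another one matches.
        return PAIR_SEPARATOR.matcher(val).replaceAll("-");
    }

    public static void main(String[] args) {
        List<String> vals = List.of(
            "Cusco_DE_campaign_Million",
            "Manzan_ES_CA_order_stra",
            "Tijuan_FR_sitc_Mill");
        for (String v : vals) {
            System.out.println(hyphenateCountryPair(v));
        }
    }
}
```

Both lookarounds are zero-width, so only the underscore is consumed and replaced, and the two-letter codes on either side are left untouched. The lookbehind is at most three characters long. This matters because Java only accepts lookbehinds with a bounded maximum length.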