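I have string values in which the country abbreviation part contains sometimes one code and sometimes two codes separated by an underscore, like this: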
Cusco_DE_campaign_Million
Manzan_ES_CA_order_stra
Tijuan_FR_sitc_Mill
etc.
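I want to replace the underscore with a hyphen only when it sits between two country abbreviations, each made of two uppercase letters (so CA_FR, ES_CA, etc.).
So the output should look like this: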
Cusco_DE_campaign_Million
Manzan_ES-CA_order_stra
Tijuan_FR_sitc_Mill
How can I write this in Hive SQL using regex_replace?
Thanks!
Answer 0 (score: 1)
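Replace every _ that is preceded by exactly two uppercase letters (which in turn follow a _ or the start of the string)
and followed by exactly two uppercase letters (which in turn precede a _ or the end of the string).
The lookbehind and lookahead only check the context without consuming it, so the letters themselves are left untouched.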
with t as
(
select explode
(
array
(
'Cusco_DE_campaign_Million'
,'Manzan_ES_CA_order_stra'
,'Tijuan_FR_sitc_Mill'
)
) as (val)
)
select regexp_replace (val,'(?<=(^|_)[A-Z]{2})_(?=[A-Z]{2}(_|$))','-')
from t
;
+---------------------------+
| Cusco_DE_campaign_Million |
+---------------------------+
| Manzan_ES-CA_order_stra |
+---------------------------+
| Tijuan_FR_sitc_Mill |
+---------------------------+
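
Hive's regexp_replace follows Java regular-expression syntax, so the pattern can also be checked outside Hive. The following is only a minimal sketch of that idea: the class name CountryCodeJoiner, the method joinCountryCodes, and the extra sample 'Foo_ABC_DE_x' are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper for trying the pattern outside Hive.
class CountryCodeJoiner {

    // Same pattern as in the query above:
    //   (?<=(^|_)[A-Z]{2})  lookbehind: two uppercase letters after '_' or start of string
    //   _                   the underscore that gets replaced
    //   (?=[A-Z]{2}(_|$))   lookahead: two uppercase letters before '_' or end of string
    private static final Pattern PAIR_SEPARATOR =
            Pattern.compile("(?<=(^|_)[A-Z]{2})_(?=[A-Z]{2}(_|$))");

    static String joinCountryCodes(String value) {
        return PAIR_SEPARATOR.matcher(value).replaceAll("-");
    }

    public static void main(String[] args) {
        String[] samples = {
            "Cusco_DE_campaign_Million",
            "Manzan_ES_CA_order_stra",
            "Tijuan_FR_sitc_Mill",
            "Foo_ABC_DE_x" // illustrative: ABC has three letters, so nothing is replaced
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + joinCountryCodes(sample));
        }
    }
}
```

The (^|_) and (_|$) guards are what keep a longer uppercase run such as ABC from being treated as a country code; without them, the underscore in 'ABC_DE' would match as well.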