I have a problem parsing strings from a log file. The records look like this:
"skey":"110","scp_id":"OC05","capedge":"3G"
"skey":"140","scp_id":"OC02","capedge":"3G"
"skey":"0","scp_id":"OC01","capedge":"3G"
This is the expected output for our table:
| skey | scp_id | capedge |
| --- | --- | --- |
| 110 | OC05 | 3G |
| 140 | OC02 | 3G |
| 0 | OC01 | 3G |
I tried the parse_url method from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF, but unfortunately our strings are not in URL format. Is there a better way, or do I have to use regexp_extract?
Thank you, Galih

Answer 0 (score: 0)
You can use the SPLIT and REGEXP_EXTRACT functions:
-- Outer query: pull the quoted value out of each "key":"value" fragment.
SELECT REGEXP_EXTRACT(skey, ':"(\\w+)"', 1) AS skey,
       REGEXP_EXTRACT(scp_id, ':"(\\w+)"', 1) AS scp_id,
       REGEXP_EXTRACT(capedge, ':"(\\w+)"', 1) AS capedge
FROM (
  -- Inner query: split each record on commas into its three fragments.
  SELECT SPLIT(log_record, ',')[0] AS skey,
         SPLIT(log_record, ',')[1] AS scp_id,
         SPLIT(log_record, ',')[2] AS capedge
  FROM yourtable
) a;
HUE DEMO: user ID, password: demo, demo
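
If the order of the fields within a record is not guaranteed, one possible variation is to skip SPLIT and match each key by name directly on the full record. This is only a sketch, assuming the same log_record column and yourtable table used in the query above, and assuming that values never contain a double quote:

```sql
-- Sketch: extract each value by its key name instead of by position.
-- Assumes log_record holds lines like "skey":"110","scp_id":"OC05","capedge":"3G"
-- and that no value contains a double-quote character.
SELECT REGEXP_EXTRACT(log_record, '"skey":"([^"]*)"', 1)    AS skey,
       REGEXP_EXTRACT(log_record, '"scp_id":"([^"]*)"', 1)  AS scp_id,
       REGEXP_EXTRACT(log_record, '"capedge":"([^"]*)"', 1) AS capedge
FROM yourtable;
```

Because the pattern uses [^"]* rather than \w+, it also accepts values containing characters such as hyphens or spaces, and it does not depend on the comma positions in the record.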