Hive: parsing strings from a log

Time: 2018-02-14 06:58:55

Tags: regex hive extract

I am having trouble parsing strings from a log file. The records look like this:

"skey":"110","scp_id":"OC05","capedge":"3G"
"skey":"140","scp_id":"OC02","capedge":"3G"
"skey":"0","scp_id":"OC01","capedge":"3G"

This is the expected output for our table:

|   skey    |   scp_id  |   capedge |
|   110     |   OC05    |   3G      |
|   140     |   OC02    |   3G      |
|   0       |   OC01    |   3G      |

I tried the parse_url method from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF, but unfortunately our strings are not in URL format. Is there a better way to do this, or do I have to use regexp_extract?

Thank you, Galih

1 answer:

Answer 0: (score: 0)

You can use a combination of the SPLIT function and REGEXP_EXTRACT:

-- Outer query: pull the quoted value out of each "key":"value" fragment
select REGEXP_EXTRACT(skey,    ':"(\\w+)"', 1) as skey,
       REGEXP_EXTRACT(scp_id,  ':"(\\w+)"', 1) as scp_id,
       REGEXP_EXTRACT(capedge, ':"(\\w+)"', 1) as capedge
from (
       -- Inner query: split each record on commas into its three fragments
       select SPLIT(log_record, ',')[0] as skey,
              SPLIT(log_record, ',')[1] as scp_id,
              SPLIT(log_record, ',')[2] as capedge
       from yourtable
     ) a;
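
As a possible alternative, assuming every record carries all three keys in the quoted "key":"value" form shown in the question, REGEXP_EXTRACT can be applied to the whole line with one key-specific pattern per column. The result then no longer depends on the order of the fields. This is only a sketch: it reuses the yourtable and log_record names from the query above and has not been run against the asker's data.

```sql
-- Sketch: extract each value directly from the full record.
-- yourtable and log_record are the placeholder names used in the answer above.
select REGEXP_EXTRACT(log_record, '"skey":"([^"]*)"',    1) as skey,
       REGEXP_EXTRACT(log_record, '"scp_id":"([^"]*)"',  1) as scp_id,
       REGEXP_EXTRACT(log_record, '"capedge":"([^"]*)"', 1) as capedge
from yourtable;
```

The [^"]* group matches everything up to the closing quote, so it also accepts values containing characters outside \w, such as hyphens or spaces.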

HUE DEMO: user ID, password: demo, demo