How do I use REGEXP_EXTRACT in Hive to extract the numeric prefix of a string?

Time: 2017-01-31 21:34:28

Tags: sql regex hive

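I'm not sure how to write a regex in Hive that extracts the numeric prefix from this string: 211118-1_20569 - (DHCP). I need it to return 211118, but the pattern should also be flexible enough to return shorter or longer numbers, depending on the length of the numeric prefix.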

1 answer:

Answer 0: (score: 1)

hive> select regexp_extract('211118-1_20569 - (DHCP)','^\\d+',0);
OK
211118

hive> select regexp_extract('211118-1_20569 - (DHCP)','^[0-9]+',0);
OK
211118
^     - The beginning of a line
\d    - A digit: [0-9]
[0-9] - the characters between '0' and '9'
X+    - X, one or more times

https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
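
Because the pattern syntax is the Java one described at the link above, the same expression can be tried outside Hive in plain Java. The following is only a minimal sketch: the class name NumericPrefix, the method extract, and every input string other than the one from the question are made up for illustration, and returning an empty string when nothing matches is a choice made for this sketch rather than a statement about Hive.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericPrefix {
    // Same pattern as in the Hive queries. The backslash is doubled
    // because this is a Java string literal.
    private static final Pattern LEADING_DIGITS = Pattern.compile("^\\d+");

    // Returns the run of digits at the start of the input, or ""
    // when the input does not start with a digit (sketch-only choice).
    static String extract(String input) {
        Matcher m = LEADING_DIGITS.matcher(input);
        return m.find() ? m.group(0) : "";
    }

    public static void main(String[] args) {
        // String from the question.
        System.out.println(extract("211118-1_20569 - (DHCP)"));
        // Hypothetical inputs with a shorter and a longer prefix.
        System.out.println(extract("42-7_1 - (DHCP)"));
        System.out.println(extract("1234567890-1_5 - (static)"));
        // Hypothetical input with no numeric prefix.
        System.out.println(extract("DHCP-1").isEmpty());
    }
}
```

Since + is greedy, the match takes as many leading digits as are present, which is what makes the pattern independent of the prefix length.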

regexp_extract(string subject, string pattern, int index)
  • Predefined character classes (such as \d) must be preceded by an extra backslash (\\d)
  • index = 0 returns the match for the entire pattern

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringOperators
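
To illustrate the index argument, here is a small Java sketch of how a group index selects part of a match. It assumes that index corresponds to the capturing group number of java.util.regex.Matcher, with 0 meaning the whole match. The class GroupIndexDemo and its extract method are hypothetical stand-ins written for this example, not Hive's implementation, and the three-group pattern is not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {
    // Hypothetical stand-in with the same argument order as
    // regexp_extract(subject, pattern, index); not Hive's code.
    static String extract(String subject, String pattern, int index) {
        Matcher m = Pattern.compile(pattern).matcher(subject);
        // "" on no match is a choice made for this sketch.
        return m.find() ? m.group(index) : "";
    }

    public static void main(String[] args) {
        String subject = "211118-1_20569 - (DHCP)";
        // Three capturing groups: prefix, middle number, trailing number.
        String pattern = "^(\\d+)-(\\d+)_(\\d+)";

        System.out.println(extract(subject, pattern, 0)); // whole match
        System.out.println(extract(subject, pattern, 1)); // first group
        System.out.println(extract(subject, pattern, 3)); // third group
    }
}
```

With a pattern that has no parentheses, such as ^\d+, only index 0 is meaningful, which is why both queries in the answer pass 0.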