Question

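I want to simplify Gmail addresses in Hive by removing everything that is unnecessary. I can already remove the "." characters with translate(), but Gmail also ignores anything between a "+" and the "@". The following regular expression works in Teradata: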

select REGEXP_REPLACE('test+friends@gmail.com', '\+.+\\@' ,'\\@');

This returns 'test@gmail.com', but in Hive I get:

FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments ''\@'': org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.UDFRegExpReplace.evaluate(org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text) on object org.apache.hadoop.hive.ql.udf.UDFRegExpReplace@131b58d4 of class org.apache.hadoop.hive.ql.udf.UDFRegExpReplace with arguments {test+friends@gmail.com:org.apache.hadoop.io.Text, +.+@:org.apache.hadoop.io.Text, @:org.apache.hadoop.io.Text} of size 3

How can I use this regular expression in Hive?

Answer 1

You don't need to escape @ in the regular expression. Try:

select REGEXP_REPLACE('test+friends@gmail.com', '\+[^@]+@' ,'@');

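You should also use [^@]+ instead of .+, so that the match stops at the first @. Otherwise, if the input contains more than one address, the greedy .+ runs to the last @ and a single match swallows everything in between.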
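The difference between .+ and [^@]+ can be checked outside Hive. The sketch below assumes that Hive's regexp_replace behaves like java.util.regex (the stack trace in the question points to a Java UDF); the class name, method names, and sample input are hypothetical. Note that "\\+" is how Java source spells the regex \+, and a Hive string literal may need its own doubling of the backslash.

```java
// Hypothetical demo class; names and sample data are illustrative only.
final class PlusTagMatchDemo {
    // Greedy variant: ".+" extends to the LAST '@' in the input.
    static String stripGreedy(String input) {
        return input.replaceAll("\\+.+@", "@");
    }

    // Bounded variant: "[^@]+" cannot cross an '@', so each address is handled on its own.
    static String stripBounded(String input) {
        return input.replaceAll("\\+[^@]+@", "@");
    }

    public static void main(String[] args) {
        String twoAddresses = "a+x@gmail.com, b+y@gmail.com";
        System.out.println(stripGreedy(twoAddresses));   // second address is lost
        System.out.println(stripBounded(twoAddresses));  // both addresses survive
    }
}
```

For a column that holds exactly one address per row, both patterns should give the same result; the bounded form only matters when a value can contain more than one @.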

Answer 2

I found the answer:

select REGEXP_REPLACE('test+friends@gmail.com', '[+].+@', '@');

or

select REGEXP_REPLACE('test+friends@gmail.com', '\\+.+@', '@');

did the trick. Teradata and Hive seem to differ considerably in how they handle regular expressions.
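One plausible reading of the error, offered as an assumption rather than a confirmed diagnosis: the argument list in the exception shows the pattern as +.+@, which suggests that Hive dropped the backslash while parsing the string literal, leaving a regex that starts with a bare quantifier. The sketch below uses plain java.util.regex (the class and method names are hypothetical) to show that such a pattern is rejected, while the character-class form and the backslash-escaped form both compile.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; it only probes java.util.regex, not Hive itself.
final class HiveBackslashDemo {
    // Returns true if the given text is a valid java.util.regex pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // What the UDF appears to have received, per the error's argument list.
        System.out.println(compiles("+.+@"));
        // Character class: needs no backslash at all.
        System.out.println(compiles("[+].+@"));
        // Regex \+.+@, written here with Java's own string escaping.
        System.out.println(compiles("\\+.+@"));
    }
}
```

The [+] form avoids the question of how many backslashes survive each layer of string parsing, which may make it the more portable choice between engines.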

How to simplify Gmail addresses with a regular expression in Hive
