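Extracting emails from a field using Oracle Regexp

Time: 2014-01-22 14:54:26

Tags: sql regex oracle

I want to extract only the email-formatted text from a field. I tried the SQL below, but with no luck (see the SqlFiddle). Removing ^ and $ from the regexp did not work either.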

WITH TEST_DATA AS (
  SELECT 'foo@gmail.com' AS EMAIL FROM DUAL UNION ALL
  SELECT 'mail foo@gmail.com' FROM DUAL UNION ALL
  SELECT 'mail foo@gmail.com sent' FROM DUAL UNION ALL
  SELECT 'foo@gmail.com sent count 23' FROM DUAL UNION ALL
  SELECT 'mail already sent to foo@gmail.com and foo@hotmail.com' FROM DUAL UNION ALL
  SELECT 'foo@gmail.com sent count 23' FROM DUAL
)SELECT REGEXP_SUBSTR(EMAIL,'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$') MAIL
 FROM TEST_DATA;

Expected output for this data set:

foo@gmail.com 
foo@gmail.com 
foo@gmail.com 
foo@gmail.com 
foo@gmail.com, foo@hotmail.com 
foo@gmail.com

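Any help is appreciated.

2 answers:

Answer 0: (score: 5)

If you want to extract multiple mail IDs into a single column, you can use the REGEXP_REPLACE function.

Assuming all the IDs in the data are valid,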

REGEXP_REPLACE (EMAIL, '(\w+@\w+\.\w+ ?)|(.)', '\1')

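This removes all text other than the mail IDs, which are assumed to be separated by at least one space.

You can then trim any trailing space and replace the remaining spaces with commas to separate multiple IDs.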

REPLACE (TRIM (REGEXP_REPLACE (EMAIL, '(\w+@\w+\.\w+ ?)|(.)', '\1')),
            ' ',
            ', ')

Example:

WITH TEST_DATA
     AS (SELECT 'foo@gmail.com' AS EMAIL FROM DUAL
         UNION ALL
         SELECT 'mail foo@gmail.com' FROM DUAL
         UNION ALL
         SELECT 'mail foo@gmail.com sent to 123@zxc.com and qwe@rt.com' FROM DUAL
         UNION ALL
         SELECT 'foo@gmail.com sent count 23 and asd@qwert.edu' FROM DUAL
         UNION ALL
         SELECT 'mail already sent to foo@gmail.com and foo@hotmail.com' FROM DUAL
         UNION ALL
         SELECT 'foo@gmail.com sent count 23' FROM DUAL)
SELECT REPLACE (TRIM (REGEXP_REPLACE (EMAIL, '(\w+@\w+\.\w+ ?)|(.)', '\1')),
                ' ',
                ', ')
          MAIL
  FROM TEST_DATA;

MAIL
-----------------------------
foo@gmail.com
foo@gmail.com
foo@gmail.com, 123@zxc.com, qwe@rt.com
foo@gmail.com, asd@qwert.edu
foo@gmail.com, foo@hotmail.com
foo@gmail.com
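
One caveat about the pattern: `\w+@\w+\.\w+` does not cover dots or hyphens in the local part, or domains with more than two labels, so an address such as first.last@mail.example.com would be truncated. A minimal sketch, assuming the character classes from the question's own pattern are acceptable, plugs them into the same REGEXP_REPLACE approach. The sample rows are made up for illustration.

```sql
-- Sketch only: same REGEXP_REPLACE idea as above, with the question's
-- character classes substituted for \w. The sample rows are hypothetical.
WITH TEST_DATA AS (
  SELECT 'mail first.last@mail.example.com sent' AS EMAIL FROM DUAL UNION ALL
  SELECT 'mail already sent to foo@gmail.com and foo@hotmail.com' FROM DUAL
)
SELECT REPLACE (TRIM (REGEXP_REPLACE (EMAIL,
                  -- group 1: an address plus an optional trailing space (kept);
                  -- group 2: any other single character (dropped)
                  '([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4} ?)|(.)',
                  '\1')),
                ' ',
                ', ') MAIL
  FROM TEST_DATA;
```

The `{2,4}` limit on the top-level domain is inherited from the question and will cut off longer TLDs, so widen it if your data needs that.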

Answer 1: (score: 2)

You were close! Try this:

SELECT REGEXP_SUBSTR(EMAIL,'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}') MAIL

Edit:

Maybe this will help:

WITH TEST_DATA AS (
  SELECT 'foo@gmail.com' AS EMAIL FROM DUAL UNION ALL 
  SELECT 'mail foo@gmail.com' FROM DUAL UNION ALL           
  SELECT 'mail foo@gmail.com sent' FROM DUAL UNION ALL                
  SELECT 'foo@gmail.com sent count 23' FROM DUAL UNION ALL          
  SELECT 'mail already sent to foo@gmail.com and foo@hotmail.com' FROM DUAL UNION ALL 
  SELECT 'foo@gmail.com sent count 23' FROM DUAL             
)SELECT REGEXP_SUBSTR(EMAIL,'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}') MAIL,
        REGEXP_SUBSTR(EMAIL,'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}',1,2) MAIL2
 FROM TEST_DATA

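I am not aware of a way to report 'n' matches, nor of how to insert commas and output everything into one column. I bet that, if it is possible, the query will become quite complicated, with multiple inner selects, finds and replaces going on. A better solution may be to return the raw results to another language for parsing, or to use PL/SQL for this kind of parsing.

Another edit:

This is what I meant by inner selects. An exact solution to the question :-)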

select CASE WHEN MAIL2 is not null THEN mail||', '||mail2 ELSE mail END as mail
from (
    WITH TEST_DATA AS (
      SELECT 'foo@gmail.com' AS EMAIL FROM DUAL UNION ALL 
      SELECT 'mail foo@gmail.com' FROM DUAL UNION ALL           
      SELECT 'mail foo@gmail.com sent' FROM DUAL UNION ALL                
      SELECT 'foo@gmail.com sent count 23' FROM DUAL UNION ALL          
      SELECT 'mail already sent to foo@gmail.com and foo@hotmail.com' FROM DUAL UNION ALL 
      SELECT 'foo@gmail.com sent count 23' FROM DUAL             
    )SELECT REGEXP_SUBSTR(EMAIL,'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}') MAIL,
            REGEXP_SUBSTR(EMAIL,'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}',1,2) MAIL2
     FROM TEST_DATA
)
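
The two-match limit in the query above can be lifted with a row generator. The following is a rough sketch rather than a tested solution: it assumes Oracle 11gR2 or later (for REGEXP_COUNT and LISTAGG), adds a hypothetical ID column so that duplicate rows can be told apart, and caps the number of addresses per row at an arbitrary 10.

```sql
-- Sketch only: ID is a hypothetical key added for grouping, and the cap of 10
-- matches per row is an arbitrary assumption.
WITH TEST_DATA AS (
  SELECT 1 AS ID, 'foo@gmail.com' AS EMAIL FROM DUAL UNION ALL
  SELECT 2, 'mail foo@gmail.com sent' FROM DUAL UNION ALL
  SELECT 3, 'mail already sent to foo@gmail.com and foo@hotmail.com' FROM DUAL
),
NUMS AS (
  -- generates the occurrence numbers 1..10
  SELECT LEVEL AS N FROM DUAL CONNECT BY LEVEL <= 10
)
SELECT T.ID,
       -- pick the N-th match for each row, then join the matches with commas
       LISTAGG(REGEXP_SUBSTR(T.EMAIL,
                 '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}', 1, NUMS.N),
               ', ') WITHIN GROUP (ORDER BY NUMS.N) AS MAIL
  FROM TEST_DATA T
 CROSS JOIN NUMS
 -- keep only occurrence numbers that actually exist in the row
 WHERE NUMS.N <= REGEXP_COUNT(T.EMAIL,
                 '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}')
 GROUP BY T.ID
 ORDER BY T.ID;
```

Rows that contain no address at all drop out of this result, because the WHERE clause filters out every generated occurrence number for them.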

I also came across this Oracle article, which discusses email matching in point 8. It may be worth a look: http://www.orafaq.com/node/2404