Extract a string given start and end patterns using MySQL / Presto

Posted: 2019-07-11 21:01:14

Tags: mysql hive presto

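I'm trying to extract text from a string, given a specific start pattern and end pattern.

I'm not really sure where to start. I've looked around and tried to make sense of the regex functions, but they lose me.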

Table:

+----+------------------------------------+
| id |              sentence              |
+----+------------------------------------+
|  1 | Hello, I am a bird.                |
|  2 | Hello, I am a cat. I like catfood. |
|  3 | Hello, I am a dog. I like bones.   |
+----+------------------------------------+

I'm trying to extract the text between "Hello," and the first "." (period).

Expected output:

+-------------+
|  sentence   |
+-------------+
| I am a bird |
| I am a cat  |
| I am a dog  |
+-------------+

1 answer:

Answer 0 (score: 2)

In Hive, try the regexp_extract(col, regexp, capture_group) function with this pattern:

Hello,    // match the literal "Hello,"
([^.]*)   // then capture everything up to (not including) the first . (period) as group 1
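To see what this pattern captures outside Hive, here is a small Python sketch that emulates the (col, regexp, capture_group) call with re.search. The helper name extract_group is hypothetical, and Python's re is assumed to behave like Hive's Java regex for this simple pattern. Returning None when nothing matches is this sketch's own choice, not necessarily what Hive, Presto or MySQL return.

```python
import re
from typing import Optional

# Pattern from the answer: the literal "Hello," followed by group 1,
# which captures every character that is not a period.
PATTERN = r"Hello,([^.]*)"


def extract_group(sentence: str, pattern: str = PATTERN, group: int = 1) -> Optional[str]:
    """Return the requested capture group of the first match, or None.

    Hypothetical helper mimicking the (col, regexp, capture_group) argument
    order of Hive's regexp_extract; it is not Hive's implementation.
    """
    match = re.search(pattern, sentence)  # first match anywhere in the string
    return match.group(group) if match else None


sentences = [
    "Hello, I am a bird.",
    "Hello, I am a cat. I like catfood.",
    "Hello, I am a dog. I like bones.",
]

for s in sentences:
    # repr() makes the leading space inside group 1 visible
    print(repr(extract_group(s)))
```

Because [^.]* keeps going until it hits a period or the end of the string, a sentence with no period at all would return everything after "Hello,".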

Example:

hive> select regexp_extract(sentence, "Hello,([^.]*)", 1) as sentence
      from (
            -- prepare sample data
            select stack(3, 'Hello, I am a bird.', 'Hello, I am a cat. I like catfood.', 'Hello, I am a dog. I like bones.')
              as (sentence)
           ) t;

Result:

sentence
 I am a bird
 I am a cat
 I am a dog
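Each result keeps the space after the comma, because group 1 starts immediately after "Hello,". Two ways to drop it are sketched below in Python (again an emulation, not Hive itself): strip the captured text afterwards, which is what wrapping the call in trim() does in Hive or Presto, or add " *" to the pattern so the spaces are matched outside the group, i.e. "Hello, *([^.]*)". The function names are hypothetical.

```python
import re
from typing import Optional

sentences = [
    "Hello, I am a bird.",
    "Hello, I am a cat. I like catfood.",
    "Hello, I am a dog. I like bones.",
]


def extract_then_trim(sentence: str) -> Optional[str]:
    """Option 1: original pattern, then strip surrounding whitespace.

    This is the Python analogue of trim(regexp_extract(...)).
    """
    match = re.search(r"Hello,([^.]*)", sentence)
    return match.group(1).strip() if match else None


def extract_skipping_spaces(sentence: str) -> Optional[str]:
    """Option 2: the pattern consumes the spaces after the comma,
    so group 1 begins at the first non-space character."""
    match = re.search(r"Hello, *([^.]*)", sentence)
    return match.group(1) if match else None


for s in sentences:
    print(extract_then_trim(s), "|", extract_skipping_spaces(s))
```

Option 1 also strips trailing whitespace before the period; option 2 only skips spaces at the start, which is enough for this sample data.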