In Impala/Hive, how do I extract the words before and after a specific keyword in a string?

Time: 2018-09-06 09:30:12

Tags: string hive impala

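I have a string column named text in Impala that contains descriptions. I want to get the word immediately before and the word immediately after a specific keyword.

Example:

  • text = This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom. ...

  • keyword = m2

Desired result: two columns, word before = 50 and word after = apartment.

Any ideas?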

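1 Answer:

Answer 0 (score: 0)

You can use regexp_extract to match the word before m2 and the word after m2, and extract each one separately with a capturing group.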

with t as ( select "This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom" as text)
select 
    regexp_extract(t.text , "(\\w+)\\s+m2", 1) as word_before,
    regexp_extract(t.text , "m2\\s+(\\w+)", 1) as word_after
from t ;

+--------------+-------------+--+
| word_before  | word_after  |
+--------------+-------------+--+
| 50           | apartment   |
+--------------+-------------+--+
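
The patterns above match m2 anywhere it appears, including inside a longer token. As a minimal sketch (not part of the original answer), the variant below adds \\b word boundaries so that only a standalone m2 counts as the keyword. It assumes the regex engine in use supports \\b, which should hold for both Hive (Java regex) and Impala (RE2). The second sample row is invented purely for illustration.

```sql
-- Sketch: same regexp_extract approach, with word boundaries around the keyword.
-- The second row is a made-up example where "m2" is glued to the number.
with t as (
    select "The 50 m2 apartment is divided into a bedroom" as text
    union all
    select "A 120m2 villa near the beach" as text
)
select
    t.text,
    -- word followed by whitespace and a standalone "m2"
    regexp_extract(t.text, "(\\w+)\\s+m2\\b", 1) as word_before,
    -- standalone "m2" followed by whitespace and a word
    regexp_extract(t.text, "\\bm2\\s+(\\w+)", 1) as word_after
from t;
```

For the first row this should still give 50 and apartment. For the second row neither pattern should pick up a word, because m2 is not a separate token there; without the boundary, the original word_after pattern would match "m2 villa". Whether that stricter behavior is wanted depends on how the descriptions are written, so treat it as an option rather than a fix.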