I have a string column named text in Impala that contains descriptions. I want to get the words immediately before and after a specific keyword.
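Example:
text = This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom. ...
keyword = m2
Desired result: two columns, word before = 50 and word after = apartment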
Any ideas?
Answer 0 (score: 0)
You can use regexp_extract to match the word before and the word after m2, extracting each one with its own capture group.
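-- Sample row standing in for the real table
with t as (
  select "This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom" as text
)
select
  -- group 1 = the word followed by whitespace and then m2
  regexp_extract(t.text, "(\\w+)\\s+m2", 1) as word_before,
  -- group 1 = the word that follows m2 and whitespace
  regexp_extract(t.text, "m2\\s+(\\w+)", 1) as word_after
from t;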
+--------------+-------------+
| word_before  | word_after  |
+--------------+-------------+
| 50           | apartment   |
+--------------+-------------+
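
If the keyword differs from row to row, the same approach can probably be generalized by building the pattern with concat. The query below is only a sketch under a few assumptions: the keyword column is hypothetical (it is not part of the question's table), the keyword is assumed to contain no regular-expression metacharacters (those would have to be escaped first), and the \\b word boundaries are an addition so that m2 is not matched inside a longer token. Note also that \\w in Impala's RE2-based regex engine covers only ASCII letters, digits, and underscore, so this would not capture words written in other scripts.

-- Sketch only: `keyword` is a hypothetical column added for illustration.
with t as (
  select "This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom" as text,
         "m2" as keyword
)
select
  -- word immediately before the keyword
  regexp_extract(t.text, concat("(\\w+)\\s+\\b", t.keyword, "\\b"), 1) as word_before,
  -- word immediately after the keyword
  regexp_extract(t.text, concat("\\b", t.keyword, "\\b\\s+(\\w+)"), 1) as word_after
from t;

For the sample row this should again give 50 and apartment. Because the pattern is no longer a constant, it may be compiled per row, so on large tables a fixed pattern is likely to be faster when the keyword is known in advance.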