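Using RLIKE in Hive to find a regular expression pattern

Time: 2018-01-16 21:08:30

Tags: hadoop hive

I want to filter a column for words such as head, att, and space, and I am using the following query: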

select * from tablename where (column_name like '%head%' or column_name like '%att%' or column_name like '%space%')

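The problem with this query is that it also matches words such as headgear, attitude, and spaceship. I only want the rows that contain the specific words head, att, or space. I tried adding a space after each word: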

select * from tablename where (column_name like '%head %' or column_name like '%att %' or column_name like '%space %')

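However, this misses the word when it appears at the end of a sentence, for example when head is the last word.

Something like rlike in Hive should be able to solve this, but I have not had much success with it.

Can anyone help me use rlike to return only the rows that contain the whole words head, att, space, and so on?

Thanks

Update:

Suppose the input is as follows: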

Tom's head
my head is big
I am having headache
att is bad
attitude is bad
bad is att
There is more space
spaceship
space is looking cool

The output should be:

Tom's head
my head is big
att is bad
bad is att
There is more space
space is looking cool

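The following rows should be removed, because I am only interested in head, att, and space when they appear as whole words in a sentence. I am not interested in matching headache, attitude, or spaceship.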

I am having headache
attitude is bad
spaceship

Thanks

2 Answers:

Answer 0 (score: 0)

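RLIKE uses the same regular expression syntax found in most programming languages.

For example, to filter values that start with h and end with d, you could use:

^h.*d$

The solution to the above problem is ^head$, which means the column value should start with head (the start is denoted by ^) and end with it (the end is denoted by $).

Reference: Relational Operators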
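
To see how the anchors behave, here is a minimal sketch that can be run without a table. It assumes Hive 0.13 or later, where SELECT without FROM is allowed; the sample strings and column aliases are made up for illustration.

```sql
-- RLIKE is true if any part of the string matches the regex,
-- so ^ and $ are what pin the match to the whole value.
select
  'head' rlike '^head$'           as exact_head,       -- true: the whole value is head
  'my head is big' rlike '^head$' as head_in_sentence, -- false: other text surrounds head
  'hand' rlike '^h.*d$'           as h_to_d;           -- true: starts with h, ends with d
```

Because ^head$ only matches a value that is exactly head, on its own it would not return rows such as "my head is big" from the question.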

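Answer 1 (score: 0)

A word boundary (\b) works in this case. It matches the word whether it appears at the beginning, in the middle, or at the end of the string.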

with aa as
(select 'Toms head' as col1
union all
select 'head as in headache' as col1
union all
select 'headache as in head' as col1
union all
select 'my head is big' as col1
union all
select 'I am having headache' as col1
union all
select 'att is bad' as col1
union all
select 'attitude is bad' as col1
union all
select 'bad is att' as col1
union all
select 'There is more space' as col1
union all
select 'spaceship' as col1
union all
select 'space is looking cool' as col1)
select col1 from aa
where regexp(col1,'\\bhead\\b|\\batt\\b|\\bspace\\b')
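
The same filter can also be written with rlike against the table from the question, which is what was originally asked for. This is only a sketch: it assumes the table and column really are named tablename and column_name, as in the question's queries. Grouping the alternatives lets the two word boundaries be written once instead of per word.

```sql
-- '\\b' in the Hive string literal becomes the regex word boundary \b.
-- Matches head, att, or space only as whole words, anywhere in the value.
select *
from tablename
where column_name rlike '\\b(head|att|space)\\b';
```

With the sample input from the question, this pattern should keep rows such as "Tom's head" and "bad is att" and drop "I am having headache", "attitude is bad", and "spaceship", because in those rows the word is followed by another letter and so no boundary occurs there.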