I have a table with a column that is an array of structs. Is there a way to filter records on this column using the like operator?
hive> desc location;
location_list array<struct<city:string,state:string>>
hive> select * from location;
row1 : [{"city":"Hudson","state":"NY"},{"city":"San Jose","state":"CA"},{"city":"Albany","state":"NY"}]
row2 : [{"city":"San Jose","state":"CA"},{"city":"San Diego","state":"CA"}]
I am trying to run a query like the one below, to keep only the records that contain the state "NY".
hive> select * from location where location_list like '%"NY"%';
FAILED: SemanticException [Error 10014]: Line 1:29 Wrong arguments ''%"NY"%'': No matching method for class org.apache.hadoop.hive.ql.udf.UDFLike with (array<struct<city:string,state:string>>, string). Possible choices: _FUNC_(string, string)
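Note: I can do this with a lateral view that explodes the struct column, but I am trying to avoid that because I need to join this table with another one that does not accept lateral views.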
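For reference, the lateral view approach mentioned in the note would look roughly like the sketch below. The aliases l, e and loc are made up for illustration, and the query has not been run against this table.

```sql
-- Sketch of the lateral view / explode approach the question wants to avoid.
-- Each array element becomes its own row, so like can be applied to loc.state.
select l.location_list
from location l
lateral view explode(l.location_list) e as loc
where loc.state like '%NY%';
```

With the sample data, row1 would come back twice, because two of its elements have the state NY.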
Answer 0: (score: 1)
Nice question. You can do it in the following efficient (and neat) way.
select * from location
where array_contains(location_list.state, 'NY');
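Here, location_list.state produces an array of strings (the states, in your case), so you can use the array_contains UDF to check for a value. This looks for an exact value, so you cannot do pattern matching the way the like operator does, but it should get you what you are looking for.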
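If pattern matching on the state is really needed, one possible sketch (not part of the answer above, and not run against this table) is to join the projected array into a single string with concat_ws, which accepts an array<string>, and apply like to the result. The '|' delimiter is an assumption: pick a character that cannot occur in the data, since a pattern could otherwise match across neighbouring values.

```sql
-- Sketch: like-style filter without a lateral view.
-- location_list.state projects the struct field into an array<string>;
-- concat_ws flattens that array into one '|'-separated string.
select *
from location
where concat_ws('|', location_list.state) like '%NY%';
```

For exact matches, array_contains stays the simpler choice, because it compares whole elements and needs no delimiter.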
Answer 1: (score: 1)
A demo of array_contains:
select my_array
from
( --emulation of your dataset. Just replace this subquery with your table
select array(named_struct("city","Hudson","state","NY"),named_struct("city","San Jose","state","CA"),named_struct("city","Albany","state","NY")) as my_array
union all
select array(named_struct("city","San Jose","state","CA"),named_struct("city","San Diego","state","CA")) as my_array
)s
where array_contains(my_array.state,'NY')
;
Result:
OK
[{"city":"Hudson","state":"NY"},{"city":"San Jose","state":"CA"},{"city":"Albany","state":"NY"}]
Time taken: 34.055 seconds, Fetched: 1 row(s)