Question

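I use the split function to create an array in Hive. How can I get the first n elements of the array, so that I can iterate over the resulting sub-array?

Code example: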

select col1 from table
where split(col2, ',')[0:5]

The '[0:5]' looks like Python-style slicing, but it does not work here.

Answer 1

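Here is a simpler way. There is a UDF named TruncateArrayUDF.java in the Brickhouse repo that does what you need. Just clone the repo from its main page and build the jar with Maven.

Sample data: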

|       col1         |
----------------------
  1,2,3,4,5,6,7
  11,12,13,14,15,16,17

Query:

add jar /complete/path/to/jar/brickhouse-0.7.0-SNAPSHOT.jar;
create temporary function trunc as 'brickhouse.udf.collect.TruncateArrayUDF';

select pos
      ,newcol
from (
      select trunc(split(col1, '\\,'), 5) as p
      from table
     ) x
lateral view posexplode(p) explodetable as pos, newcol

Output:

  pos  |  newcol  |
-------------------
  0         1
  1         2
  2         3
  3         4
  4         5
  0         11
  1         12
  2         13
  3         14
  4         15
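
If adding the Brickhouse jar is not an option, a similar result can probably be had with Hive built-ins alone. The sketch below assumes Hive 0.13 or later (where posexplode is available) and reuses the placeholder table and column names from the example above. It is a swapped-in alternative, not the TruncateArrayUDF approach.

```sql
-- Sketch only: keep the first 5 elements using the built-in posexplode.
-- `table` and col1 are the placeholder names from the example above.
select pos
      ,newcol
from table
lateral view posexplode(split(col1, ',')) explodetable as pos, newcol
where pos < 5   -- pos is zero-based, so this keeps positions 0..4
```

The filter on pos plays the role of trunc(..., 5): rows with fewer than 5 elements simply produce fewer output rows.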

Answer 2

This one is tricky. First grab the Brickhouse jar, then add it to Hive:

add jar /path/to/jars/brickhouse-0.7.0-SNAPSHOT.jar;

Now create the two functions we are going to use:

CREATE TEMPORARY FUNCTION array_index AS 'brickhouse.udf.collect.ArrayIndexUDF';
CREATE TEMPORARY FUNCTION numeric_range AS 'brickhouse.udf.collect.NumericRange';

The query will be:

select a
      ,n as array_index
      ,array_index(split(a,','),n) as value_from_Array
from (
      select "abc#1,def#2,hij#3" a from dual
      union all
      select "abc#1,def#2,hij#3,zzz#4" a from dual
     ) t1
lateral view numeric_range( length(a)-length(regexp_replace(a,',',''))+1 ) n1 as n

Explanation:
select "abc#1,def#2,hij#3" a from dual union all select "abc#1,def#2,hij#3,zzz#4" a from dual

This just selects some test data; in your case, replace it with your table name.

lateral view numeric_range( length(a)-length(regexp_replace(a,',',''))+1 ) n1 as n

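numeric_range is a UDTF that returns a table for a given range. In this case I ask for a range from 0 (the default) up to the number of elements in the string (computed as the number of commas + 1).
This way, each row is multiplied by the number of elements in the given column.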
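
The element-count arithmetic above can be checked on one of the test strings. This is a minimal sketch, assuming Hive 0.13 or later so that a subquery without a FROM clause is accepted; the column aliases n_by_commas and n_by_size are made up for illustration, and size(split(...)) is shown only as a built-in cross-check.

```sql
-- 'abc#1,def#2,hij#3' has 17 characters, 15 once the 2 commas are removed,
-- so the expression evaluates to 17 - 15 + 1 = 3 elements.
select length(a) - length(regexp_replace(a, ',', '')) + 1 as n_by_commas
      ,size(split(a, ','))                                as n_by_size
from (select 'abc#1,def#2,hij#3' as a) t
```

Both columns should come out as 3 for this string, which is the number of rows numeric_range then generates for it.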

array_index(split(a,','),n)

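This is exactly the same as writing split(a,',')[n], but Hive does not support that. So for each duplicated row of the initial string we get its nth element. The result is: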

abc#1,def#2,hij#3,zzz#4   0   abc#1
abc#1,def#2,hij#3,zzz#4   1   def#2
abc#1,def#2,hij#3,zzz#4   2   hij#3
abc#1,def#2,hij#3,zzz#4   3   zzz#4
abc#1,def#2,hij#3         0   abc#1
abc#1,def#2,hij#3         1   def#2
abc#1,def#2,hij#3         2   hij#3

If you really want a specific number of elements (say 5), just use:
lateral view numeric_range(5 ) n1 as n
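
Since the question asks for a sub-array rather than exploded rows, one way to approximate a slice with built-ins only is to trim the string before splitting it. The following is a sketch under that assumption, not something taken from either answer; the alias first_five and the inline test string are illustrative, and the subquery without FROM again assumes Hive 0.13 or later.

```sql
-- Sketch: trim the string to its first 5 comma-separated items, then split.
-- [^,]* matches the first item; (,[^,]*){0,4} allows up to 4 further items.
select split(regexp_extract(a, '^[^,]*(,[^,]*){0,4}', 0), ',') as first_five
from (select '1,2,3,4,5,6,7' as a) t
```

For the sample string the trimmed text is '1,2,3,4,5', so the split yields a five-element array. To take a different n, change the upper bound in {0,4} to n-1.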

How to get the first n elements of an array in Hive
