Merging N arrays into one?

Asked: 2019-10-10 07:54:31

Tags: mysql arrays hive hiveql

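I have a table named connection that stores each property_id together with its existing API connections. A property can have multiple connection_id values on the same apis, and a property can also have multiple connection_id values. The apis are ordered by importance in ascending order, so API 1 is more important than API 14.

With the above in mind, I am trying to select a single connection_id per property per day. Given the following data: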

+------------+-------------+---------------+----------------+
| yyyy_mm_dd | property_id | connection_id |      apis      |
+------------+-------------+---------------+----------------+
| 2019-10-01 |         100 |         123   | ['8']          |
| 2019-10-01 |         100 |         200   | ['16']         |
| 2019-10-01 |         100 |           5   | ['1','2','14'] |
+------------+-------------+---------------+----------------+

I would like the following to be returned (because connection_id 5 has the lowest API connection):

+------------+-------------+---------------+
| yyyy_mm_dd | property_id | connection_id |
+------------+-------------+---------------+
| 2019-10-01 |         100 |           5   |
+------------+-------------+---------------+

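My idea for achieving this was to merge the arrays, sort them in ascending order, and then pick the item at index 0. However, I feel this may be overcomplicating things.

I can't see any merge function under the collection functions: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-CollectionFunctions. Perhaps this can be achieved without merging?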

1 answer:

Answer 0 (score: 3)

If you need the connection_id with the lowest API, you can sort each array and take the record whose first element [0] is the lowest:

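select yyyy_mm_dd, property_id, connection_id
  from
(
 -- rank the connections of each property and day by their lowest API
 select yyyy_mm_dd, property_id, connection_id, apis,
        row_number() over(partition by yyyy_mm_dd, property_id order by api0 ) rn
   from
      (
       -- sort_array sorts ascending, so element [0] is the lowest API of the connection
       select yyyy_mm_dd, property_id, connection_id, apis, sort_array(apis)[0] as api0
         from mytable
      )s
)s 
where rn=1;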
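
To see what the inner step produces, here is a minimal sketch that replaces mytable with inline rows. The CTE name sample_rows is hypothetical, the arrays are assumed to hold integers (the question shows them quoted), and the elements of the last array are deliberately written out of order. It also assumes a Hive version that supports CTEs and SELECT without FROM.

```sql
-- Hypothetical inline data modelled on the question; apis assumed to be array<int>
with sample_rows as (
  select '2019-10-01' as yyyy_mm_dd, 100 as property_id, 123 as connection_id, array(8) as apis
  union all
  select '2019-10-01' as yyyy_mm_dd, 100 as property_id, 200 as connection_id, array(16) as apis
  union all
  select '2019-10-01' as yyyy_mm_dd, 100 as property_id, 5 as connection_id, array(14, 2, 1) as apis
)
-- sort_array orders each array ascending, so [0] is that connection's lowest API
select connection_id, sort_array(apis)[0] as api0
  from sample_rows;
```

With integer arrays this should yield api0 values of 8, 16 and 1 for connections 123, 200 and 5, so the row_number() step in the query above would rank connection 5 first.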

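If the array holds strings rather than integers, sorting will not work as intended, because strings are compared lexicographically (for example, '16' sorts before '8'). In that case you can explode the array, cast each element to int, and take the record with the lowest API: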

select yyyy_mm_dd, property_id, connection_id
  from
(
 -- rank every (connection, API) row of a property and day by the numeric API value
 select yyyy_mm_dd, property_id, connection_id, apis,
        row_number() over(partition by yyyy_mm_dd, property_id order by api ) rn
   from
      (
       -- one row per array element, cast to int so that 8 sorts before 16
       select t.yyyy_mm_dd, t.property_id, t.connection_id, t.apis, cast(e.api as int) as api
         from mytable t
              lateral view explode(apis) e as api
      )s
)s 
where rn=1;
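
The explode-and-cast approach can be tried without a real table using the sketch below. The names sample_rows, exploded and ranked are hypothetical, and the rows for property 101 are invented to show a case where string ordering and numeric ordering disagree. The extra connection_id in the ORDER BY is not part of the answer; it is added here only so that ties between equal API values are broken deterministically.

```sql
-- Hypothetical inline data: property 100 is from the question, property 101 is made up
with sample_rows as (
  select '2019-10-01' as yyyy_mm_dd, 100 as property_id, 123 as connection_id, array('8') as apis
  union all
  select '2019-10-01' as yyyy_mm_dd, 100 as property_id, 200 as connection_id, array('16') as apis
  union all
  select '2019-10-01' as yyyy_mm_dd, 100 as property_id, 5 as connection_id, array('1', '2', '14') as apis
  union all
  select '2019-10-01' as yyyy_mm_dd, 101 as property_id, 7 as connection_id, array('8') as apis
  union all
  select '2019-10-01' as yyyy_mm_dd, 101 as property_id, 9 as connection_id, array('16') as apis
)
select yyyy_mm_dd, property_id, connection_id
  from (
    select exploded.yyyy_mm_dd, exploded.property_id, exploded.connection_id,
           -- numeric ordering on the cast value; connection_id only breaks ties
           row_number() over (partition by exploded.yyyy_mm_dd, exploded.property_id
                              order by exploded.api, exploded.connection_id) as rn
      from (
        -- one row per array element, converted from string to int
        select t.yyyy_mm_dd, t.property_id, t.connection_id, cast(e.api as int) as api
          from sample_rows t
               lateral view explode(t.apis) e as api
      ) exploded
  ) ranked
 where rn = 1;
```

Under these assumptions the query should return connection 5 for property 100 and connection 7 for property 101. Ordering by the uncast strings would instead favour connection 9 for property 101, since '16' compares lower than '8'.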