How to iterate over the columns within a single row of a Hive table

Posted: 2019-05-19 02:41:14

Tags: hive hiveql

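I have the following requirements:

  1. I have a Hive table with the following columns:

    Table: USER_PRODUCT

    user_id,product1_id,product2_id,product3_id,...,product10_id

Here, the number of products actually populated for each user_id can be anywhere from 1 to 10 (for some user_ids, say, only product1_id and product2_id are present).

  2. I want to process the table above against another table that holds product details, and remove the invalid products:

    Table: PRODUCT_DEAILS

    product_id,product_status

  3. I want to achieve this by writing a Hive query.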

Can someone help me write the query? My concern is how to iterate over all of the product_id columns for each user_id.

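    for (every row in USER_PRODUCT)
        for (every product ID column from 1 to 10)
            check whether the product is valid according to its status in PRODUCT_DEAILS
                if (valid) -> keep it as is
                else -> remove the product from the table by setting it to null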

1 answer:

Answer 0 (score: 1)

If PRODUCT_DEAILS is small enough, build an array of the valid products, cross join it with USER_PRODUCT, and use array_contains to check whether each product is valid:

set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.mapjoin.smalltable.filesize=1000000000; --adjust to small table size
set hive.auto.convert.join.noconditionaltask.size=1000000000;

with valid_product as (
select collect_set(product_id) as list --single row holding an array of valid product IDs
 from PRODUCT_DEAILS
where product_status='valid' --no sort needed: element order does not matter for array_contains
)

insert overwrite table USER_PRODUCT

select p.user_id,
    case when array_contains(v.list, p.product1_id) then p.product1_id end product1_id,
    case when array_contains(v.list, p.product2_id) then p.product2_id end product2_id,
    case when array_contains(v.list, p.product3_id) then p.product3_id end product3_id,
    case when array_contains(v.list, p.product4_id) then p.product4_id end product4_id,
    case when array_contains(v.list, p.product5_id) then p.product5_id end product5_id,
    case when array_contains(v.list, p.product6_id) then p.product6_id end product6_id,
    case when array_contains(v.list, p.product7_id) then p.product7_id end product7_id,
    case when array_contains(v.list, p.product8_id) then p.product8_id end product8_id,
    case when array_contains(v.list, p.product9_id) then p.product9_id end product9_id,
    case when array_contains(v.list, p.product10_id) then p.product10_id end product10_id 
  from USER_PRODUCT p
       cross join valid_product v; --cross join with single row containing array
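
To see what the array_contains check does before touching real data, the same pattern can be tried on a few inline rows. This is only a sketch: the CTE names sample_user_product and sample_product_deails and all of the data in them are hypothetical, only three product columns are shown for brevity, and it assumes a Hive version that accepts SELECT without FROM and does not block cross joins in strict mode.

```sql
-- Hypothetical sample rows standing in for USER_PRODUCT (three product columns only).
with sample_user_product as (
  select 1 as user_id, 'p1' as product1_id, 'p2' as product2_id,
         cast(null as string) as product3_id
  union all
  select 2 as user_id, 'p3' as product1_id, 'p9' as product2_id,
         'p1' as product3_id
),
-- Hypothetical sample rows standing in for PRODUCT_DEAILS.
sample_product_deails as (
  select 'p1' as product_id, 'valid' as product_status
  union all
  select 'p2' as product_id, 'invalid' as product_status
  union all
  select 'p3' as product_id, 'valid' as product_status
),
-- Single row containing the array of valid product IDs.
valid_product as (
  select collect_set(product_id) as list
    from sample_product_deails
   where product_status = 'valid'
)
-- A CASE without ELSE yields NULL whenever the ID is not in the array.
select p.user_id,
       case when array_contains(v.list, p.product1_id) then p.product1_id end as product1_id,
       case when array_contains(v.list, p.product2_id) then p.product2_id end as product2_id,
       case when array_contains(v.list, p.product3_id) then p.product3_id end as product3_id
  from sample_user_product p
       cross join valid_product v;
```

In this sample, p2 is marked invalid and p9 does not appear in the product table at all, so both should come back as NULL, while p1 and p3 are kept.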

If PRODUCT_DEAILS is too large to fit into an array, use ordinary joins instead:

set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.mapjoin.smalltable.filesize=1000000000; --adjust to small table size
set hive.auto.convert.join.noconditionaltask.size=1000000000;

with valid_product as (
select distinct product_id --Get distinct IDs of valid products
 from PRODUCT_DEAILS
where product_status='valid'
)

insert overwrite table USER_PRODUCT

select p.user_id,
    case when v1.product_id is not null then p.product1_id end product1_id,
    case when v2.product_id is not null then p.product2_id end product2_id,
    case when v3.product_id is not null then p.product3_id end product3_id,
    case when v4.product_id is not null then p.product4_id end product4_id,
    case when v5.product_id is not null then p.product5_id end product5_id,
    case when v6.product_id is not null then p.product6_id end product6_id,
    case when v7.product_id is not null then p.product7_id end product7_id,
    case when v8.product_id is not null then p.product8_id end product8_id,
    case when v9.product_id is not null then p.product9_id end product9_id,
    case when v10.product_id is not null then p.product10_id end product10_id 
  from USER_PRODUCT p
       left join valid_product v1 on p.product1_id=v1.product_id 
       left join valid_product v2 on p.product2_id=v2.product_id 
       left join valid_product v3 on p.product3_id=v3.product_id 
       left join valid_product v4 on p.product4_id=v4.product_id 
       left join valid_product v5 on p.product5_id=v5.product_id 
       left join valid_product v6 on p.product6_id=v6.product_id 
       left join valid_product v7 on p.product7_id=v7.product_id 
       left join valid_product v8 on p.product8_id=v8.product_id 
       left join valid_product v9 on p.product9_id=v9.product_id 
       left join valid_product v10 on p.product10_id=v10.product_id;
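
Because insert overwrite replaces the contents of USER_PRODUCT, it may be worth previewing the effect first. The following is a rough sketch, not part of the original answer, that counts how many product1_id values would be set to NULL. It assumes the same table and column names as above, and the output alias product1_to_be_nulled is made up for illustration; the check would need to be repeated for the other nine columns.

```sql
with valid_product as (
select distinct product_id --Get distinct IDs of valid products
 from PRODUCT_DEAILS
where product_status='valid'
)

-- Count populated product1_id values that have no match among the valid products.
select count(*) as product1_to_be_nulled
  from USER_PRODUCT p
       left join valid_product v1 on p.product1_id=v1.product_id
 where p.product1_id is not null
   and v1.product_id is null;
```

A count of zero would suggest that the rewrite leaves product1_id unchanged for every row.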