Question

 table 1 - 

    name  bal    map<String,String>             year

    abc   24000  {car : honda, company : boa}   2015
    ac    21000  {car : honda}                  2015
    def   23000  {car : honda, company : boa}   2015
    abc   21000  {car : honda, company : boa}   2014
    ac    20000  {car : honda}                  2014
    def   22000  {car : honda, company : boa}   2014

    Required output after self join -

    name  bal-difference  map<String,String>
    abc   3000            {car : honda, company : boa}
    ac    1000            {car : honda}
    def   1000            {car : honda, company : boa}



    select 
    t1.name,t1.mapColumn,(t1.bal-t2.bal)
    FROM  table1 t1 JOIN table1 t2 
    ON t1.mapColumn = t2.mapColumn and t1.name = t2.name

I want to perform a self join on the map column and the name in Hive, so that I can compute the balance difference shown in the sample output.

I tried a join, but it does not return the required columns. I want to understand how a join works on complex data types (a map, in my case).

Answer 1

Check out https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView

You have to use the LATERAL VIEW EXPLODE syntax to expose the nested map and use it in the JOIN.
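
As a rough illustration of that syntax against the question's table, here is a minimal sketch. It assumes the map column is named mapColumn, as in the question's query; the aliases kv, map_key and map_value are made up for the example.

```sql
-- Sketch only: table1 and mapColumn are the names used in the question;
-- kv, map_key and map_value are illustrative aliases.
-- explode() on a map produces one row per entry, as a (key, value) pair.
SELECT t.name,
       t.bal,
       kv.map_key,
       kv.map_value
FROM table1 t
LATERAL VIEW explode(t.mapColumn) kv AS map_key, map_value;
```

Each source row is repeated once per map entry, so a row with two keys comes back as two rows carrying the same name and bal.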

Answer 2

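As of now, Hive cannot join directly on the map data type. If you know all the keys of mapColumn in advance, you can compare each key explicitly, e.g. t1.mapColumn['car'] = t2.mapColumn['car'], and so on. Otherwise, try the following: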

select t1.name ,t1.mapColumn ,(t1.bal - t2.bal)
FROM (select * from table1 where tran_year = '2015') t1 
JOIN (select * from table1 where tran_year = '2014') t2 
ON    map_keys(t1.mapColumn) <=> map_keys(t2.mapColumn)  
and   map_values(t1.mapColumn) <=> map_values(t2.mapColumn)
and   t1.name = t2.name;
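
For the known-keys approach mentioned at the start of this answer, a sketch might look like the following. It assumes that 'car' and 'company' are the only keys that occur, and it reuses the tran_year column name from the query above; the bal_difference alias is illustrative.

```sql
-- Sketch only: assumes 'car' and 'company' are the only map keys in use,
-- and that the year column is called tran_year as in the query above.
SELECT t1.name,
       t1.mapColumn,
       (t1.bal - t2.bal) AS bal_difference
FROM (SELECT * FROM table1 WHERE tran_year = '2015') t1
JOIN (SELECT * FROM table1 WHERE tran_year = '2014') t2
  ON  t1.name = t2.name
  -- Looking up a missing key yields NULL, so the null-safe <=> is used
  -- here; with plain =, rows lacking 'company' would drop out of the join.
  AND t1.mapColumn['car']     <=> t2.mapColumn['car']
  AND t1.mapColumn['company'] <=> t2.mapColumn['company'];
```

The trade-off is that every key has to be listed by hand, so this only suits maps with a small, fixed set of keys.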

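You could use a lateral view to explode the map into multiple rows, but then it becomes hard to compute the difference in bal, because multiple rows carry the same bal.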

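For non-NULL operands, the <=> operator returns the same result as the EQUAL (=) operator, but it returns TRUE if both operands are NULL and FALSE if only one of them is NULL.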
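
A small sketch of that difference, assuming a Hive version that allows SELECT without a FROM clause. The column aliases are illustrative, and the comments restate the documented semantics of each operator.

```sql
-- Sketch only: compares = and <=> on non-NULL and NULL operands.
SELECT 'a' =   'a'                                   AS eq_both_set,    -- true
       'a' <=> 'a'                                   AS nseq_both_set,  -- true
       CAST(NULL AS STRING) =   CAST(NULL AS STRING) AS eq_both_null,   -- NULL
       CAST(NULL AS STRING) <=> CAST(NULL AS STRING) AS nseq_both_null, -- true
       'a' <=> CAST(NULL AS STRING)                  AS nseq_one_null;  -- false
```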

Hive - join on a map data type column
