I have an HQL query that runs fine when I execute it in DBeaver against my Hadoop instance (database and table names removed):
select * from (select DISTINCT UPPER(CONCAT(CONCAT(trim(lm.OriginCity),', '),trim(lm.OriginState))) as OriginCitySt
from <db1>.<table1> lm
LEFT JOIN <db2>.<table2> lt on trim(split(lt.lane, '-')[0]) = UPPER(CONCAT(CONCAT(trim(lm.OriginCity),', '),trim(lm.OriginState)))
WHERE lm.origincountry = 'US'
AND lt.lane IS NULL) a
union all
select * from (select distinct UPPER(CONCAT(CONCAT(trim(lm.DestinationCity),', '),trim(lm.DestinationState))) as DestCitySt
from <db1>.<table1> lm
LEFT JOIN <db2>.<table2> lt on trim(split(lt.lane, '-')[1]) = UPPER(CONCAT(CONCAT(trim(lm.DestinationCity),', '),trim(lm.DestinationState)))
WHERE lm.origincountry = 'US'
AND lt.lane IS NULL) b
I have an application on a Linux box that uses pyspark to connect to Hive and run this query, but when I do that, it gets stuck on lines that look like this.
When I remove the LEFT JOIN from the query so that it reads as follows:
select * from (select DISTINCT UPPER(CONCAT(CONCAT(trim(lm.OriginCity),', '),trim(lm.OriginState))) as OriginCitySt
from <db1>.<table1> lm
WHERE lm.origincountry = 'US') a
union all
select * from (select distinct UPPER(CONCAT(CONCAT(trim(lm.DestinationCity),', '),trim(lm.DestinationState))) as DestCitySt
from <db1>.<table1> lm
WHERE lm.origincountry = 'US') b
With that change it runs fine. So I know the join is the problem, and I'm fairly sure it is the "trim(split(lt.lane, '-')[0])" part, but the question is: why?
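To make the suspect expression concrete, here is a small plain-Python sketch of what the two sides of the join condition evaluate to. It only mimics the string logic and does not reproduce or explain the hang. The helper names `lane_key` and `city_state_key` and all sample values are hypothetical, since the real data is not shown, and the lane format "ORIGIN - DESTINATION" is an assumption.

```python
def lane_key(lane: str, index: int) -> str:
    """Roughly mimic trim(split(lane, '-')[index]) from the join condition.

    Approximation: Python's strip() removes all surrounding whitespace,
    not only spaces, and an out-of-range index raises IndexError here.
    """
    return lane.split("-")[index].strip()


def city_state_key(city: str, state: str) -> str:
    """Roughly mimic UPPER(CONCAT(CONCAT(trim(city), ', '), trim(state)))."""
    return (city.strip() + ", " + state.strip()).upper()


# Hypothetical sample values; the lane layout is assumed, not taken from the data.
lane = "CHICAGO, IL - DALLAS, TX"

print(lane_key(lane, 0))                  # origin side of the lane
print(lane_key(lane, 1))                  # destination side of the lane
print(city_state_key(" Chicago ", "il"))  # key built from table1 columns

# A hyphen inside a city name also splits, so the key is cut short.
print(lane_key("WINSTON-SALEM, NC - DALLAS, TX", 0))
```

Two things stand out in this sketch: the lane side is never uppercased while the city/state side is, and any hyphen inside a city name changes what index 0 and index 1 return. Both affect which rows match, which may or may not be related to the job getting stuck.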