LEFT JOIN results in a cross join

Time: 2019-01-11 12:06:18

Tags: pyspark pyspark-sql

I am trying to join two tables in PySpark using SQLContext:

create table joined_table
stored as orc
as
SELECT  A.*,
        B.*
FROM TABLEA AS A
LEFT JOIN TABLEB AS B ON 1=1
where lower(A.varA) LIKE concat('%',lower(B.varB),'%')
AND (B.varC = 0 OR (lower(A.varA) = lower(B.varB)));
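To see why Spark calls this a cartesian product, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Spark SQL (SQLite has no cartesian-product check, so it simply runs the join). The rows in TABLEA and TABLEB are hypothetical sample data; only the column names varA, varB and varC come from the query. Because ON 1=1 is true for every pair of rows, the join alone produces |A| x |B| rows before the WHERE clause filters anything.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TABLEA (varA TEXT);
    CREATE TABLE TABLEB (varB TEXT, varC INTEGER);
    INSERT INTO TABLEA VALUES ('Apple Pie'), ('Banana'), ('Cherry Tart');
    INSERT INTO TABLEB VALUES ('apple', 0), ('banana', 1), ('tart', 1);
""")

# ON 1=1 holds for every pair, so every row of A is paired with every row of B.
pairs = con.execute(
    "SELECT COUNT(*) FROM TABLEA AS A LEFT JOIN TABLEB AS B ON 1=1"
).fetchone()[0]
n_a = con.execute("SELECT COUNT(*) FROM TABLEA").fetchone()[0]
n_b = con.execute("SELECT COUNT(*) FROM TABLEB").fetchone()[0]

print(pairs, n_a * n_b)  # 9 9
```

With real tables of thousands or millions of rows, this intermediate result grows multiplicatively, which is what Spark's check is guarding against.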

But I get the following error:

AnalysisException: u'Detected cartesian product for LEFT OUTER join between logical plans

parquet\nJoin condition is missing or trivial.\nUse the CROSS JOIN syntax to allow cartesian products between these relations.;
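The error message suggests the CROSS JOIN syntax. Because ON 1=1 matches every row of B, and the WHERE clause rejects any NULL-padded row the LEFT JOIN could add (a NULL varB makes the LIKE pattern NULL), the query as written should return the same rows as a CROSS JOIN filtered by the same WHERE clause. A minimal sketch checking this, again with sqlite3 as a stand-in for Spark SQL and hypothetical sample data; concat() is replaced by SQLite's || operator. Per the error text, the explicit CROSS JOIN form is what Spark accepts for cartesian products; whether that removes the need for the setting on your Spark version is worth confirming.

```python
import sqlite3

def make_demo_db():
    # Hypothetical sample data; only the table and column names come from the question.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE TABLEA (varA TEXT);
        CREATE TABLE TABLEB (varB TEXT, varC INTEGER);
        INSERT INTO TABLEA VALUES ('Apple Pie'), ('Banana'), ('Cherry Tart');
        INSERT INTO TABLEB VALUES ('apple', 0), ('banana', 1), ('tart', 1);
    """)
    return con

# The question's WHERE predicate, with concat() replaced by the || operator.
PREDICATE = """
    lower(A.varA) LIKE ('%' || lower(B.varB) || '%')
    AND (B.varC = 0 OR lower(A.varA) = lower(B.varB))
"""

def run(con, from_clause, where=""):
    # Select the joined columns in a deterministic order so results can be compared.
    sql = f"SELECT A.varA, B.varB, B.varC FROM {from_clause} {where} ORDER BY A.varA, B.varB"
    return con.execute(sql).fetchall()

con = make_demo_db()
original = run(con, "TABLEA AS A LEFT JOIN TABLEB AS B ON 1=1", f"WHERE {PREDICATE}")
cross = run(con, "TABLEA AS A CROSS JOIN TABLEB AS B", f"WHERE {PREDICATE}")

print(original)  # [('Apple Pie', 'apple', 0), ('Banana', 'banana', 1)]
print(original == cross)  # True
```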

Edit:

I solved the problem in Spark with the following code:

conf.set('spark.sql.crossJoin.enabled', 'true')

This enables cross joins (cartesian products) in PySpark!

1 answer:

Answer 0 (score: 0)

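Your left join has no real ON condition: ON 1=1 is always true, so the join condition is trivial. A left join without a join condition always results in a cross join, where every row of the right table is paired with every row of the left table. Can you edit the query and put the condition on the join key columns into the ON clause?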
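Moving the condition into the ON clause can be sketched the same way (sqlite3 as a stand-in for Spark SQL, hypothetical sample data, concat() replaced by ||). The join type then matters: an INNER JOIN with the predicate in ON returns the same rows as the original query, while a LEFT JOIN with the predicate in ON also keeps left rows that have no match, padded with NULLs. Because the predicate is a LIKE rather than an equality, the engine most likely still compares every pair of rows, so this rewrite mainly states the intent explicitly rather than making the join cheaper.

```python
import sqlite3

def make_demo_db():
    # Hypothetical sample data; only the table and column names come from the question.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE TABLEA (varA TEXT);
        CREATE TABLE TABLEB (varB TEXT, varC INTEGER);
        INSERT INTO TABLEA VALUES ('Apple Pie'), ('Banana'), ('Cherry Tart');
        INSERT INTO TABLEB VALUES ('apple', 0), ('banana', 1), ('tart', 1);
    """)
    return con

# The question's WHERE predicate, with concat() replaced by the || operator.
PREDICATE = """
    lower(A.varA) LIKE ('%' || lower(B.varB) || '%')
    AND (B.varC = 0 OR lower(A.varA) = lower(B.varB))
"""

def run(con, from_clause, where=""):
    # Select the joined columns in a deterministic order so results can be compared.
    sql = f"SELECT A.varA, B.varB, B.varC FROM {from_clause} {where} ORDER BY A.varA, B.varB"
    return con.execute(sql).fetchall()

con = make_demo_db()
original = run(con, "TABLEA AS A LEFT JOIN TABLEB AS B ON 1=1", f"WHERE {PREDICATE}")
inner = run(con, f"TABLEA AS A JOIN TABLEB AS B ON {PREDICATE}")  # predicate in ON, inner join
left = run(con, f"TABLEA AS A LEFT JOIN TABLEB AS B ON {PREDICATE}")  # keeps unmatched A rows

print(inner == original)  # True
print(left)  # [('Apple Pie', 'apple', 0), ('Banana', 'banana', 1), ('Cherry Tart', None, None)]
```

If the goal is the original result, the INNER JOIN form is the faithful rewrite; the LEFT JOIN form is only right if unmatched rows from TABLEA should be kept.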