I am trying to join two tables in PySpark using SQLContext:
CREATE TABLE joined_table STORED AS ORC AS
SELECT A.*,
       B.*
FROM TABLEA AS A
LEFT JOIN TABLEB AS B ON 1=1
WHERE lower(A.varA) LIKE concat('%', lower(B.varB), '%')
  AND (B.varC = 0 OR (lower(A.varA) = lower(B.varB)));
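For reference, this is roughly how I submit the query (a simplified sketch; it assumes TABLEA and TABLEB already exist as Hive tables or registered temp views, and the app name is just a placeholder):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("join_tables")  # placeholder app name
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# TABLEA and TABLEB are assumed to already be visible to the SQLContext
sqlContext.sql("""
    CREATE TABLE joined_table STORED AS ORC AS
    SELECT A.*, B.*
    FROM TABLEA AS A
    LEFT JOIN TABLEB AS B ON 1=1
    WHERE lower(A.varA) LIKE concat('%', lower(B.varB), '%')
      AND (B.varC = 0 OR lower(A.varA) = lower(B.varB))
""")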
But I get the following error:
AnalysisException: u'Detected cartesian product for LEFT OUTER join between logical plans
parquet\nJoin condition is missing or trivial.\nUse the CROSS JOIN syntax to allow cartesian products between these relations.;
Edit:
I solved this by setting the following configuration in Spark:
conf.set('spark.sql.crossJoin.enabled', 'true')
This enables cross joins in PySpark!
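The setting has to be in place before the query runs; here is a rough sketch of where it goes (the exact setup is my assumption and will differ depending on whether you build a SparkConf or, on Spark 2.x, use the SparkSession builder):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Option 1: set it on the SparkConf before creating the contexts
conf = SparkConf().setAppName("join_tables")
conf.set('spark.sql.crossJoin.enabled', 'true')  # allow cartesian products
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Option 2 (Spark 2.x): set it through the SparkSession builder instead
# from pyspark.sql import SparkSession
# spark = (SparkSession.builder
#          .appName("join_tables")
#          .config("spark.sql.crossJoin.enabled", "true")
#          .getOrCreate())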
Answer 0 (score: 0):
I don't see any condition in the ON clause of your left join. A left join without a real join condition will always result in a cross join, and a cross join repeats every row of the right-hand table for every row of the left-hand table. Could you edit the query to include an ON clause on the join key columns? See the sketch below.
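For example, something along these lines (a sketch only: it reuses the column names from the question and simply moves the matching conditions from the WHERE clause into the ON clause, so the join condition is no longer trivial; whether keeping unmatched rows from TABLEA is actually what you want is for you to decide):

# Sketch: give the LEFT JOIN a non-trivial condition that references both tables,
# so Spark no longer plans it as a cartesian product.
sqlContext.sql("""
    CREATE TABLE joined_table STORED AS ORC AS
    SELECT A.*, B.*
    FROM TABLEA AS A
    LEFT JOIN TABLEB AS B
      ON lower(A.varA) LIKE concat('%', lower(B.varB), '%')
     AND (B.varC = 0 OR lower(A.varA) = lower(B.varB))
""")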