Databricks SQL: why does a subquery in a left join cause an error message?

Time: 2018-07-16 21:51:32

Tags: databricks

I am trying to use a subquery in a left join condition, but I get the following error message: "Error in SQL statement: AnalysisException: Table or view not found: TableD;". It points to the FROM TableD D2 line in the subquery.

SELECT D1.Code, D1.Description, C.InstanceKey
FROM TableA A
INNER JOIN TableB B
    ON A.Key = B.Key
INNER JOIN TableC C
    ON B.DetailKey = C.DetailKey
LEFT JOIN TableD D1
    ON C.InstanceKey = D1.InstanceKey
    AND D1.RankCnt = (SELECT MIN(D2.RankCnt) 
                      FROM TableD D2
                      WHERE C.InstanceKey = D2.InstanceKey); 

If I remove the subquery and hard-code D1.RankCnt = [anyValidRankCnt], the query runs without problems.

This question has also been posted on the Databricks community forum at https://forums.databricks.com/questions/14588/why-is-subquery-in-left-join-causing-error-msg.html.

1 answer:

Answer 0: (score: 1)

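I am not sure whether Spark currently supports this particular kind of correlated subquery, but I was able to rewrite it in several different ways, including with ROW_NUMBER. Please check that these queries are semantically equivalent on your data: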

%sql
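-- Rewrite 1: CTE
-- Note: ROW_NUMBER keeps exactly one row per C.InstanceKey, so rows tied on the
-- minimum RankCnt, or repeated C.InstanceKey values, collapse to a single row.
WITH cte AS
(
SELECT D1.Code, D1.Description, C.InstanceKey,
    ROW_NUMBER() OVER ( PARTITION BY C.InstanceKey ORDER BY D1.RankCnt ) xrank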
FROM TableA A
INNER JOIN TableB B
    ON A.Key = B.Key
INNER JOIN TableC C
    ON B.DetailKey = C.DetailKey
LEFT JOIN TableD D1
    ON C.InstanceKey = D1.InstanceKey
)
SELECT *
FROM cte
WHERE xrank = 1;


-- Rewrite 2: subquery
SELECT x.Code, x.Description, C.InstanceKey
FROM TableA A
INNER JOIN TableB B
    ON A.Key = B.Key
INNER JOIN TableC C
    ON B.DetailKey = C.DetailKey
LEFT JOIN 
    (
    SELECT D1.InstanceKey, D1.Code, D1.Description, D1.RankCnt
    FROM TableD D1
        INNER JOIN
            ( 
            SELECT InstanceKey, MIN(RankCnt) RankCnt
            FROM TableD 
            GROUP BY InstanceKey
            ) D2 ON D1.InstanceKey = D2.InstanceKey
            AND D1.RankCnt = D2.RankCnt
    ) x
    ON c.InstanceKey = x.InstanceKey;


-- Rewrite 3: UNION ALL
SELECT D1.Code, D1.Description, C.InstanceKey
FROM TableA A
INNER JOIN TableB B
    ON A.Key = B.Key
INNER JOIN TableC C
    ON B.DetailKey = C.DetailKey
INNER JOIN TableD D1
    ON C.InstanceKey = D1.InstanceKey
    INNER JOIN
        (
        SELECT D2.InstanceKey, MIN(D2.RankCnt) RankCnt
        FROM TableD D2
        GROUP BY D2.InstanceKey
        ) x ON C.InstanceKey = x.InstanceKey
        AND D1.RankCnt = x.RankCnt

UNION ALL

SELECT NULL AS Code, NULL AS Description, C.InstanceKey
FROM TableA A
INNER JOIN TableB B
    ON A.Key = B.Key
INNER JOIN TableC C
    ON B.DetailKey = C.DetailKey
WHERE NOT EXISTS
    (
    SELECT *
    FROM TableD D1
    WHERE C.InstanceKey = D1.InstanceKey
    );
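
One way to check the rewrites for semantic equivalence is to run them against a small toy dataset before trying them on real data. The sketch below does this for Rewrite 2 and Rewrite 3 using Python's built-in sqlite3 module as a local stand-in. It is not Spark, so it says nothing about what Databricks accepts. The table contents are invented for illustration, TableA and TableB are left out on the assumption that they do not affect the TableD logic, and the `rows` helper is hypothetical.

```python
import sqlite3

# Toy stand-ins for the tables in the question (invented data).
# TableA and TableB are omitted: they do not take part in the TableD logic.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableC (InstanceKey INTEGER);
CREATE TABLE TableD (InstanceKey INTEGER, Code TEXT, Description TEXT, RankCnt INTEGER);
INSERT INTO TableC VALUES (1), (2), (3);
INSERT INTO TableD VALUES
    (1, 'a', 'first',  1),
    (1, 'b', 'second', 2),
    (2, 'c', 'tie 1',  5),
    (2, 'd', 'tie 2',  5);
""")

# The original correlated subquery; SQLite accepts it, so it serves as the baseline.
ORIGINAL = """
SELECT D1.Code, D1.Description, C.InstanceKey
FROM TableC C
LEFT JOIN TableD D1
    ON C.InstanceKey = D1.InstanceKey
    AND D1.RankCnt = (SELECT MIN(D2.RankCnt)
                      FROM TableD D2
                      WHERE C.InstanceKey = D2.InstanceKey)
"""

# Rewrite 2: left join to a derived table that keeps only the minimum-rank rows.
REWRITE_2 = """
SELECT x.Code, x.Description, C.InstanceKey
FROM TableC C
LEFT JOIN
    (
    SELECT D1.InstanceKey, D1.Code, D1.Description, D1.RankCnt
    FROM TableD D1
    INNER JOIN
        (
        SELECT InstanceKey, MIN(RankCnt) RankCnt
        FROM TableD
        GROUP BY InstanceKey
        ) D2 ON D1.InstanceKey = D2.InstanceKey
        AND D1.RankCnt = D2.RankCnt
    ) x
    ON C.InstanceKey = x.InstanceKey
"""

# Rewrite 3: matched rows via inner joins, unmatched rows via NOT EXISTS.
REWRITE_3 = """
SELECT D1.Code, D1.Description, C.InstanceKey
FROM TableC C
INNER JOIN TableD D1
    ON C.InstanceKey = D1.InstanceKey
INNER JOIN
    (
    SELECT D2.InstanceKey, MIN(D2.RankCnt) RankCnt
    FROM TableD D2
    GROUP BY D2.InstanceKey
    ) x ON C.InstanceKey = x.InstanceKey
    AND D1.RankCnt = x.RankCnt
UNION ALL
SELECT NULL AS Code, NULL AS Description, C.InstanceKey
FROM TableC C
WHERE NOT EXISTS
    (
    SELECT *
    FROM TableD D1
    WHERE C.InstanceKey = D1.InstanceKey
    )
"""


def rows(sql):
    """Run a query and return its rows in a stable order (NULLs sort first)."""
    result = conn.execute(sql).fetchall()
    return sorted(result, key=lambda r: [(v is not None, v) for v in r])


expected = rows(ORIGINAL)
for name, sql in [("Rewrite 2", REWRITE_2), ("Rewrite 3", REWRITE_3)]:
    print(name, "matches original:", rows(sql) == expected)
```

The toy data deliberately includes a tie on the minimum RankCnt (InstanceKey 2) and a key with no TableD rows (InstanceKey 3), since those are the cases where the rewrites are most likely to diverge. The results are compared after sorting because SQL does not guarantee row order without ORDER BY.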