Question

I am trying to convert the following Teradata SQL to Spark SQL, but have not been able to. Can someone suggest a way to do it?

create multiset table test1 as 
(
   WITH RECURSIVE test1 (col1, col2, col3) AS 
   (
      sel col11, col2, col3 
   from
      test2 root 
   where
      col3 = 1 
   UNION ALL
   SELECT
      indirect.col11,
      indirect.col2 || ',' || direct.col2 as col2,
      indirect.col3 
   FROM
      test1 direct,
      test2 indirect 
   WHERE
      direct.col1 = indirect.col11 
      and direct.col3 + 1 = indirect.col3 
   )
   sel col1 as col11,
   col2 
from
   test1 QUALIFY ROW_NUMBER() OVER(PARTITION BY col1 
ORDER BY
   col3 DESC) = 1 
)
with data primary index (col11) ;

Thanks.

Answer 1

Some time ago I tried the approach described at http://sqlandhadoop.com/how-to-implement-recursive-queries-in-spark/.

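I cannot find my simplified version, but as far as I know this approach is currently the only way to do it. I assume Spark SQL will add support for recursive queries at some point, although I am not sure.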
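To give an idea of what the loop looks like, here is a minimal sketch of the same iteration. It is not the code from the linked article. It uses Python's built-in sqlite3 only so that it runs without a cluster; with Spark, each SQL string would presumably be passed to spark.sql against temp views, and new rows would be unioned into a DataFrame instead of inserted. The function name and the sample rows are made up for illustration. QUALIFY is rewritten as a ROW_NUMBER() subquery, on the assumption that your Spark version does not accept QUALIFY.

```python
import sqlite3

# Recursive step from the question, limited to the rows added in the
# previous pass (direct.col3 = ?) so earlier levels are not expanded twice.
STEP_SQL = """
SELECT indirect.col11,
       indirect.col2 || ',' || direct.col2 AS col2,
       indirect.col3
FROM test1 direct
JOIN test2 indirect
  ON direct.col1 = indirect.col11
 AND direct.col3 + 1 = indirect.col3
WHERE direct.col3 = ?
"""

# QUALIFY ROW_NUMBER() ... = 1 expressed as a filtered subquery.
FINAL_SQL = """
SELECT col1 AS col11, col2
FROM (
    SELECT col1, col2,
           ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col3 DESC) AS rn
    FROM test1
) ranked
WHERE rn = 1
ORDER BY col11
"""


def run_recursive_emulation(test2_rows):
    """Emulate the WITH RECURSIVE query with a seed step plus a while loop."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test2 (col11 TEXT, col2 TEXT, col3 INTEGER)")
    con.executemany("INSERT INTO test2 VALUES (?, ?, ?)", test2_rows)

    # Seed: the non-recursive branch of the CTE (col3 = 1).
    con.execute(
        "CREATE TABLE test1 AS "
        "SELECT col11 AS col1, col2, col3 FROM test2 WHERE col3 = 1"
    )

    level = 1
    while True:
        new_rows = con.execute(STEP_SQL, (level,)).fetchall()
        if not new_rows:
            break  # no new rows: the recursion has reached a fixed point
        con.executemany("INSERT INTO test1 VALUES (?, ?, ?)", new_rows)
        level += 1

    result = con.execute(FINAL_SQL).fetchall()
    con.close()
    return result


# Hypothetical sample data: (col11, col2, col3)
sample_rows = [
    ("a", "x", 1), ("a", "y", 2), ("a", "z", 3),
    ("b", "p", 1), ("b", "q", 2),
]

if __name__ == "__main__":
    print(run_recursive_emulation(sample_rows))
```

With the sample rows this returns one row per key with the col2 values concatenated from the highest col3 down, i.e. ('a', 'z,y,x') and ('b', 'q,p'). Each pass is a full join, so on Spark the number of passes (the depth of the recursion) is what drives the cost.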

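A further note: I have personally come across requirements to develop KPIs using this while-loop approach. I would suggest that recursive SQL, and while loops for KPI generation, should not be treated as a use case for Spark. That work is better done in a fully ANSI-compliant database, with the result loaded into Hadoop if required.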

How to convert a Teradata recursive query to Spark SQL
