Question

I am migrating my SHARK queries to SPARK.

Below is a sample SHARK query of mine that uses functions in the GROUP BY clause.

select month(dt_cr) as Month,
   day(dt_cr)   as date_of_created,
   count(distinct phone_number) as total_customers        
from customer
group by month(dt_cr),day(dt_cr);

This query does not work in Spark SQL. It fails with the following error:

Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression not in GROUP BY

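As a workaround I am using the Spark query below. It works, but it requires code changes, which have a big impact on my existing project. Does anyone have a better solution that needs minimal changes?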

SELECT Month, date_of_created, count(distinct phone_number) as total_customers
FROM
(select month(dt_cr) as Month,
    day(dt_cr)   as date_of_created,
    phone_number
from customer) A
group by Month, date_of_created
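To check that the derived-table rewrite returns the same rows as the original GROUP BY-on-functions query, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in engine. It is not Spark and does not reproduce the Spark error. SQLite has no month()/day(), so strftime('%m', ...) and strftime('%d', ...) cast to integers are used instead. The sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows: (phone_number, dt_cr as an ISO date string)
ROWS = [
    ("555-0001", "2014-11-03"),
    ("555-0002", "2014-11-03"),
    ("555-0001", "2014-11-03"),  # same customer again on the same day
    ("555-0003", "2014-12-01"),
]

# Original form: the function expressions appear directly in GROUP BY
DIRECT = """
select cast(strftime('%m', dt_cr) as integer) as Month,
       cast(strftime('%d', dt_cr) as integer) as date_of_created,
       count(distinct phone_number) as total_customers
from customer
group by cast(strftime('%m', dt_cr) as integer),
         cast(strftime('%d', dt_cr) as integer)
"""

# Workaround form: compute the expressions in a derived table, then group by the aliases
DERIVED = """
select Month, date_of_created, count(distinct phone_number) as total_customers
from (select cast(strftime('%m', dt_cr) as integer) as Month,
             cast(strftime('%d', dt_cr) as integer) as date_of_created,
             phone_number
      from customer) A
group by Month, date_of_created
"""


def run(query):
    """Load ROWS into a fresh in-memory table and return the sorted query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table customer (phone_number text, dt_cr text)")
    conn.executemany("insert into customer values (?, ?)", ROWS)
    result = sorted(conn.execute(query).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    direct, derived = run(DIRECT), run(DERIVED)
    print(direct)  # [(11, 3, 2), (12, 1, 1)]
    assert direct == derived
```

SQLite accepts both forms, so this only shows that the rewrite preserves the results (distinct customers per month and day). It says nothing about which form a given Spark version accepts.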

Answer 1

This is a known issue in Spark SQL: https://issues.apache.org/jira/browse/SPARK-4296

I expect it to be fixed in the next release. For now, you have to change your code to work around it.
