Question

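I am trying to collapse a table into one row per ID, but I am having trouble using GROUP BY together with a CASE statement that includes the DATEDIFF function: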

SELECT
o.id1
,o.id2
,count(case when o.type = 'TEST' and DATEDIFF(o.dte, m.dte) < 30 then id3 end) as win_30

FROM table1 m 
LEFT JOIN table2 o
ON (m.id = o.id2)
WHERE o.load_dt BETWEEN '20181001' AND '20181010'
GROUP BY 1,2;

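When I run this code I keep getting an 'Expression not in GROUP BY' error, and the problem seems to be the DATEDIFF (when I take out 'and DATEDIFF(o.dte, m.dte) < 30', it runs just fine). Do I need to GROUP BY the DATEDIFF?

Thanks for your help!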

Answer 1

I did not get any error with a similar query.

hive> select * from test_d1;
OK
1       2       10
3       4       20
5       6       30

hive> select * from test_d2;
OK
1       5
3       10

Query:

hive> select t1.id1, t1.id2, count(case when t2.id3=1 and nvl(t1.dte,t2.dte) < 10 then 1 else 0 end) as col3 from test_d1 t1 left outer join test_d2 t2 on t1.id1=t2.id3 group by 1,2;

Output:

OK
1       2       1
3       4       1
5       6       1

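Try grouping by the column names rather than by column position. To group by position, as in the query below, you have to run set hive.groupby.orderby.position.alias = true first: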

hive> select t1.id1, t1.id2, count(case when t2.id3=1 and nvl(t1.dte,t2.dte) < 10 then 1 else 0 end) as col3 from test_d1 t1 left outer join test_d2 t2 on t1.id1=t2.id3 group by 1,2;
OK
1       2       1
3       4       1
5       6       1
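
As a rough sketch of that suggestion applied to the query from the question, the grouping columns can be spelled out by name, so the result does not depend on the position setting. The table and column names come from the question. Qualifying id3 with o. is an assumption, because the question does not say which table id3 belongs to.

```sql
SELECT
  o.id1,
  o.id2,
  -- A CASE without ELSE returns NULL for non-matching rows, and COUNT skips NULLs,
  -- so only TEST rows within 30 days are counted.
  COUNT(CASE WHEN o.type = 'TEST' AND DATEDIFF(o.dte, m.dte) < 30 THEN o.id3 END) AS win_30
FROM table1 m
LEFT JOIN table2 o
  ON (m.id = o.id2)
WHERE o.load_dt BETWEEN '20181001' AND '20181010'
-- Group by the selected columns themselves instead of by positions 1,2.
GROUP BY o.id1, o.id2;
```

DATEDIFF stays inside the aggregate here, so it should not need to appear in the GROUP BY list.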

Another observation: why use a left outer join when the columns in the select list come from the right-hand table?
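
To illustrate: the WHERE clause in the question filters on o.load_dt, which is NULL for any table1 row without a match in table2, so those unmatched rows are dropped anyway and the left join already behaves like an inner join. A minimal sketch, assuming an inner join is what is actually intended and that id3 belongs to table2:

```sql
SELECT
  o.id1,
  o.id2,
  COUNT(CASE WHEN o.type = 'TEST' AND DATEDIFF(o.dte, m.dte) < 30 THEN o.id3 END) AS win_30
-- table2 supplies every selected column, so it is listed first here.
FROM table2 o
JOIN table1 m
  ON (m.id = o.id2)
WHERE o.load_dt BETWEEN '20181001' AND '20181010'
GROUP BY o.id1, o.id2;
```

If rows from table1 without a match must be kept, the load_dt filter would have to move into the ON clause instead, and the select list would need columns from table1.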
