I'm trying to convert the following MySQL query to Hive.
MySQL query
SELECT
departments.dept_name,
dept_emp.dept_no,
gender,
(count(*)/(select count(*) from employees)) AS Sex
FROM
employees,
dept_emp,departments
WHERE
dept_emp.dept_no = departments.dept_no
AND dept_emp.emp_no = employees.emp_no
GROUP BY
dept_emp.dept_no,
departments.dept_name,
gender
ORDER BY
dept_emp.dept_no;
Hive query
WITH
q1 as (SELECT COUNT(*) AS TOTAL_COUNT FROM employees),
q2 as (SELECT gender,COUNT(*) as gender_count FROM employees GROUP BY gender)
SELECT
departments.dept_name,
dept_emp.dept_no,
gender,
gender_count/TOTAL_COUNT As Sex
FROM
q1,
q2,
dept_emp,
departments
WHERE
dept_emp.dept_no = departments.dept_no
AND dept_emp.emp_no = dept_emp.emp_no
GROUP BY
dept_emp.dept_no,
departments.dept_name,
q2.gender
ORDER BY
dept_emp.dept_no;
But I get this error:
SemanticException [Error 10025]: Line 3:53 Expression not in GROUP BY key TOTAL_COUNT
Thanks in advance!
Answer 0 (score: 1)
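Apart from the error about a non-aggregated column missing from the GROUP BY clause, the logic of your new query also differs from the old one. Subquery q2 computes each gender's share of all employees, which does not depend on the department at all, and q1 and q2 have no join condition with the other tables (the condition dept_emp.emp_no = dept_emp.emp_no compares a column with itself, and employees is no longer in the FROM clause).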
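To make the difference concrete, here is a small sketch that runs both calculations with Python's built-in sqlite3 module. SQLite is only a stand-in for Hive so the example can run locally, and the tables and rows are hypothetical sample data shaped like the question's schema. The expression count(*) * 1.0 forces non-integer division, because SQLite's / truncates when both operands are integers.

```python
import sqlite3

# Hypothetical sample rows shaped like the question's tables (not real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_no INTEGER, gender TEXT);
CREATE TABLE dept_emp (emp_no INTEGER, dept_no TEXT);
INSERT INTO employees VALUES (1, 'M'), (2, 'F'), (3, 'M'), (4, 'F'), (5, 'M');
INSERT INTO dept_emp VALUES (1, 'd001'), (2, 'd001'), (3, 'd002'), (4, 'd002'), (5, 'd002');
""")

# Like q2 / TOTAL_COUNT in the Hive attempt: each gender's share of ALL employees.
# No department is involved, so every department would receive the same value.
company_share = dict(conn.execute("""
    SELECT gender, count(*) * 1.0 / (SELECT count(*) FROM employees)
    FROM employees
    GROUP BY gender
""").fetchall())

# Like the original MySQL query: head count per (department, gender) over ALL employees.
dept_share = conn.execute("""
    SELECT de.dept_no, e.gender, count(*) * 1.0 / (SELECT count(*) FROM employees)
    FROM employees e
    JOIN dept_emp de ON de.emp_no = e.emp_no
    GROUP BY de.dept_no, e.gender
    ORDER BY de.dept_no, e.gender
""").fetchall()

print(company_share)
print(dept_share)
```

On these rows the q2-style calculation gives 0.4 for F and 0.6 for M regardless of department, while the per-department ratios from the original MySQL logic are 0.2, 0.2, 0.2 and 0.4.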
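Hive does not support subqueries in the SELECT clause, but it does allow them in the FROM and WHERE clauses. I simply moved the inline subquery to the FROM clause. Since it returns a single row, it becomes a CROSS JOIN: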
SELECT
d.dept_name,
de.dept_no,
e.gender,
(count(*)/x.cnt) AS Sex
FROM
employees e
INNER JOIN dept_emp de ON de.emp_no = e.emp_no
INNER JOIN departments d ON de.dept_no = d.dept_no
CROSS JOIN (SELECT COUNT(*) cnt FROM employees) x
GROUP BY
de.dept_no,
d.dept_name,
e.gender,
x.cnt  -- not aggregated, so Hive needs it as a GROUP BY key (or use max(x.cnt))
ORDER BY
de.dept_no;
NB1: Always use explicit, standard JOINs instead of old-style implicit JOINs. I modified the query accordingly (and added table aliases).
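One way to check that the rewrite is equivalent is to run the original scalar-subquery form and the CROSS JOIN form side by side. The sketch below uses Python's sqlite3 module as a stand-in engine (SQLite, unlike Hive, accepts the scalar subquery in SELECT) on hypothetical sample rows. It puts x.cnt in the GROUP BY, since Hive rejects a non-aggregated column that is not a grouping key, and it also orders by gender so the two result lists compare deterministically.

```python
import sqlite3

# Hypothetical sample rows shaped like the question's tables (not real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_no INTEGER, gender TEXT);
CREATE TABLE departments (dept_no TEXT, dept_name TEXT);
CREATE TABLE dept_emp (emp_no INTEGER, dept_no TEXT);
INSERT INTO employees VALUES (1, 'M'), (2, 'F'), (3, 'M'), (4, 'F'), (5, 'M');
INSERT INTO departments VALUES ('d001', 'Marketing'), ('d002', 'Finance');
INSERT INTO dept_emp VALUES (1, 'd001'), (2, 'd001'), (3, 'd002'), (4, 'd002'), (5, 'd002');
""")

# MySQL form: scalar subquery in SELECT (the construct Hive rejects).
scalar_subquery_sql = """
SELECT d.dept_name, de.dept_no, e.gender,
       count(*) * 1.0 / (SELECT count(*) FROM employees) AS Sex
FROM employees e
INNER JOIN dept_emp de ON de.emp_no = e.emp_no
INNER JOIN departments d ON de.dept_no = d.dept_no
GROUP BY de.dept_no, d.dept_name, e.gender
ORDER BY de.dept_no, e.gender
"""

# Hive-friendly form: the one-row subquery moved into FROM as a CROSS JOIN.
# x.cnt is listed as a grouping key because it is not aggregated.
cross_join_sql = """
SELECT d.dept_name, de.dept_no, e.gender,
       count(*) * 1.0 / x.cnt AS Sex
FROM employees e
INNER JOIN dept_emp de ON de.emp_no = e.emp_no
INNER JOIN departments d ON de.dept_no = d.dept_no
CROSS JOIN (SELECT count(*) AS cnt FROM employees) x
GROUP BY de.dept_no, d.dept_name, e.gender, x.cnt
ORDER BY de.dept_no, e.gender
"""

expected = conn.execute(scalar_subquery_sql).fetchall()
rewritten = conn.execute(cross_join_sql).fetchall()
assert expected == rewritten  # both forms yield identical rows
for row in rewritten:
    print(row)
```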
Answer 1 (score: 0)
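Actually, the exception in your query is easy to fix: either wrap the column that is not in the GROUP BY in a MAX() aggregate, or add it to the GROUP BY.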
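Both fixes give the same numbers because the cross-joined total is identical on every row, so max() over any group just returns that constant. The sketch below checks this with Python's sqlite3 module on hypothetical sample rows. Note that SQLite itself tolerates a bare non-grouped column, so it cannot reproduce Hive's Error 10025; it only shows that the two fixes agree.

```python
import sqlite3

# Hypothetical sample rows shaped like the question's tables (not real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_no INTEGER, gender TEXT);
CREATE TABLE dept_emp (emp_no INTEGER, dept_no TEXT);
INSERT INTO employees VALUES (1, 'M'), (2, 'F'), (3, 'M'), (4, 'F'), (5, 'M');
INSERT INTO dept_emp VALUES (1, 'd001'), (2, 'd001'), (3, 'd002'), (4, 'd002'), (5, 'd002');
""")

# Shared query skeleton; only the total expression and the extra grouping key vary.
QUERY = """
SELECT de.dept_no, e.gender, count(*) * 1.0 / {total} AS Sex
FROM employees e
INNER JOIN dept_emp de ON de.emp_no = e.emp_no
CROSS JOIN (SELECT count(*) AS cnt FROM employees) x
GROUP BY de.dept_no, e.gender{extra_key}
ORDER BY de.dept_no, e.gender
"""

# Fix 1: aggregate the constant column with max().
with_max = conn.execute(QUERY.format(total="max(x.cnt)", extra_key="")).fetchall()
# Fix 2: make the column a GROUP BY key instead.
with_key = conn.execute(QUERY.format(total="x.cnt", extra_key=", x.cnt")).fetchall()

assert with_max == with_key  # same result either way
print(with_max)
```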
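I fully agree with @GMB about explicit joins, and would add that you can use an analytic count() to eliminate both the cross join and the extra scan of the employees table: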
SELECT
d.dept_name,
de.dept_no,
e.gender,
count(*)/max(e.total_cnt) as Sex
FROM
(select emp_no, gender,
count(*) over() as total_cnt
from employees e ) e
INNER JOIN dept_emp de ON de.emp_no = e.emp_no
INNER JOIN departments d ON de.dept_no = d.dept_no
GROUP BY
de.dept_no,
d.dept_name,
e.gender
ORDER BY
de.dept_no;
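The analytic version can be checked the same way. This sketch assumes a SQLite build with window-function support (3.25 or later, which recent Python releases typically bundle) and hypothetical sample rows, and compares the count(*) over() query with the CROSS JOIN query. Because the window count is evaluated inside the subquery, before the join, every row carries the full employee total, and max() simply passes that constant through the GROUP BY.

```python
import sqlite3

# Window functions (OVER) need SQLite 3.25+.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("this sketch needs SQLite 3.25 or later for window functions")

# Hypothetical sample rows shaped like the question's tables (not real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_no INTEGER, gender TEXT);
CREATE TABLE departments (dept_no TEXT, dept_name TEXT);
CREATE TABLE dept_emp (emp_no INTEGER, dept_no TEXT);
INSERT INTO employees VALUES (1, 'M'), (2, 'F'), (3, 'M'), (4, 'F'), (5, 'M');
INSERT INTO departments VALUES ('d001', 'Marketing'), ('d002', 'Finance');
INSERT INTO dept_emp VALUES (1, 'd001'), (2, 'd001'), (3, 'd002'), (4, 'd002'), (5, 'd002');
""")

# Analytic version: total_cnt is computed over employees once, before the join.
window_sql = """
SELECT d.dept_name, de.dept_no, e.gender,
       count(*) * 1.0 / max(e.total_cnt) AS Sex
FROM (SELECT emp_no, gender, count(*) OVER () AS total_cnt
      FROM employees) e
INNER JOIN dept_emp de ON de.emp_no = e.emp_no
INNER JOIN departments d ON de.dept_no = d.dept_no
GROUP BY de.dept_no, d.dept_name, e.gender
ORDER BY de.dept_no, e.gender
"""

# Cross-join version for comparison: the total comes from a separate scan.
cross_join_sql = """
SELECT d.dept_name, de.dept_no, e.gender,
       count(*) * 1.0 / x.cnt AS Sex
FROM employees e
INNER JOIN dept_emp de ON de.emp_no = e.emp_no
INNER JOIN departments d ON de.dept_no = d.dept_no
CROSS JOIN (SELECT count(*) AS cnt FROM employees) x
GROUP BY de.dept_no, d.dept_name, e.gender, x.cnt
ORDER BY de.dept_no, e.gender
"""

window_rows = conn.execute(window_sql).fetchall()
cross_rows = conn.execute(cross_join_sql).fetchall()
assert window_rows == cross_rows  # same answer, one fewer scan of employees
print(window_rows)
```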