Converting a MySQL query to Hive

Time: 2019-03-16 21:10:55

Tags: mysql sql hive

I am trying to convert the following MySQL query to Hive.

MySQL query

SELECT
    departments.dept_name,
    dept_emp.dept_no,
    gender,
    (count(*)/(select count(*) from employees)) AS Sex
FROM 
    employees,
    dept_emp,departments
WHERE 
    dept_emp.dept_no = departments.dept_no
    AND dept_emp.emp_no =  employees.emp_no
GROUP BY 
    dept_emp.dept_no, 
    departments.dept_name,
    gender
ORDER BY 
    dept_emp.dept_no;

Hive query

WITH 
    q1 as (SELECT COUNT(*) AS TOTAL_COUNT FROM employees),
    q2 as (SELECT gender,COUNT(*) as gender_count FROM employees GROUP BY gender)
SELECT 
    departments.dept_name,
    dept_emp.dept_no,
    gender,
    gender_count/TOTAL_COUNT As Sex 
FROM 
    q1,
    q2,
    dept_emp,
    departments
WHERE 
    dept_emp.dept_no = departments.dept_no
    AND dept_emp.emp_no = dept_emp.emp_no
GROUP BY 
    dept_emp.dept_no, 
    departments.dept_name,
    q2.gender
ORDER BY 
    dept_emp.dept_no;

But I am getting an error:

SemanticException [Error 10025]: Line 3:53 Expression not in GROUP BY key TOTAL_COUNT

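Thanks in advance!

2 answers:

Answer 0: (score: 1)

Apart from the error about the non-aggregated column missing from the GROUP BY clause, the logic of the new query does not seem to match that of the old one (for example, subquery q2 computes something new), and it has no join condition with the other tables.

Hive does not support subqueries in the SELECT clause, but it does allow them in FROM and WHERE clauses. I would simply move the inline subquery to the FROM clause. Since it returns only one record, this becomes a CROSS JOIN: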

SELECT
    d.dept_name,
    de.dept_no,
    e.gender,
    (count(*)/x.cnt) AS Sex
FROM 
    employees e
    INNER JOIN dept_emp de ON de.emp_no =  e.emp_no
    INNER JOIN departments d ON de.dept_no = d.dept_no
    CROSS JOIN (SELECT COUNT(*) cnt FROM employees) x
GROUP BY 
    de.dept_no, 
    d.dept_name,
    e.gender
ORDER BY 
    de.dept_no;

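NB: always use explicit, standard JOINs rather than old-style implicit joins. I modified the query accordingly (and added table aliases).

Answer 1: (score: 0)

Actually, the exception in your query can easily be fixed either by applying a MAX() aggregate to the column that is not in the GROUP BY, or by adding that column to the GROUP BY. I completely agree with @GMB about explicit joins, and I would add that you can eliminate both the cross join and the extra scan of the employees table by using an analytic count():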

SELECT
    d.dept_name,
    de.dept_no,
    e.gender,
    count(*)/max(e.total_cnt)  as Sex
FROM 
    (select emp_no, gender, 
            count(*) over() as total_cnt
       from employees e ) e
    INNER JOIN dept_emp de ON de.emp_no =  e.emp_no
    INNER JOIN departments d ON de.dept_no = d.dept_no
GROUP BY 
    de.dept_no, 
    d.dept_name,
    e.gender
ORDER BY 
    de.dept_no;
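
For comparison, the MAX() fix mentioned at the start of this answer can be sketched against the cross join form. This is a minimal, untested sketch that assumes the same employees, dept_emp and departments tables as the queries above, and assumes Hive applies the same GROUP BY check to x.cnt that it applied to TOTAL_COUNT in the question:

```sql
-- Sketch only, not run against a real cluster.
-- x.cnt holds the same value on every joined row, so MAX() returns it
-- unchanged while making the expression a valid aggregate.
-- Alternative: keep count(*)/x.cnt and add x.cnt to the GROUP BY list.
SELECT
    d.dept_name,
    de.dept_no,
    e.gender,
    count(*) / max(x.cnt) AS Sex
FROM
    employees e
    INNER JOIN dept_emp de ON de.emp_no = e.emp_no
    INNER JOIN departments d ON de.dept_no = d.dept_no
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM employees) x
GROUP BY
    de.dept_no,
    d.dept_name,
    e.gender
ORDER BY
    de.dept_no;
```

This variant still scans employees twice (once for the join and once for the one-row count), which is the cost the analytic count() version avoids.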