Hive: workaround for multiple subqueries in a WHERE predicate

Time: 2018-07-04 05:28:56

Tags: sql hadoop select hive subquery

I have three tables, and I want to query table3 using conditions based on table1 and table2. Here is a simplified version of the data and the query:

CREATE TABLE table1 (
  id int
);

INSERT INTO table1 VALUES(1);
INSERT INTO table1 VALUES(2);
INSERT INTO table1 VALUES(3);

+------------+--+
| table1.id  |
+------------+--+
| 1          |
| 2          |
| 3          |
+------------+--+

CREATE TABLE table2 (
  code varchar(10)
);

INSERT INTO table2 VALUES('a');
INSERT INTO table2 VALUES('b');
INSERT INTO table2 VALUES('c');

+--------------+--+
| table2.code  |
+--------------+--+
| a            |
| b            |
| c            |
+--------------+--+

CREATE TABLE table3 (
  id int,
  code varchar(10)
);

INSERT INTO table3 VALUES(1,'d');
INSERT INTO table3 VALUES(1,'a');
INSERT INTO table3 VALUES(2,'b');
INSERT INTO table3 VALUES(2,'e');
INSERT INTO table3 VALUES(4,'a');
INSERT INTO table3 VALUES(4,'d');

+------------+--------------+--+
| table3.id  | table3.code  |
+------------+--------------+--+
| 1          | d            |
| 1          | a            |
| 2          | b            |
| 2          | e            |
| 4          | a            |
| 4          | d            |
+------------+--------------+--+

Basically, I'd like to get records from table3 only when the id exists in table1 and the code does not exist in table2. So the result should be just

1,d
2,e

The following query does not work:

SELECT * FROM table3 WHERE (table3.id IN (SELECT table1.id FROM
table1)) AND NOT (table3.code IN (SELECT table2.code FROM table2));

I get this error:

  

Error: Error while compiling statement: FAILED: SemanticException [Error 10249]: Line 1:94 Unsupported SubQuery Expression 'code': Only 1 SubQuery expression is supported. (state=42000,code=10249)

Each condition works on its own:

SELECT * FROM table3 WHERE (table3.id IN (SELECT table1.id FROM table1));

+------------+--------------+--+
| table3.id  | table3.code  |
+------------+--------------+--+
| 1          | d            |
| 1          | a            |
| 2          | b            |
| 2          | e            |
+------------+--------------+--+

SELECT * FROM table3 WHERE NOT (table3.code IN (SELECT table2.code FROM table2));

+------------+--------------+--+
| table3.id  | table3.code  |
+------------+--------------+--+
| 1          | d            |
| 2          | e            |
| 4          | d            |
+------------+--------------+--+
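
As a sanity check on the intended logic, here is a minimal sketch that loads the sample data into an in-memory SQLite database (an assumption made purely for illustration: SQLite is not Hive and does not have the one-subquery restriction) and confirms that the combined predicate is just the intersection of the two single-condition results shown above.

```python
import sqlite3

# Sample data from the question; SQLite stands in for Hive only to check the logic.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id int);
    CREATE TABLE table2 (code varchar(10));
    CREATE TABLE table3 (id int, code varchar(10));
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES ('a'), ('b'), ('c');
    INSERT INTO table3 VALUES (1,'d'), (1,'a'), (2,'b'), (2,'e'), (4,'a'), (4,'d');
""")

# Each condition on its own, exactly as in the question.
id_ok = set(conn.execute(
    "SELECT * FROM table3 WHERE (table3.id IN (SELECT table1.id FROM table1))"))
code_ok = set(conn.execute(
    "SELECT * FROM table3 WHERE NOT (table3.code IN (SELECT table2.code FROM table2))"))

# Rows that satisfy both conditions.
both = sorted(id_ok & code_ok)

# The combined query that Hive rejects, run here in SQLite.
combined = conn.execute("""
    SELECT * FROM table3
    WHERE (table3.id IN (SELECT table1.id FROM table1))
      AND NOT (table3.code IN (SELECT table2.code FROM table2))
    ORDER BY table3.id, table3.code
""").fetchall()

print(both)
print(combined)
```

Both lists contain the rows 1,d and 2,e, which is the target any WHERE-only rewrite has to reproduce.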

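Important: I cannot do a JOIN or modify anything in the FROM clause, because this is part of a reporting system, so the only thing I can do is adjust the WHERE clause.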

3 Answers:

Answer 0: (score: 2)

You can rewrite it using JOINs:

SELECT DISTINCT t3.*
FROM table3 t3
JOIN table1 t1
  ON t3.id = t1.id
LEFT JOIN table2 t2
  ON t2.code = t3.code
WHERE t2.code IS NULL;

DBFiddle Demo


  

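Regarding "the only thing I can do is adjust the WHERE clause": the same joins can be moved into a single correlated EXISTS subquery: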

SELECT *
FROM Table3 t
WHERE EXISTS (SELECT 1
              FROM table3 t3
              JOIN table1 t1
                ON t3.id = t1.id
              LEFT JOIN table2 t2
                ON t2.code = t3.code
             WHERE t2.code IS NULL
               AND t3.id = t.id
               AND t3.code = t.code)

DBFiddle Demo2
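
To check the EXISTS rewrite against the question's sample data, the following sketch runs it in an in-memory SQLite database. SQLite is an assumption used only to verify the logic; whether Hive accepts the same correlated subquery still has to be tested on the target cluster.

```python
import sqlite3

# Sample data from the question; SQLite stands in for Hive only to check the logic.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id int);
    CREATE TABLE table2 (code varchar(10));
    CREATE TABLE table3 (id int, code varchar(10));
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES ('a'), ('b'), ('c');
    INSERT INTO table3 VALUES (1,'d'), (1,'a'), (2,'b'), (2,'e'), (4,'a'), (4,'d');
""")

# Single correlated EXISTS: the inner query repeats the JOIN / anti-join logic
# and is tied back to the outer row through t3.id = t.id AND t3.code = t.code.
exists_rewrite = """
    SELECT *
    FROM table3 t
    WHERE EXISTS (SELECT 1
                  FROM table3 t3
                  JOIN table1 t1
                    ON t3.id = t1.id
                  LEFT JOIN table2 t2
                    ON t2.code = t3.code
                  WHERE t2.code IS NULL
                    AND t3.id = t.id
                    AND t3.code = t.code)
    ORDER BY t.id, t.code
"""
rows = conn.execute(exists_rewrite).fetchall()
print(rows)
```

On this data the query returns the two expected rows, 1,d and 2,e, using only one subquery expression in the WHERE clause.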

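Answer 1: (score: 1)

One dirty trick you can use is to cross join table1 and table2, since you don't care about the relationship between them at all, and then use an exists condition: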

SELECT *
FROM   table3 
WHERE  NOT EXISTS (SELECT     *
                   FROM       table1
                   CROSS JOIN table2
                   WHERE      table3.id = table1.id OR table3.code = table2.code)

Edit:
Although the query above should work, it may not perform well. A faster variant is to use UNION ALL in the subquery:

SELECT *
FROM   table3 
WHERE  NOT EXISTS (SELECT     *
                   FROM       table1
                   WHERE      table3.id = table1.id 
                   UNION ALL
                   SELECT     *
                   FROM       table2
                   WHERE      table3.code = table2.code)
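
Before relying on either form, it is worth running both against the sample data. The sketch below does that in an in-memory SQLite database (an assumption for illustration only, not Hive), and it assumes the two conditions in the CROSS JOIN version are combined with OR, which is the reading that makes it equivalent to the UNION ALL version.

```python
import sqlite3

# Sample data from the question; SQLite stands in for Hive only to check the logic.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id int);
    CREATE TABLE table2 (code varchar(10));
    CREATE TABLE table3 (id int, code varchar(10));
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES ('a'), ('b'), ('c');
    INSERT INTO table3 VALUES (1,'d'), (1,'a'), (2,'b'), (2,'e'), (4,'a'), (4,'d');
""")

# CROSS JOIN variant, with the two conditions joined by OR (assumption).
cross_join_sql = """
    SELECT *
    FROM   table3
    WHERE  NOT EXISTS (SELECT *
                       FROM table1
                       CROSS JOIN table2
                       WHERE table3.id = table1.id OR table3.code = table2.code)
    ORDER BY table3.id, table3.code
"""

# UNION ALL variant.
union_all_sql = """
    SELECT *
    FROM   table3
    WHERE  NOT EXISTS (SELECT *
                       FROM   table1
                       WHERE  table3.id = table1.id
                       UNION ALL
                       SELECT *
                       FROM   table2
                       WHERE  table3.code = table2.code)
    ORDER BY table3.id, table3.code
"""

rows_cross = conn.execute(cross_join_sql).fetchall()
rows_union = conn.execute(union_all_sql).fetchall()
print(rows_cross)
print(rows_union)
```

In SQLite both forms agree with each other, but they return only the row 4,d: NOT EXISTS over the combined set keeps rows whose id is absent from table1 and whose code is absent from table2. That is not the 1,d and 2,e the question asks for, so this approach needs reworking before it can be used for the stated requirement.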

Answer 2: (score: 0)

NOT IN is a simple way to write the query:

SELECT t3.*
FROM table3 t3
WHERE t3.id IN (SELECT table1.id FROM table1) AND
      t3.code NOT IN (SELECT table2.code FROM table2);

If you are limited to a single subquery, this gets tricky. Here is one possibility, although I'm not sure Hive will accept it:

where exists (select 1
              from table1 t1
              where t1.id = t3.id and
                    not exists (select 1
                                from table2 t2
                                where t2.code = t3.code
                               )
             )

You can also do this without the double nesting:

where exists (select 1
              from table1 t1 left join
                   table2 t2
                   on t2.code = t3.code
              where t1.id = t3.id and
                    t2.code is null
             )
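
The two WHERE fragments can be compared on the sample data with a short sketch like the one below, run in an in-memory SQLite database (an assumption for illustration; acceptance by Hive is not verified here). The single-level form is written with a `t2.code is null` filter in the inner WHERE, since that filter is what turns the LEFT JOIN into an anti-join; a version without it is included for contrast.

```python
import sqlite3

# Sample data from the question; SQLite stands in for Hive only to check the logic.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id int);
    CREATE TABLE table2 (code varchar(10));
    CREATE TABLE table3 (id int, code varchar(10));
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES ('a'), ('b'), ('c');
    INSERT INTO table3 VALUES (1,'d'), (1,'a'), (2,'b'), (2,'e'), (4,'a'), (4,'d');
""")

def run(where_clause):
    """Run SELECT t3.* FROM table3 t3 with the given WHERE fragment."""
    sql = "SELECT t3.* FROM table3 t3 " + where_clause + " ORDER BY t3.id, t3.code"
    return conn.execute(sql).fetchall()

# Doubly nested form: EXISTS in table1, with NOT EXISTS in table2 inside it.
nested = run("""
    where exists (select 1
                  from table1 t1
                  where t1.id = t3.id and
                        not exists (select 1
                                    from table2 t2
                                    where t2.code = t3.code))
""")

# Single-level form with the IS NULL anti-join filter.
flat = run("""
    where exists (select 1
                  from table1 t1 left join
                       table2 t2
                       on t2.code = t3.code
                  where t1.id = t3.id and
                        t2.code is null)
""")

# Same form without the filter: it only tests that the id exists in table1.
flat_no_filter = run("""
    where exists (select 1
                  from table1 t1 left join
                       table2 t2
                       on t2.code = t3.code
                  where t1.id = t3.id)
""")

print(nested)
print(flat)
print(flat_no_filter)
```

With the filter in place, both forms return 1,d and 2,e on this data. Without it, the LEFT JOIN always produces a row for a matching id, so the rows 1,a and 2,b come back as well.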