Impala SQL query on a single table to find the destination hostnames common to 3 hostnames

Posted: 2019-02-08 15:20:01

Tags: sql impala

I have only one table, and I'm trying to use Impala SQL to get the destinationhostname values that all users have in common.

Proxy table (proxy_table):

sourcehostname destinationhostname
comp1          google.com
comp2          google.com
comp1          yahoo.com
comp1          facebook.com
comp2          facebook.com
comp3          facebook.com

When I run the following query to get the distinct destination hostnames that 2 source hostnames share within one table, it works:

SELECT DISTINCT t1.destinationhostname
FROM proxy_table t1 JOIN proxy_table t2
  ON t1.destinationhostname = t2.destinationhostname AND t1.sourcehostname  ="comp1" AND t2.sourcehostname="comp2";

It returns:

google.com
facebook.com
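To reproduce this result locally, here is a minimal sketch that loads the sample rows into an in-memory SQLite database (SQLite is an assumption standing in for Impala) and runs the same two-host self-join. String literals are single-quoted because SQLite treats double quotes as identifiers; the helper names make_db and common_to_comp1_and_comp2 are hypothetical.

```python
import sqlite3

# Sample rows copied from proxy_table in the question.
# Assumption: an in-memory SQLite database stands in for Impala.
ROWS = [
    ("comp1", "google.com"),
    ("comp2", "google.com"),
    ("comp1", "yahoo.com"),
    ("comp1", "facebook.com"),
    ("comp2", "facebook.com"),
    ("comp3", "facebook.com"),
]


def make_db(rows=ROWS):
    """Create an in-memory proxy_table loaded with the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE proxy_table (sourcehostname TEXT, destinationhostname TEXT)"
    )
    conn.executemany("INSERT INTO proxy_table VALUES (?, ?)", rows)
    return conn


# The question's two-host self-join, with single-quoted string literals
# and an ORDER BY so the output order is deterministic.
TWO_HOST_SQL = """
SELECT DISTINCT t1.destinationhostname
FROM proxy_table t1 JOIN proxy_table t2
  ON t1.destinationhostname = t2.destinationhostname
 AND t1.sourcehostname = 'comp1'
 AND t2.sourcehostname = 'comp2'
ORDER BY 1
"""


def common_to_comp1_and_comp2(conn):
    """Return destinations visited by both comp1 and comp2."""
    return [row[0] for row in conn.execute(TWO_HOST_SQL)]


print(common_to_comp1_and_comp2(make_db()))  # ['facebook.com', 'google.com']
```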

Now I'm trying to return the value that comp1, comp2 and comp3 all have in common (facebook.com), but I can't get this query right:

SELECT DISTINCT t1.destinationhostname
FROM proxy_table t1 JOIN proxy_table t2 JOIN proxy_table t3
  ON t1.destinationhostname = t2.destinationhostname AND t1.sourcehostname  ="comp1" AND t2.sourcehostname="comp2" t3.sourcehostname = "comp3";

In the query I want to specify 3 particular computers: there are thousands of them, but I only want to select specific ones.

2 answers:

Answer 0 (score: 1)

Use aggregation. Assuming there are no duplicate rows:

select destinationhostname
from proxy_table 
group by destinationhostname
having count(*) = (select count(distinct sourcehostname) from proxy_table);

If there can be duplicate rows, just change the having clause:

having count(distinct sourcehostname) = (select count(distinct sourcehostname) from proxy_table);
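A small sketch of why the distinct count matters when duplicates exist. It runs both HAVING variants against the sample rows plus one hypothetical duplicated row, using an in-memory SQLite database as a stand-in for Impala (an assumption; the helper names are also hypothetical). With COUNT(*), the duplicate pushes google.com's count up to the total number of hosts and it is wrongly reported as common to all of them.

```python
import sqlite3

# Sample rows from the question plus one duplicated row
# (hypothetical, added only to show the effect of duplicates).
ROWS = [
    ("comp1", "google.com"),
    ("comp2", "google.com"),
    ("comp1", "yahoo.com"),
    ("comp1", "facebook.com"),
    ("comp2", "facebook.com"),
    ("comp3", "facebook.com"),
    ("comp1", "google.com"),  # duplicate of the first row
]


def make_db(rows):
    """Create an in-memory proxy_table (SQLite standing in for Impala)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE proxy_table (sourcehostname TEXT, destinationhostname TEXT)"
    )
    conn.executemany("INSERT INTO proxy_table VALUES (?, ?)", rows)
    return conn


def common_to_all(conn, distinct):
    """Destinations whose per-group count equals the number of distinct hosts."""
    counter = "COUNT(DISTINCT sourcehostname)" if distinct else "COUNT(*)"
    sql = f"""
        SELECT destinationhostname
        FROM proxy_table
        GROUP BY destinationhostname
        HAVING {counter} =
               (SELECT COUNT(DISTINCT sourcehostname) FROM proxy_table)
        ORDER BY 1
    """
    return [row[0] for row in conn.execute(sql)]


conn = make_db(ROWS)
print(common_to_all(conn, distinct=False))  # ['facebook.com', 'google.com']
print(common_to_all(conn, distinct=True))   # ['facebook.com']
```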

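If you only care about three specific users, restrict the rows to those users with a WHERE clause and compare the count against = 3.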
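For the asker's case (a few chosen hosts out of thousands), a sketch of the filtered aggregation: keep only the chosen hosts with WHERE ... IN, then require every one of them to appear for a destination. It assumes an in-memory SQLite database in place of Impala; common_to_hosts is a hypothetical helper that binds the host names as parameters and derives the target count from the number of distinct hosts passed in, so the same query works for 2, 3 or more hosts.

```python
import sqlite3

# Sample rows copied from proxy_table in the question.
ROWS = [
    ("comp1", "google.com"),
    ("comp2", "google.com"),
    ("comp1", "yahoo.com"),
    ("comp1", "facebook.com"),
    ("comp2", "facebook.com"),
    ("comp3", "facebook.com"),
]


def make_db(rows=ROWS):
    """Create an in-memory proxy_table (SQLite standing in for Impala)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE proxy_table (sourcehostname TEXT, destinationhostname TEXT)"
    )
    conn.executemany("INSERT INTO proxy_table VALUES (?, ?)", rows)
    return conn


def common_to_hosts(conn, hosts):
    """Destinations visited by every host in `hosts`."""
    wanted = sorted(set(hosts))  # de-duplicate so the target count is right
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT destinationhostname
        FROM proxy_table
        WHERE sourcehostname IN ({placeholders})
        GROUP BY destinationhostname
        HAVING COUNT(DISTINCT sourcehostname) = ?
        ORDER BY 1
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


conn = make_db()
print(common_to_hosts(conn, ["comp1", "comp2", "comp3"]))  # ['facebook.com']
print(common_to_hosts(conn, ["comp1", "comp2"]))  # ['facebook.com', 'google.com']
```

Compared with one self-join per host, this shape keeps a single scan of the table no matter how many hosts are listed.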

Answer 1 (score: 1)

Could you try the following?

SELECT DISTINCT t1.destinationhostname
FROM proxy_table t1 JOIN proxy_table t2
ON t1.destinationhostname = t2.destinationhostname 
JOIN proxy_table t3
ON t1.destinationhostname = t3.destinationhostname 
and t2.destinationhostname = t3.destinationhostname 
WHERE
t1.sourcehostname  ="comp1" 
AND t2.sourcehostname="comp2"
AND t3.sourcehostname = "comp3";
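A sketch that checks the three-way join against the sample rows, again assuming an in-memory SQLite database as a stand-in for Impala and single-quoted literals. The extra condition t2.destinationhostname = t3.destinationhostname is left out here because it follows from the two ON conditions by transitivity; the result is the same either way.

```python
import sqlite3

# Sample rows copied from proxy_table in the question.
ROWS = [
    ("comp1", "google.com"),
    ("comp2", "google.com"),
    ("comp1", "yahoo.com"),
    ("comp1", "facebook.com"),
    ("comp2", "facebook.com"),
    ("comp3", "facebook.com"),
]


def make_db(rows=ROWS):
    """Create an in-memory proxy_table (SQLite standing in for Impala)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE proxy_table (sourcehostname TEXT, destinationhostname TEXT)"
    )
    conn.executemany("INSERT INTO proxy_table VALUES (?, ?)", rows)
    return conn


# One self-join per host: each alias is pinned to one source host,
# and all aliases must share the same destination.
THREE_HOST_SQL = """
SELECT DISTINCT t1.destinationhostname
FROM proxy_table t1
JOIN proxy_table t2 ON t1.destinationhostname = t2.destinationhostname
JOIN proxy_table t3 ON t1.destinationhostname = t3.destinationhostname
WHERE t1.sourcehostname = 'comp1'
  AND t2.sourcehostname = 'comp2'
  AND t3.sourcehostname = 'comp3'
"""


def common_to_three(conn):
    return [row[0] for row in conn.execute(THREE_HOST_SQL)]


print(common_to_three(make_db()))  # ['facebook.com']
```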

Let me know if you run into any problems.