I have a single table and am trying to use Impala SQL to get the destinationhostname values that all users have in common.
proxy_table:
sourcehostname destinationhostname
comp1 google.com
comp2 google.com
comp1 yahoo.com
comp1 facebook.com
comp2 facebook.com
comp3 facebook.com
When I run the following self-join to get the distinct destination hostnames shared by 2 source hostnames, it works:
SELECT DISTINCT t1.destinationhostname
FROM proxy_table t1 JOIN proxy_table t2
ON t1.destinationhostname = t2.destinationhostname AND t1.sourcehostname ="comp1" AND t2.sourcehostname="comp2";
It returns:
google.com
facebook.com
I am now trying to return the value that comp1, comp2, and comp3 all have in common, which is facebook.com, but I can't get this query right:
SELECT DISTINCT t1.destinationhostname
FROM proxy_table t1 JOIN proxy_table t2 JOIN proxy_table t3
ON t1.destinationhostname = t2.destinationhostname AND t1.sourcehostname ="comp1" AND t2.sourcehostname="comp2" t3.sourcehostname = "comp3";
I want to name the 3 computers explicitly in the query: the table contains thousands of them, but I only want to select specific ones.
Answer 0 (score: 1)
Use aggregation. Assuming there are no duplicate rows:
select destinationhostname
from proxy_table
group by destinationhostname
having count(*) = (select count(distinct sourcehostname) from proxy_table);
If there can be duplicate rows, just change the having clause:
having count(distinct sourcehostname) = (select count(distinct sourcehostname) from proxy_table);
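If you only need three users, just use = 3 on the right-hand side (and restrict the rows to those three source hostnames, otherwise any three hosts would satisfy the count).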
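As a sketch of the "only three users" case: assuming the three hosts of interest are comp1, comp2, and comp3 as in the question, a WHERE filter can limit the aggregation to those hosts so that a count of 3 means "all of them". The IN list is an assumption taken from the question's example, not something the answer above spells out.

```sql
-- Destinations visited by every one of the three named hosts.
SELECT destinationhostname
FROM proxy_table
-- Assumption: these are the specific machines to compare.
WHERE sourcehostname IN ('comp1', 'comp2', 'comp3')
GROUP BY destinationhostname
-- COUNT(DISTINCT ...) keeps duplicate rows from inflating the count.
HAVING COUNT(DISTINCT sourcehostname) = 3;
```

To compare a different set of machines, change the IN list and set the number in HAVING to the length of that list. Unlike the join approach, this does not need one extra self-join per host.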
Answer 1 (score: 1)
Can you try the following?
SELECT DISTINCT t1.destinationhostname
FROM proxy_table t1 JOIN proxy_table t2
ON t1.destinationhostname = t2.destinationhostname
JOIN proxy_table t3
ON t1.destinationhostname = t3.destinationhostname
and t2.destinationhostname = t3.destinationhostname
WHERE
t1.sourcehostname ="comp1"
AND t2.sourcehostname="comp2"
AND t3.sourcehostname = "comp3";
Let me know if you run into any problems.
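To sanity-check the three-way join without touching the real table, here is a minimal sketch that inlines the six sample rows from the question as a CTE. The name sample_rows is hypothetical, and the join is the same idea as above with the redundant t2/t3 condition left out, since both are already tied to t1.

```sql
-- Sample data from the question, inlined so the query is self-contained.
WITH sample_rows AS (
  SELECT 'comp1' AS sourcehostname, 'google.com' AS destinationhostname
  UNION ALL SELECT 'comp2', 'google.com'
  UNION ALL SELECT 'comp1', 'yahoo.com'
  UNION ALL SELECT 'comp1', 'facebook.com'
  UNION ALL SELECT 'comp2', 'facebook.com'
  UNION ALL SELECT 'comp3', 'facebook.com'
)
SELECT DISTINCT t1.destinationhostname
FROM sample_rows t1
JOIN sample_rows t2
  ON t1.destinationhostname = t2.destinationhostname
JOIN sample_rows t3
  ON t1.destinationhostname = t3.destinationhostname
-- Each alias is pinned to one of the three hosts being compared.
WHERE t1.sourcehostname = 'comp1'
  AND t2.sourcehostname = 'comp2'
  AND t3.sourcehostname = 'comp3';
-- Returns a single row: facebook.com
```

Note that every JOIN here has its own ON clause. The query in the question put two JOINs before a single ON, left out the AND before the t3 condition, and never tied t3 to the others by destinationhostname.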