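I am trying to count the distinct values of column b for each c without doing a group by. I know this can be done with a join. How can I compute count(distinct b) over (partition by c) without resorting to a join? Why is count distinct not supported in window functions? Thanks in advance.

Given this DataFrame: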
// toDF needs `import spark.implicits._` (already in scope in spark-shell)
val df = Seq(
  ("a1", "b1", "c1"),
  ("a2", "b2", "c1"),
  ("a3", "b3", "c1"),
  ("a31", null, "c1"),
  ("a32", null, "c1"),
  ("a4", "b4", "c11"),
  ("a5", "b5", "c11"),
  ("a6", "b6", "c11"),
  ("a7", "b1", "c2"),
  ("a8", "b1", "c3"),
  ("a9", "b1", "c4"),
  ("a91", "b1", "c5"),
  ("a92", "b1", "c5"),
  ("a93", "b1", "c5"),
  ("a95", "b2", "c6"),
  ("a96", "b2", "c6"),
  ("a97", "b1", "c6"),
  ("a977", null, "c6"),
  ("a98", null, "c8"),
  ("a99", null, "c8"),
  ("a999", null, "c8")
).toDF("a", "b", "c")
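Answer 0 (score: 0)
Some databases do support count(distinct) as a window function.
There are two alternatives. One is the sum of two dense ranks, one ascending and one descending, minus 1. Note that dense_rank() also ranks NULL, so a partition with NULLs in b comes out one higher than count(distinct b) would report: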
select (dense_rank() over (partition by c order by b asc) +
dense_rank() over (partition by c order by b desc) -
1
) as count_distinct
from t;
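Since the question uses a Spark DataFrame, the dense-rank expression above might be written with the DataFrame API roughly as follows. This is a minimal sketch, assuming a local SparkSession; the object name, the app name and the reduced sample data (a few rows from the question, without NULLs) are illustrative choices and not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, dense_rank}

// Hypothetical wrapper object, used only to make the sketch self-contained.
object CountDistinctByDenseRank {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("count-distinct-by-dense-rank")
      .getOrCreate()
    import spark.implicits._

    // A few rows taken from the question's data (no NULLs in b).
    val df = Seq(
      ("a1", "b1", "c1"),
      ("a2", "b2", "c1"),
      ("a3", "b3", "c1"),
      ("a91", "b1", "c5"),
      ("a92", "b1", "c5")
    ).toDF("a", "b", "c")

    // Same partition, opposite sort orders on b.
    val byBAsc = Window.partitionBy("c").orderBy(col("b").asc)
    val byBDesc = Window.partitionBy("c").orderBy(col("b").desc)

    // rank from the bottom + rank from the top - 1 = number of distinct b in the partition
    val result = df.withColumn(
      "count_distinct",
      dense_rank().over(byBAsc) + dense_rank().over(byBDesc) - 1
    )

    result.show()
    spark.stop()
  }
}
```

On this sample every c1 row should get 3 and every c5 row should get 1, and all original rows are kept, which is what distinguishes this from a group by.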
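The second uses a subquery that numbers the rows within each (c, b) pair and counts the first row of each pair. This version counts NULL as one value of b; add "and b is not null" to the case condition to exclude it:
select sum(case when seqnum = 1 then 1 else 0 end) over (partition by c) as count_distinct
from (select t.*,
             -- partition by c, b: each distinct b within a c gets exactly one row with seqnum = 1
             row_number() over (partition by c, b order by b) as seqnum
      from t
     ) t;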
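Answer 1 (score: 0)
The number of distinct values of column b for each c, without a group by.
A typical SQL workaround is to use a subquery that selects the distinct tuples and then a window count in the outer query: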
SELECT c, COUNT(*) OVER(PARTITION BY c) cnt
FROM (SELECT DISTINCT b, c FROM mytable) x
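For the Spark DataFrame in the question, the same distinct-then-window-count idea could look roughly like this. It is a sketch under assumptions: the object name, app name and the shortened sample data are made up for illustration. Unlike the question's desired output, the result has one row per distinct (b, c) pair rather than one row per original row.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{count, lit}

// Hypothetical wrapper object, used only to make the sketch self-contained.
object DistinctThenWindowCount {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-then-window-count")
      .getOrCreate()
    import spark.implicits._

    // A few rows taken from the question's data (no NULLs in b).
    val df = Seq(
      ("a1", "b1", "c1"),
      ("a2", "b2", "c1"),
      ("a3", "b3", "c1"),
      ("a91", "b1", "c5"),
      ("a92", "b1", "c5")
    ).toDF("a", "b", "c")

    // Inner query: SELECT DISTINCT b, c
    val distinctPairs = df.select("b", "c").distinct()

    // Outer query: COUNT(*) OVER (PARTITION BY c)
    val result = distinctPairs.withColumn(
      "cnt",
      count(lit(1)).over(Window.partitionBy("c"))
    )

    result.show()
    spark.stop()
  }
}
```

Because count(lit(1)) counts rows, a (NULL, c) pair would be included in cnt if the data contained one; counting the column b instead would skip it.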