I'm looking for a smart way to count occurrences.
Here is an example:
UserID CityID CountryID TagID
100000 1 30 5
100001 1 30 6
100000 2 20 7
100000 2 40 8
100001 1 40 6
100002 1 40 5
100002 1 20 6
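What I want to do:
I want to count, per column and per user, how many distinct values occur. In the end I want a table showing how many users have more than one distinct value in each column.
The result should look like this, more or less: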
Different_CityID Different_CountryIDs Different_TagIDs
1 3 2
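Explanation: only user 100000 has more than one CityID, all three users have more than one CountryID, and users 100000 and 100002 have more than one TagID.
I tried combining COUNTs over the columns with GROUP BYs, but in the end it didn't work. Is there a smart solution?
Many thanks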
Answer 0: (score: 1)
select  count(case when pos=0 and count_distinct_ID>1 then 1 end) as different_cityid
       ,count(case when pos=1 and count_distinct_ID>1 then 1 end) as different_countryid
       ,count(case when pos=2 and count_distinct_ID>1 then 1 end) as different_tagid
from   (select  pe.pos
               ,count (distinct pe.ID) as count_distinct_ID
        from    mytable t
                lateral view posexplode (array(CityID,CountryID,TagID)) pe as pos,ID
        group by t.UserID
                ,pe.pos
       ) t
;
+------------------+---------------------+-----------------+
| different_cityid | different_countryid | different_tagid |
+------------------+---------------------+-----------------+
| 1 | 3 | 2 |
+------------------+---------------------+-----------------+
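The two steps in the query above (collect the distinct values per user and column position, then count the users with more than one) can be cross-checked outside Hive. The following is a minimal Python sketch, not part of the original answer; the names ROWS, COLUMNS and users_with_multiple_values are made up for illustration, and the rows are copied from the sample table in the question.

```python
from collections import defaultdict

# Sample rows from the question: (UserID, CityID, CountryID, TagID)
ROWS = [
    (100000, 1, 30, 5),
    (100001, 1, 30, 6),
    (100000, 2, 20, 7),
    (100000, 2, 40, 8),
    (100001, 1, 40, 6),
    (100002, 1, 40, 5),
    (100002, 1, 20, 6),
]
COLUMNS = ("CityID", "CountryID", "TagID")


def users_with_multiple_values(rows):
    """Count, per column, the users that have more than one distinct value."""
    # Step 1: collect distinct values per (user, column position),
    # which plays the role of posexplode + group by UserID, pos.
    seen = defaultdict(set)
    for user_id, *values in rows:
        for pos, value in enumerate(values):
            seen[(user_id, pos)].add(value)
    # Step 2: per column, count the users whose set holds more than one value.
    result = {name: 0 for name in COLUMNS}
    for (_user_id, pos), values in seen.items():
        if len(values) > 1:
            result[COLUMNS[pos]] += 1
    return result


if __name__ == "__main__":
    print(users_with_multiple_values(ROWS))
```

For the sample data this gives 1 for CityID, 3 for CountryID and 2 for TagID, matching the table above.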
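Here is a version that avoids count(distinct ...):
-- is_distinct_ID is true when min equals max, i.e. the user has a single value
-- at that position; <=> is null-safe equality. "not is_distinct_ID" therefore
-- marks users with more than one value.
select  count (case when pos=0 and not is_distinct_ID then 1 end) as different_cityid
       ,count (case when pos=1 and not is_distinct_ID then 1 end) as different_countryid
       ,count (case when pos=2 and not is_distinct_ID then 1 end) as different_tagid
from   (select  pe.pos
               ,min(pe.ID)<=>max(pe.ID) as is_distinct_ID
        from    mytable t
                lateral view posexplode (array(CityID,CountryID,TagID)) pe as pos,ID
        group by t.UserID
                ,pe.pos
       ) t
;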
... and another variant, without posexplode:
-- Each is_distinct_* flag is true when the user has a single value in that column.
select  count (case when not is_distinct_CityID    then 1 end) as different_cityid
       ,count (case when not is_distinct_CountryID then 1 end) as different_countryid
       ,count (case when not is_distinct_TagID     then 1 end) as different_tagid
from   (select  min (CityID)    <=> max (CityID)    as is_distinct_CityID
               ,min (CountryID) <=> max (CountryID) as is_distinct_CountryID
               ,min (TagID)     <=> max (TagID)     as is_distinct_TagID
        from    mytable
        group by UserID
       ) t
;
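The last two queries rely on the fact that a group holds a single value exactly when its minimum equals its maximum, so no distinct count is needed. A small Python sketch of that idea follows; it is an illustration only, the function and variable names are hypothetical, and it assumes no NULLs (Hive's <=> also treats two NULLs as equal, which this sketch does not model).

```python
def has_single_value(values):
    """Mirror of min(x) <=> max(x): true when all values in the group are equal."""
    return min(values) == max(values)


def count_users_with_multiple_values(values_by_user):
    """Count users whose group does not collapse to a single value."""
    return sum(1 for values in values_by_user.values() if not has_single_value(values))


# CityID and CountryID values per user, taken from the sample table in the question.
CITY_IDS = {100000: [1, 2, 2], 100001: [1, 1], 100002: [1, 1]}
COUNTRY_IDS = {100000: [30, 20, 40], 100001: [30, 40], 100002: [40, 20]}

if __name__ == "__main__":
    print(count_users_with_multiple_values(CITY_IDS))
    print(count_users_with_multiple_values(COUNTRY_IDS))
```

The min/max comparison needs only two running aggregates per group, whereas a distinct count has to keep track of every value seen.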
Answer 1: (score: 1)
Use the following code; I think it will help you:
SELECT COUNT(DISTINCT CityID) AS CityID,
       COUNT(DISTINCT CountryID) AS CountryID,
       COUNT(DISTINCT TagID) AS TagID
FROM test
GROUP BY UserID
The result will look like this, with one row per user:
CityID CountryID TagID
2 3 3
1 2 1
1 2 2
Regards, Vinu
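The query above returns one row per user rather than the single summary row the question asks for. One possible way to finish the job is to wrap it in an outer query that counts the users whose distinct count exceeds 1. The sketch below runs that idea through Python's sqlite3 module as a stand-in for Hive, which is an assumption made only so the example is runnable; the table name test follows the answer, while summarize, the *_cnt aliases and the sample rows (copied from the question) are added for illustration.

```python
import sqlite3

# Sample rows from the question: (UserID, CityID, CountryID, TagID)
ROWS = [
    (100000, 1, 30, 5),
    (100001, 1, 30, 6),
    (100000, 2, 20, 7),
    (100000, 2, 40, 8),
    (100001, 1, 40, 6),
    (100002, 1, 40, 5),
    (100002, 1, 20, 6),
]

# Inner query: distinct values per user. Outer query: users with more than one.
QUERY = """
SELECT SUM(CASE WHEN city_cnt    > 1 THEN 1 ELSE 0 END) AS different_cityid,
       SUM(CASE WHEN country_cnt > 1 THEN 1 ELSE 0 END) AS different_countryid,
       SUM(CASE WHEN tag_cnt     > 1 THEN 1 ELSE 0 END) AS different_tagid
FROM (SELECT COUNT(DISTINCT CityID)    AS city_cnt,
             COUNT(DISTINCT CountryID) AS country_cnt,
             COUNT(DISTINCT TagID)     AS tag_cnt
      FROM test
      GROUP BY UserID) per_user
"""


def summarize(rows):
    """Load rows into an in-memory table and return the one-row summary."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE test (UserID INTEGER, CityID INTEGER, "
            "CountryID INTEGER, TagID INTEGER)"
        )
        conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    print(summarize(ROWS))
```

The SQL inside QUERY uses only COUNT(DISTINCT ...), CASE and a subquery, so it should carry over to Hive with little or no change, though that has not been verified here.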
Answer 2: (score: 1)
select uid, cid, count(c), count(g)
from (select cid,
             uid,
             count(coid)  over (partition by cid, uid)   as c,
             count(tagid) over (partition by cid, tagid) as g
      from citydata) e
group by cid, uid;
Here uid = userid, cid = cityid, coid = countryid, and tagid is used as-is.
Total MapReduce CPU Time Spent: 0 msec
OK
uid cid coid tagid
100000 1 1 1
100001 1 2 2
100002 1 2 2
100000 2 2 2
Time taken: 3.865 seconds, Fetched: 4 row(s)
This is based on userid.
I hope this helps.