Question

I'm looking for a smart way to count occurrences.

Here is an example:

 UserID   CityID   CountryID   TagID
 100000   1        30          5
 100001   1        30          6
 100000   2        20          7
 100000   2        40          8
 100001   1        40          6
 100002   1        40          5
 100002   1        20          6

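What I want to do:

I want to count, per column and per user, how many different values occur. In the end I'd like a table that shows, for each column, how many users have more than one distinct value.

The result should look like this, more or less:

Different_CityIDs   Different_CountryIDs   Different_TagIDs
1                   3                      2

Explanation:

Different_CityIDs: only UserID 100000 has more than one CityID.
Different_CountryIDs: every user has more than one CountryID.
Different_TagIDs: UserIDs 100000 and 100002 each have more than one TagID. User 100001 only has "6" as a TagID.

I tried COUNTs over the columns with GROUP BYs, but in the end it didn't work. Is there a smart solution?

Many thanks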

Answer 1

select  count(case when pos=0 and count_distinct_ID>1 then 1 end) as different_cityid
       ,count(case when pos=1 and count_distinct_ID>1 then 1 end) as different_countryid
       ,count(case when pos=2 and count_distinct_ID>1 then 1 end) as different_tagid

        -- inner query: one row per (user, column);
        -- pos 0/1/2 = CityID/CountryID/TagID, ID = that column's value
from   (select      pe.pos
                   ,count (distinct pe.ID) as count_distinct_ID
        from        mytable t
                    lateral view posexplode (array(CityID,CountryID,TagID)) pe as pos,ID

        group by    t.UserID
                   ,pe.pos
        ) t
;

+------------------+---------------------+-----------------+
| different_cityid | different_countryid | different_tagid |
+------------------+---------------------+-----------------+
|                1 |                   3 |               2 |
+------------------+---------------------+-----------------+

Here is another variant that avoids count(distinct ...):

select  count (case when pos=0 and not is_single_ID then 1 end)  as different_cityid
       ,count (case when pos=1 and not is_single_ID then 1 end)  as different_countryid
       ,count (case when pos=2 and not is_single_ID then 1 end)  as different_tagid

from   (select      pe.pos
                    -- null-safe min = max: true when the user has a single value in this column
                   ,min(pe.ID)<=>max(pe.ID)  as is_single_ID
        from        mytable t
                    lateral view posexplode (array(CityID,CountryID,TagID)) pe as pos,ID

        group by    t.UserID
                   ,pe.pos
        ) t
;

... and yet another variant:

select  count (case when not is_single_CityID    then 1 end)   as different_cityid
       ,count (case when not is_single_CountryID then 1 end)   as different_countryid
       ,count (case when not is_single_TagID     then 1 end)   as different_tagid

        -- null-safe min = max per user: true when the user has a single value in the column
from   (select      min (CityID)    <=> max (CityID)     as is_single_CityID
                   ,min (CountryID) <=> max (CountryID)  as is_single_CountryID
                   ,min (TagID)     <=> max (TagID)      as is_single_TagID

        from        mytable

        group by    UserID
        ) t
;
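
As a quick sanity check of the per-user min/max idea, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database and runs an equivalent query. SQLite stands in for Hive purely for convenience, which is an assumption of this sketch: it has no `<=>` operator, so the query uses `IS NOT` (SQLite's null-safe inequality) instead. The column aliases `has_many_*` and the function name `count_users_with_multiple_values` are hypothetical names chosen for illustration.

```python
import sqlite3

# Sample rows copied from the question: (UserID, CityID, CountryID, TagID).
ROWS = [
    (100000, 1, 30, 5),
    (100001, 1, 30, 6),
    (100000, 2, 20, 7),
    (100000, 2, 40, 8),
    (100001, 1, 40, 6),
    (100002, 1, 40, 5),
    (100002, 1, 20, 6),
]

# Assumption: SQLite's "IS NOT" plays the role of Hive's "not (a <=> b)".
QUERY = """
select  count(case when has_many_CityID    then 1 end) as different_cityid
       ,count(case when has_many_CountryID then 1 end) as different_countryid
       ,count(case when has_many_TagID     then 1 end) as different_tagid
from   (select   min(CityID)    is not max(CityID)    as has_many_CityID
                ,min(CountryID) is not max(CountryID) as has_many_CountryID
                ,min(TagID)     is not max(TagID)     as has_many_TagID
        from     mytable
        group by UserID
       ) t
"""


def count_users_with_multiple_values(rows):
    """Return (cities, countries, tags): users with more than one distinct value per column."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table mytable (UserID int, CityID int, CountryID int, TagID int)"
        )
        conn.executemany("insert into mytable values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_users_with_multiple_values(ROWS))
```

On the sample data this yields the counts 1, 3 and 2, matching the expected table in the question.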

Answer 2

Use the following code; I think it will help you:

SELECT COUNT(DISTINCT (CountryID)) AS CountryID,
COUNT(DISTINCT(CityID)) AS CityID,
COUNT(DISTINCT(TagID)) AS TagID
FROM test GROUP BY UserID

The result will look like this:

CountryID   CityID   TagID
3           2        3
2           1        1
2           1        2

Regards, Vinu
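
The query in this answer returns one row per user, whereas the question asks for a single row counting the users that have more than one distinct value in each column. One possible way to bridge that gap, sketched here under assumptions, is to wrap the per-user `COUNT(DISTINCT ...)` in an outer aggregation. The sketch runs against an in-memory SQLite database as a stand-in for Hive; the aliases `n_city`, `n_country`, `n_tag`, `per_user` and the function name `summarize` are hypothetical, while the table name `test` is taken from the answer.

```python
import sqlite3

# Sample rows from the question: (UserID, CityID, CountryID, TagID).
SAMPLE_ROWS = [
    (100000, 1, 30, 5),
    (100001, 1, 30, 6),
    (100000, 2, 20, 7),
    (100000, 2, 40, 8),
    (100001, 1, 40, 6),
    (100002, 1, 40, 5),
    (100002, 1, 20, 6),
]

# Inner query: distinct counts per user (as in the answer).
# Outer query: count the users whose distinct count exceeds 1 in each column.
SUMMARY_SQL = """
SELECT COUNT(CASE WHEN n_city    > 1 THEN 1 END) AS Different_CityIDs,
       COUNT(CASE WHEN n_country > 1 THEN 1 END) AS Different_CountryIDs,
       COUNT(CASE WHEN n_tag     > 1 THEN 1 END) AS Different_TagIDs
FROM (
    SELECT COUNT(DISTINCT CityID)    AS n_city,
           COUNT(DISTINCT CountryID) AS n_country,
           COUNT(DISTINCT TagID)     AS n_tag
    FROM test
    GROUP BY UserID
) per_user
"""


def summarize(rows):
    """Return (Different_CityIDs, Different_CountryIDs, Different_TagIDs) for the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE test (UserID INT, CityID INT, CountryID INT, TagID INT)"
        )
        conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
        return conn.execute(SUMMARY_SQL).fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    print(summarize(SAMPLE_ROWS))
```

The nested SELECT uses only standard SQL, so it should carry over to Hive largely unchanged, though that has not been verified here.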

Answer 3

select   uid, cid, count(c), count(g)
from     (select cid, uid,
                 count(coid)  over (partition by cid, uid)   as c,
                 count(tagid) over (partition by cid, tagid) as g
          from   citydata
         ) e
group by cid, uid;

Here uid = userid, cid = cityid, coid = countryid; tagid keeps its name.

Total MapReduce CPU Time Spent: 0 msec
OK
uid      cid   coid   tagid
100000   1     1      1
100001   1     2      2
100002   1     2      2
100000   2     2      2
Time taken: 3.865 seconds, Fetched: 4 row(s)

This is based on userid. I hope it helps.

Hive / SQL: counting occurrences across multiple columns and rows