我的hql是:
select day,app_id,platform,count(1) as dau from (select ad.day,ads.app_id,ads.platform,ad.deviceid from mobile_ad_space ads inner join (select day,deviceid,flightid from mobile_day_adlog where day='$STAT_DAY') ad on cast(ads.space_id as string)=ad.flightid group by ad.day,ads.app_id,ads.platform,ad.deviceid) day_active group by day,app_id,platform having dau>5;
但结果包含重复数据:
| day | appid | platform | dau|
| 2016-06-29 | 1 | ios | 70533 |
| 2016-06-29 | 1 | android | 49307 |
| 2016-06-29 | 1 | android | 49307 |
| 2016-06-29 | 1 | android | 49307 |
我也尝试了其他的hql:
SELECT day_active.day ,day_active.app_id ,day_active.platform ,count(1) AS dau FROM ( SELECT day_device.day AS day ,day_device.app_id AS app_id ,day_device.platform AS platform ,day_device.deviceid AS deviceid FROM (SELECT ad.day AS day ,ads.app_id AS app_id ,ads.platform AS platform ,ad.deviceid AS deviceid FROM mobile_ad_space ads INNER JOIN ( SELECT day ,deviceid ,flightid FROM mobile_day_adlog WHERE day = '$STAT_DAY' ) ad ON cast(ads.space_id AS string) = ad.flightid) day_device GROUP BY day_device.day ,day_device.app_id ,day_device.platform ,day_device.deviceid ) day_active GROUP BY day_active.day ,day_active.app_id ,day_active.platform HAVING dau>5 ORDER BY day_active.day ,day_active.app_id ,day_active.platform;
但它仍有这个问题
有谁可以帮助我?
答案 0 :(得分:0)
尝试检查值是否包含空格,在group by中使用trim()。还要按列检查每个组的不同值的数量:select distinct platform ...
,依此类推
答案 1 :(得分:0)
最后,我找到了解决方案。
在我的HQL中,我使用insert overwrite directory '$HQL_OUT_PATH'
覆盖输出。但它似乎不稳定导致重复项。因此,我在HQL之前清理输出路径,结果现在变成了〜 !