Question

I am trying to find the top 10 trending tweets in Hive based on retweet_count, i.e. the tweet with the highest retweet_count comes first, and so on.

Here are the details of the election table:

id                      bigint                  from deserializer   
created_at              string                  from deserializer   
source                  string                  from deserializer   
favorited               boolean                 from deserializer   
retweeted_status        struct<text:string,user:struct<screen_name:string,name:string>,retweet_count:int>   from deserializer   
entities                struct<urls:array<struct<expanded_url:string>>,user_mentions:array<struct<screen_name:string,name:string>>,hashtags:array<struct<text:string>>> from deserializer   
text                    string                  from deserializer   
user                    struct<screen_name:string,name:string,friends_count:int,followers_count:int,statuses_count:int,verified:boolean,utc_offset:int,time_zone:string,location:string>    from deserializer   
in_reply_to_screen_name string                  from deserializer

My query:

select text 
from election 
where retweeted_status.retweet_count IN  
     (select  retweeted_status.retweet_count as zz 
      from election  
      order by zz desc  
      limit 10);

It returned the same tweet 10 times (TWEET-ABC, TWEET-ABC, TWEET-ABC, ... TWEET-ABC).

So I split the nested query apart and ran the inner query on its own:

select  retweeted_status.retweet_count as zz 
from election  
order by zz desc  
limit 10

It returned 10 distinct values (1210, 1209, 1208, 1207, 1206, ... 1201).

After that I ran the outer query with those values:

select text 
from election  
where retweeted_status.retweet_count 
      IN  (1210,1209,1208,1207,1206,....1201 );

The result was the same: 10 identical tweets (TWEET-ABC, TWEET-ABC, TWEET-ABC, ... TWEET-ABC).

What is wrong with my query logic?

Answer 1

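You should filter on id rather than on the count. If, say, 100 tweets share the same retweet count, the LIMIT 10 in the subquery does not help: the outer IN matches every row with that count, so you get 100 records back.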

select text 
from election 
where id  IN  
     (select  id as zz 
      from election  
      order by retweeted_status.retweet_count desc  
      limit 10);

I am still not sure why you are getting the wrong result, though.
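One way to narrow this down is to look at the id and the original tweet text next to each of the top counts. The following is only a diagnostic sketch against the election table described in the question; the aliases rt and original_text are made up for illustration.

```sql
-- Diagnostic sketch: show which rows own the ten highest retweet counts.
-- If the id column repeats, the table holds duplicate rows; if only
-- original_text repeats, the rows are separate retweets of one tweet.
SELECT id,
       retweeted_status.retweet_count AS rt,
       retweeted_status.text AS original_text
FROM election
ORDER BY rt DESC
LIMIT 10;
```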

Edit (after my comment):

If my comment is correct, you will have the same id ten times. In that case, change the subquery to:

     (select distinct id as zz 
      from election  
      order by retweeted_status.retweet_count desc  
      limit 10);
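Some SQL engines reject an ORDER BY on a column that is not part of the SELECT DISTINCT list, so the subquery above may not run as written. A hedged alternative is to collapse the duplicates with GROUP BY and rank on an aggregate. This sketch assumes that rows sharing an id also carry the same text; the alias max_rt is made up for illustration.

```sql
-- Sketch: one output row per tweet id, ranked by the highest
-- retweet count recorded for that id (max_rt is an illustrative alias).
SELECT id,
       MAX(text) AS text,
       MAX(retweeted_status.retweet_count) AS max_rt
FROM election
GROUP BY id
ORDER BY max_rt DESC
LIMIT 10;
```

If the repeats turn out to be different ids that all retweet one original tweet, grouping on retweeted_status.text instead of id would collapse them in the same way.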
