Question

I download data from Twitter through the Twitter API and save it to my PostgreSQL database.

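I store various pieces of information from each tweet, and now I want to know how often certain hashtags are used together in the same tweet.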

I have the tables hashtag, tweet_has_hashtag and tweet. tweet_has_hashtag implements the many-to-many relationship between tweet and hashtag.

The SQL I am running is:

  select h1.txt, 
         h2.txt, 
         count(th1.tweet_id)
    from hashtag h1,
         tweet_has_hashtag th1, 
         tweet_has_hashtag th2, 
         hashtag h2
   where th1.hashtag_id = h1.id and 
         th2.tweet_id = th1.tweet_id and 
         th2.hashtag_id = h2.id and 
         h2.id <> h1.id
group by h1.id, 
         h2.id
order by count(th1.tweet_id) desc
   limit 1000

The result is fine, except that every pair of hashtags appears twice, in separate rows with the two columns swapped, for example:

love    | me      | 925
me      | love    | 925
style   | fashion | 654
fashion | style   | 654

How can I get the result without these swapped duplicates?

Answer 1

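In the WHERE clause, replace h2.id <> h1.id with h2.id > h1.id. For any two distinct hashtags, exactly one ordering satisfies h2.id > h1.id, so each pair is returned only once.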

  SELECT h1.txt, 
         h2.txt, 
         COUNT(th1.tweet_id)
    FROM hashtag h1,
         tweet_has_hashtag th1, 
         tweet_has_hashtag th2, 
         hashtag h2
   WHERE th1.hashtag_id=h1.id 
         AND th2.tweet_id=th1.tweet_id 
         AND th2.hashtag_id=h2.id 
         AND h2.id > h1.id
GROUP BY h1.id, 
         h2.id
ORDER BY COUNT(th1.tweet_id) DESC
   LIMIT 1000;
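
To see the effect of the changed comparison, here is a minimal, self-contained sketch. It rests on several assumptions: an in-memory SQLite database stands in for PostgreSQL, the sample rows are invented, and the helper names build_sample_db and pair_counts are hypothetical. A secondary sort on the ids is added only to make the row order deterministic.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with invented sample data (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE hashtag (id INTEGER PRIMARY KEY, txt TEXT NOT NULL);
        CREATE TABLE tweet_has_hashtag (
            tweet_id   INTEGER NOT NULL,
            hashtag_id INTEGER NOT NULL REFERENCES hashtag(id)
        );
        INSERT INTO hashtag (id, txt)
        VALUES (1, 'love'), (2, 'me'), (3, 'style'), (4, 'fashion');
        -- tweets 1 and 2 use #love and #me; tweet 3 uses #style and #fashion
        INSERT INTO tweet_has_hashtag (tweet_id, hashtag_id)
        VALUES (1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 4);
        """
    )
    return conn


def pair_counts(conn, comparison):
    """Count co-occurring hashtag pairs using '<>' (original) or '>' (fix)."""
    if comparison not in ("<>", ">"):
        raise ValueError("comparison must be '<>' or '>'")
    sql = f"""
        SELECT h1.txt, h2.txt, COUNT(th1.tweet_id) AS pair_count
          FROM hashtag h1,
               tweet_has_hashtag th1,
               tweet_has_hashtag th2,
               hashtag h2
         WHERE th1.hashtag_id = h1.id
           AND th2.tweet_id = th1.tweet_id
           AND th2.hashtag_id = h2.id
           AND h2.id {comparison} h1.id
      GROUP BY h1.id, h2.id
      ORDER BY pair_count DESC, h1.id, h2.id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    print("with <>:", pair_counts(conn, "<>"))  # every pair appears twice
    print("with > :", pair_counts(conn, ">"))   # every pair appears once
```

With this sample data the <> version returns four rows (each pair in both orders), while the > version returns two. Which hashtag lands in the first column depends only on which one has the smaller id.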
