INNER JOIN on the same table to count shared word occurrences

Time: 2014-08-13 13:09:04

Tags: mysql inner-join

http://sqlfiddle.com/#!2/e6382

id   id_news  word
 1    6       superman
 2    6       movie
 3    6       review
 4    6       excellent
 5    7       review
 6    7       guardians of the galaxy
 7    7       great
 8    8       review
 9    8       superman
10    8       movie
11    8       great

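I have a small problem: I am trying to relate different news items through the words they share, using a threshold. In the example provided, id_news 6 should be related to 8 but not to 7, because 7 has only one word in common with it (review). I only want to detect news items that have at least 2 words in common.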
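
To make the expected result concrete, here is a minimal sketch that counts the distinct words shared by each pair of news items in the sample data, using plain Python sets rather than SQL. The names `ROWS` and `shared_word_counts` are illustrative only and are not part of the original question.

```python
from itertools import combinations

# Sample rows from the question: (id, id_news, word).
ROWS = [
    (1, 6, "superman"), (2, 6, "movie"), (3, 6, "review"), (4, 6, "excellent"),
    (5, 7, "review"), (6, 7, "guardians of the galaxy"), (7, 7, "great"),
    (8, 8, "review"), (9, 8, "superman"), (10, 8, "movie"), (11, 8, "great"),
]


def shared_word_counts(rows):
    """Return {(news_a, news_b): distinct shared words} for news_a < news_b."""
    words_by_news = {}
    for _id, id_news, word in rows:
        words_by_news.setdefault(id_news, set()).add(word)
    return {
        (a, b): len(words_by_news[a] & words_by_news[b])
        for a, b in combinations(sorted(words_by_news), 2)
    }


counts = shared_word_counts(ROWS)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

On this sample, 6 and 8 share three words, 7 and 8 share two, and 6 and 7 share one, so the chosen threshold decides which pairs count as related.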

2 answers:

Answer 0 (score: 4)

This will get you close to what you need:

  SELECT wa1.id_news id, wa2.id_news related
    FROM word_analysis wa1
    JOIN word_analysis wa2
      ON wa2.id_news != wa1.id_news
     AND wa2.word = wa1.word
GROUP BY wa1.id_news, wa2.id_news
  HAVING COUNT(*)>2
ORDER BY wa1.id_news, wa2.id_news
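
As a rough way to try the query above without a MySQL server, the sketch below runs it against an in-memory SQLite database. The column types in `CREATE TABLE` are an assumption (the question only links to the schema), SQLite stands in for MySQL, and the helper `related_pairs` with its `min_shared` parameter is hypothetical. The threshold is passed as a parameter so that `COUNT(*) > 2` becomes `COUNT(*) >= 3`.

```python
import sqlite3

# Sample rows from the question: (id, id_news, word).
ROWS = [
    (1, 6, "superman"), (2, 6, "movie"), (3, 6, "review"), (4, 6, "excellent"),
    (5, 7, "review"), (6, 7, "guardians of the galaxy"), (7, 7, "great"),
    (8, 8, "review"), (9, 8, "superman"), (10, 8, "movie"), (11, 8, "great"),
]

# Same self-join as above, with the threshold turned into a bound parameter.
QUERY = """
  SELECT wa1.id_news AS id, wa2.id_news AS related
    FROM word_analysis wa1
    JOIN word_analysis wa2
      ON wa2.id_news != wa1.id_news
     AND wa2.word = wa1.word
GROUP BY wa1.id_news, wa2.id_news
  HAVING COUNT(*) >= ?
ORDER BY wa1.id_news, wa2.id_news
"""


def related_pairs(min_shared):
    """Return (id, related) pairs sharing at least min_shared words."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema; the real column types may differ.
    conn.execute(
        "CREATE TABLE word_analysis "
        "(id INTEGER PRIMARY KEY, id_news INTEGER, word TEXT)"
    )
    conn.executemany("INSERT INTO word_analysis VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY, (min_shared,)).fetchall()
    conn.close()
    return result


print(related_pairs(3))  # equivalent to HAVING COUNT(*) > 2
print(related_pairs(2))  # the "at least 2 words" threshold from the question
```

Note that `COUNT(*) > 2` keeps only pairs with three or more shared words. With a threshold of 2, as stated in the question, this sample also relates 7 and 8, which share review and great.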

If you don't want the reverse relations (each pair listed only once):

  SELECT wa1.id_news id, wa2.id_news related
    FROM word_analysis wa1
    JOIN word_analysis wa2
      ON wa2.id_news > wa1.id_news
     AND wa2.word = wa1.word
GROUP BY wa1.id_news, wa2.id_news
  HAVING COUNT(*)>2
ORDER BY wa1.id_news, wa2.id_news

If you only want to investigate a single wa1.id_news (6):

  SELECT wa2.id_news related
    FROM word_analysis wa1
    JOIN word_analysis wa2
      ON wa2.id_news != wa1.id_news
     AND wa2.word = wa1.word
   WHERE wa1.id_news = 6
GROUP BY wa1.id_news, wa2.id_news
  HAVING COUNT(*)>2
ORDER BY wa2.id_news

If you only want to investigate a single relation (6 -> 8), where a result row means related and an empty result means unrelated:

  SELECT 1
    FROM word_analysis wa1
    JOIN word_analysis wa2
      ON wa2.id_news = 8
     AND wa2.word = wa1.word
   WHERE wa1.id_news = 6
GROUP BY wa1.id_news, wa2.id_news
  HAVING COUNT(*)>2
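
The "row means related, no row means unrelated" convention maps naturally onto a boolean check in application code. The following is only a sketch under assumptions: SQLite stands in for MySQL, the table schema is guessed, and `make_db` and `is_related` are hypothetical helper names.

```python
import sqlite3

# Sample rows from the question: (id, id_news, word).
ROWS = [
    (1, 6, "superman"), (2, 6, "movie"), (3, 6, "review"), (4, 6, "excellent"),
    (5, 7, "review"), (6, 7, "guardians of the galaxy"), (7, 7, "great"),
    (8, 8, "review"), (9, 8, "superman"), (10, 8, "movie"), (11, 8, "great"),
]


def make_db():
    """Build an in-memory copy of the sample table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE word_analysis "
        "(id INTEGER PRIMARY KEY, id_news INTEGER, word TEXT)"
    )
    conn.executemany("INSERT INTO word_analysis VALUES (?, ?, ?)", ROWS)
    return conn


def is_related(conn, news_a, news_b, min_shared=3):
    """True if the single-relation query returns a row, False otherwise."""
    row = conn.execute(
        """
          SELECT 1
            FROM word_analysis wa1
            JOIN word_analysis wa2
              ON wa2.id_news = ?
             AND wa2.word = wa1.word
           WHERE wa1.id_news = ?
        GROUP BY wa1.id_news, wa2.id_news
          HAVING COUNT(*) >= ?
        """,
        (news_b, news_a, min_shared),
    ).fetchone()
    return row is not None


conn = make_db()
print(is_related(conn, 6, 8))  # three shared words
print(is_related(conn, 6, 7))  # one shared word
```

The default `min_shared=3` mirrors `COUNT(*) > 2`; passing a different value changes the threshold without editing the SQL.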

Answer 1 (score: 3)

Try this self-join:

SELECT
  wa1.id_news id_news_1,
  wa2.id_news id_news_2,
  count(wa2.word) cnt_words
FROM word_analysis wa1
INNER JOIN word_analysis wa2
ON wa1.id_news <> wa2.id_news AND wa1.word = wa2.word
GROUP BY wa1.id_news, wa2.id_news
HAVING count(wa2.word) >= 3
ORDER BY wa1.id_news, wa2.id_news;
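
One caveat worth checking: `count(wa2.word)` counts joined rows, so if the same word can appear more than once within a news item, a single shared word is counted several times. The sample data has no such duplicates, so the rows below are hypothetical. The sketch uses SQLite in place of MySQL with an assumed schema, and compares the row count with `COUNT(DISTINCT ...)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema; the real column types may differ.
conn.execute(
    "CREATE TABLE word_analysis "
    "(id INTEGER PRIMARY KEY, id_news INTEGER, word TEXT)"
)
# Hypothetical data: both news items repeat the one word they share.
conn.executemany(
    "INSERT INTO word_analysis (id_news, word) VALUES (?, ?)",
    [(1, "review"), (1, "review"), (2, "review"), (2, "review")],
)

row = conn.execute(
    """
    SELECT COUNT(wa2.word)          AS joined_rows,
           COUNT(DISTINCT wa2.word) AS distinct_words
      FROM word_analysis wa1
     INNER JOIN word_analysis wa2
        ON wa1.id_news <> wa2.id_news AND wa1.word = wa2.word
     WHERE wa1.id_news = 1 AND wa2.id_news = 2
    """
).fetchone()
print(row)  # (joined_rows, distinct_words)
```

Here the join produces four rows for one distinct shared word, so a `HAVING count(wa2.word) >= 3` test would pass. If duplicates are possible in the real table, `HAVING COUNT(DISTINCT wa2.word) >= 3` may be the safer condition.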

SQL Fiddle demo