http://sqlfiddle.com/#!2/e6382
id  id_news  word
1   6        superman
2   6        movie
3   6        review
4   6        excellent
5   7        review
6   7        guardians of the galaxy
7   7        great
8   8        review
9   8        superman
10  8        movie
11  8        great
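I've run into a small problem: I'm trying to relate different news items through the words they share, subject to a threshold. In the example provided, id_news 6 should be related to 8 but not to 7, because 7 has only one word in common with it (review), so I only want to detect news items that share at least 2 words.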
Answer 0 (score: 4)
This will get you close to what you need:
SELECT wa1.id_news id, wa2.id_news related
FROM word_analysis wa1
JOIN word_analysis wa2
ON wa2.id_news != wa1.id_news
AND wa2.word = wa1.word
GROUP BY wa1.id_news, wa2.id_news
HAVING COUNT(*)>2
ORDER BY wa1.id_news, wa2.id_news
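To see what the query above returns on the sample data, here is a minimal sketch that loads the rows from the question into an in-memory SQLite database and runs the same statement. SQLite standing in for MySQL, and the column types, are assumptions made only for this illustration; the table and column names come from the thread.

```python
import sqlite3

# Sample rows copied from the question: (id, id_news, word).
ROWS = [
    (1, 6, "superman"), (2, 6, "movie"), (3, 6, "review"), (4, 6, "excellent"),
    (5, 7, "review"), (6, 7, "guardians of the galaxy"), (7, 7, "great"),
    (8, 8, "review"), (9, 8, "superman"), (10, 8, "movie"), (11, 8, "great"),
]

# Assumption: in-memory SQLite stands in for the MySQL fiddle; column types are guessed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word_analysis (id INTEGER PRIMARY KEY, id_news INTEGER, word TEXT)"
)
conn.executemany("INSERT INTO word_analysis VALUES (?, ?, ?)", ROWS)

# Self-join on equal words between different news items, then keep only
# the pairs whose number of matching rows exceeds 2.
QUERY = """
SELECT wa1.id_news id, wa2.id_news related
FROM word_analysis wa1
JOIN word_analysis wa2
  ON wa2.id_news != wa1.id_news
 AND wa2.word = wa1.word
GROUP BY wa1.id_news, wa2.id_news
HAVING COUNT(*) > 2
ORDER BY wa1.id_news, wa2.id_news
"""

related_pairs = conn.execute(QUERY).fetchall()
print(related_pairs)  # [(6, 8), (8, 6)]
```

News items 6 and 8 share three words (superman, movie, review), so the pair appears once in each direction; 6 and 7 share only review and are filtered out.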
If you don't want the reverse relationships (each pair listed only once, for example 6-8 but not 8-6):
SELECT wa1.id_news id, wa2.id_news related
FROM word_analysis wa1
JOIN word_analysis wa2
ON wa2.id_news > wa1.id_news
AND wa2.word = wa1.word
GROUP BY wa1.id_news, wa2.id_news
HAVING COUNT(*)>2
ORDER BY wa1.id_news, wa2.id_news
If you only want to investigate a single wa1.id_news (6):
SELECT wa2.id_news related
FROM word_analysis wa1
JOIN word_analysis wa2
ON wa2.id_news != wa1.id_news
AND wa2.word = wa1.word
WHERE wa1.id_news = 6
GROUP BY wa1.id_news, wa2.id_news
HAVING COUNT(*)>2
ORDER BY wa2.id_news
If you only want to check a single relationship (6 -> 8), where a returned row means related and no rows means unrelated:
SELECT 1
FROM word_analysis wa1
JOIN word_analysis wa2
ON wa2.id_news = 8
AND wa2.word = wa1.word
WHERE wa1.id_news = 6
GROUP BY wa1.id_news, wa2.id_news
HAVING COUNT(*)>2
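The single-relationship check above maps naturally onto a small helper that returns a boolean. The sketch below is one way to wrap it, assuming SQLite in place of MySQL; the function name `is_related` and the `min_shared` parameter are hypothetical, introduced only for this example. A `min_shared` of 3 is equivalent to the `COUNT(*)>2` condition in the answer.

```python
import sqlite3

# Sample rows copied from the question: (id, id_news, word).
ROWS = [
    (1, 6, "superman"), (2, 6, "movie"), (3, 6, "review"), (4, 6, "excellent"),
    (5, 7, "review"), (6, 7, "guardians of the galaxy"), (7, 7, "great"),
    (8, 8, "review"), (9, 8, "superman"), (10, 8, "movie"), (11, 8, "great"),
]

# Assumption: in-memory SQLite stands in for MySQL; column types are guessed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word_analysis (id INTEGER PRIMARY KEY, id_news INTEGER, word TEXT)"
)
conn.executemany("INSERT INTO word_analysis VALUES (?, ?, ?)", ROWS)


def is_related(conn, news_a, news_b, min_shared=3):
    """Return True if news_a and news_b have at least min_shared matching word rows.

    Hypothetical helper: it runs the answer's "SELECT 1" query with bound
    parameters; one returned row means related, no row means unrelated.
    """
    row = conn.execute(
        """
        SELECT 1
        FROM word_analysis wa1
        JOIN word_analysis wa2
          ON wa2.id_news = ?
         AND wa2.word = wa1.word
        WHERE wa1.id_news = ?
        GROUP BY wa1.id_news, wa2.id_news
        HAVING COUNT(*) >= ?
        """,
        (news_b, news_a, min_shared),
    ).fetchone()
    return row is not None


print(is_related(conn, 6, 8))  # True: superman, movie, review
print(is_related(conn, 6, 7))  # False: only review is shared
```

Binding the ids and the threshold as parameters avoids building the SQL string by hand and keeps the threshold adjustable per call.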
Answer 1 (score: 3)
Try this self-join:
SELECT
  wa1.id_news id_news_1,
  wa2.id_news id_news_2,
  COUNT(wa2.word) cnt_words
FROM word_analysis wa1
INNER JOIN word_analysis wa2
  ON wa1.id_news <> wa2.id_news AND wa1.word = wa2.word
GROUP BY wa1.id_news, wa2.id_news
HAVING COUNT(wa2.word) >= 3
ORDER BY wa1.id_news, wa2.id_news;
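Both answers require three shared words (`COUNT(*)>2` and `>= 3`), whereas the question asks for at least 2, and on the sample data the choice changes the result. The sketch below makes the threshold a parameter so the two settings can be compared. It assumes SQLite in place of MySQL; the helper name `shared_word_counts` is hypothetical, and `COUNT(DISTINCT ...)` is a defensive substitution (not part of either answer) so that a word stored twice for the same news item would not inflate the count.

```python
import sqlite3

# Sample rows copied from the question: (id, id_news, word).
ROWS = [
    (1, 6, "superman"), (2, 6, "movie"), (3, 6, "review"), (4, 6, "excellent"),
    (5, 7, "review"), (6, 7, "guardians of the galaxy"), (7, 7, "great"),
    (8, 8, "review"), (9, 8, "superman"), (10, 8, "movie"), (11, 8, "great"),
]

# Assumption: in-memory SQLite stands in for MySQL; column types are guessed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word_analysis (id INTEGER PRIMARY KEY, id_news INTEGER, word TEXT)"
)
conn.executemany("INSERT INTO word_analysis VALUES (?, ?, ?)", ROWS)


def shared_word_counts(conn, min_shared):
    """List (id_news_1, id_news_2, cnt_words) for pairs sharing >= min_shared words.

    Hypothetical helper: each pair is reported once (id_news_1 < id_news_2),
    and distinct words are counted rather than joined rows.
    """
    return conn.execute(
        """
        SELECT wa1.id_news AS id_news_1,
               wa2.id_news AS id_news_2,
               COUNT(DISTINCT wa1.word) AS cnt_words
        FROM word_analysis wa1
        INNER JOIN word_analysis wa2
          ON wa1.id_news < wa2.id_news AND wa1.word = wa2.word
        GROUP BY wa1.id_news, wa2.id_news
        HAVING COUNT(DISTINCT wa1.word) >= ?
        ORDER BY wa1.id_news, wa2.id_news
        """,
        (min_shared,),
    ).fetchall()


print(shared_word_counts(conn, 3))  # [(6, 8, 3)]
print(shared_word_counts(conn, 2))  # [(6, 8, 3), (7, 8, 2)]
```

With a threshold of 2, news items 7 and 8 also become related because they share review and great; in both settings 6 is related to 8 and not to 7, which is the outcome the question asks for.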