I need a MySQL query that finds the trending topics in my table. Here is an explanation of what I need:
+----+---------+-----------------------------+
| ID | ID_user | text                        |
+----+---------+-----------------------------+
|  1 | bruno   | michael jackson is dead     |
|  2 | thomasi | michael j. moonwalk is dead |
|  3 | userts  | michael jackson lives       |
+----+---------+-----------------------------+
I want to find the most frequently repeated words in the table, limited to the top 10. The result might look like this:
+-------+------------+
| count | word       |
+-------+------------+
|     3 | michael    |
|     2 | dead       |
|     2 | jackson    |
|     1 | j.         |
|     1 | lives      |
|     1 | moonwalk   |
+-------+------------+
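However, I only want words that are repeated more than 10 times, in which case no words would be returned here. If the threshold were 2 repetitions instead, it would show only 'michael' and 'dead' and ignore 'is', because I don't want very short words (two characters or fewer). I also want words that belong to the same phrase to be grouped together, so the result I need to appear is this: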
+-------+-----------------+
| count | word            |
+-------+-----------------+
|     2 | michael jackson |
|     2 | dead            |
+-------+-----------------+
Answer 0 (score: 1)
CREATE TEMPORARY TABLE counters (id INT);
-- Insert as many values as you like: each value is a word position within the text,
-- so the highest value caps how many words are read from each row.
INSERT INTO counters VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9),(10),
(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),
(21),(22),(23),(24),(25),(26),(27),(28),(29),(30);
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(texts.text,' ',counters.id),' ',-1) AS word,
COUNT(counters.id) AS counter
FROM texts
INNER JOIN counters ON (LENGTH(text)>0 AND SUBSTRING_INDEX(SUBSTRING_INDEX(text,' ',counters.id),' ',-1) <> SUBSTRING_INDEX(SUBSTRING_INDEX(text,' ',counters.id-1),' ', -1))
WHERE length(SUBSTRING_INDEX(SUBSTRING_INDEX(texts.text,' ',counters.id),' ',-1)) > 2
GROUP BY word
HAVING COUNT(counters.id) > 1
ORDER BY counter desc;
However, this is not efficient, and it should not really be done this way.
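One alternative, if splitting in SQL is too slow, is to fetch the `text` column and count the words in application code. The following is only a minimal Python sketch under assumptions: the function name `top_words` and its parameters are hypothetical, the rows are hard-coded in place of a real database fetch, and lowercasing is an assumption about how words should be grouped. It applies the same filters as the query above (length greater than 2, count greater than 1).

```python
from collections import Counter


def top_words(texts, min_count=2, min_length=3, limit=10):
    """Return (word, count) pairs for words seen at least min_count times.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    counts = Counter()
    for text in texts:
        # Split on whitespace, like the space-delimited split in the SQL.
        for word in text.lower().split():
            # Same intent as: WHERE LENGTH(word) > 2
            if len(word) >= min_length:
                counts[word] += 1
    # Same intent as: HAVING COUNT(...) > 1
    frequent = [(w, c) for w, c in counts.items() if c >= min_count]
    # Highest count first; ties are broken alphabetically for a stable order.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return frequent[:limit]


# Sample rows copied from the question; a real program would read them
# from the texts table instead.
rows = [
    "michael jackson is dead",
    "michael j. moonwalk is dead",
    "michael jackson lives",
]

print(top_words(rows))                # [('michael', 3), ('dead', 2), ('jackson', 2)]
print(top_words(rows, min_count=10))  # []
```

With a threshold of 10 nothing is returned for the sample data, which matches the behaviour described in the question.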
Edit:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(texts.text,' ',counters.id),' ',-1) AS word,
COUNT(counters.id) AS counter
FROM texts
INNER JOIN counters ON (LENGTH(text)>0 AND SUBSTRING_INDEX(SUBSTRING_INDEX(text,' ',counters.id),' ',-1) <> SUBSTRING_INDEX(SUBSTRING_INDEX(text,' ',counters.id-1),' ', -1))
-- exclude words list
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(texts.text,' ',counters.id),' ',-1) NOT IN ('is', 'of', 'this', 'to')
GROUP BY word
HAVING COUNT(counters.id) > 1
ORDER BY counter desc;
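The query above still counts single words only, whereas the question also asks for phrases such as 'michael jackson'. One possible way to approach that part, sketched here in Python and not taken from the answer, is to drop the excluded words and then count pairs of neighbouring words. The function name `top_phrases` is hypothetical, the stop-word set simply mirrors the NOT IN list, and treating two words as a phrase whenever they are adjacent is an assumption.

```python
from collections import Counter

# Mirrors the NOT IN ('is', 'of', 'this', 'to') list from the query.
STOP_WORDS = {"is", "of", "this", "to"}


def top_phrases(texts, min_count=2, limit=10):
    """Return (phrase, count) pairs for two-word phrases seen min_count+ times.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    counts = Counter()
    for text in texts:
        words = [w for w in text.lower().split() if w not in STOP_WORDS]
        # Pair every remaining word with its successor. Words separated only
        # by a removed stop word are treated as adjacent (an assumption).
        for first, second in zip(words, words[1:]):
            counts[f"{first} {second}"] += 1
    frequent = [(p, c) for p, c in counts.items() if c >= min_count]
    # Highest count first; ties are broken alphabetically for a stable order.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return frequent[:limit]


# Sample rows copied from the question.
rows = [
    "michael jackson is dead",
    "michael j. moonwalk is dead",
    "michael jackson lives",
]

print(top_phrases(rows))  # [('michael jackson', 2)]
```

This sketch reports phrases only. How a phrase should be merged with the single-word counts (for example, showing 'michael jackson' in place of 'michael' and 'jackson') is not specified in the question, so it is left open here.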