I have a MySQL table named "content" which has (among others) the fields "_date" and "text", for example:
_date       text
---------------------------------------------------------
2011-02-18  I'm afraid my car won't start tomorrow
2011-02-18  I hope I'm going to pass my exams
2011-02-18  Exams coming up - I'm not afraid :P
2011-02-19  Not a single f was given this day
2011-02-20  I still hope I passed, but I'm afraid I didn't
2011-02-20  On my way to school :)
I'm looking for a query that counts how many times the words "hope" and "afraid" are used per day. In other words, the output must be:
_date       word    count
-------------------------
2011-02-18  hope    1
2011-02-18  afraid  2
2011-02-19  hope    0
2011-02-19  afraid  0
2011-02-20  hope    1
2011-02-20  afraid  1
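Is there an easy way to do this, or should I write a separate query for each term? This is what I have so far, but I don't know what to put in place of the "?":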
SELECT COUNT(?) FROM content WHERE text LIKE '%hope' GROUP BY _date
Can anyone help with this?
Answer 0 (score: 3)
I think the simplest and most readable way is to use subqueries:
Select
_date, 'hope' as word,
sum( case when `text` like '%hope%' then 1 else 0 end) as n
from content
group by _date
UNION
Select
_date, 'afraid' as word,
sum( case when `text` like '%afraid%' then 1 else 0 end) as n
from content
group by _date
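This approach does not perform well. If performance matters, you should group by day in a subquery; the LIKE condition with a leading wildcard is also a performance killer, because it cannot use an index. If you only run the query in batch mode, this is a workable solution. Describe your performance requirements to get a more precise answer.
Edited to match the OP's latest requirements.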
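If scanning the table once per word is the concern, one possible variant is to compute both counts in a single pass with conditional aggregation. This is only a sketch, not necessarily what the answer had in mind: it returns one row per day with one column per word (the column names hope and afraid are my own choice) instead of one row per day and word, and it relies on MySQL evaluating a LIKE comparison to 1 or 0.

```sql
-- One scan of content; each SUM adds up the 1/0 results of its LIKE test.
SELECT _date,
       SUM(`text` LIKE '%hope%')   AS hope,
       SUM(`text` LIKE '%afraid%') AS afraid
FROM content
GROUP BY _date;
```

Days without any match still appear here, with 0 in both columns, because no rows are filtered out before grouping.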
Answer 1 (score: 2)
Your query is almost correct:
SELECT _date, 'hope' AS word, COUNT(*) as count
FROM content WHERE text LIKE '%hope%' GROUP BY _date
Use %hope% to match the word anywhere in the string (not only at the end). COUNT(*) should do what you want.
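Keep in mind that %hope% is a substring match, so it would also count words such as "hopeless". If whole words are needed, a regular expression with word boundaries is one option. The sketch below assumes MySQL 8.0.4 or later, where REGEXP understands \b; older versions use the [[:<:]] and [[:>:]] markers instead.

```sql
-- Whole-word match; the backslashes are doubled inside the string literal.
SELECT _date, 'hope' AS word, COUNT(*) AS count
FROM content
WHERE `text` REGEXP '\\bhope\\b'
-- On MySQL 5.x the equivalent pattern would be '[[:<:]]hope[[:>:]]'
GROUP BY _date;
```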
To get multiple words from a single query, use UNION ALL.
Another way is to build the list of words on the fly and use it as a second table in a join:
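As a rough sketch of that, the same query can be repeated once per word and the results glued together. The parentheses and the final ORDER BY are my addition, so that the sort applies to the combined result.

```sql
(SELECT _date, 'hope' AS word, COUNT(*) AS count
 FROM content WHERE `text` LIKE '%hope%' GROUP BY _date)
UNION ALL
(SELECT _date, 'afraid' AS word, COUNT(*) AS count
 FROM content WHERE `text` LIKE '%afraid%' GROUP BY _date)
ORDER BY _date, word;
```

UNION ALL skips the duplicate-elimination step that plain UNION performs, which is unnecessary here because the word column already makes the two halves distinct.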
SELECT _date, words.word, COUNT(*) as count
FROM (
SELECT 'hope' AS word
UNION
SELECT 'afraid' AS word
) AS words
CROSS JOIN content
WHERE text LIKE CONCAT('%', words.word, '%')
GROUP BY _date, words.word
Note that this counts each word at most once per sentence. So »I hope there is still hope« only gives you 1, not 2.
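If every occurrence should count, one common trick is to compare the length of the text before and after removing the word. The following is a sketch under a few assumptions: the words in the list are lowercase, matching should be case-insensitive (hence LOWER, since REPLACE is case-sensitive), and substrings such as "hopeful" are acceptable as matches.

```sql
SELECT _date, words.word,
       SUM(
         -- characters removed, divided by the word length = occurrences
         (CHAR_LENGTH(LOWER(`text`))
            - CHAR_LENGTH(REPLACE(LOWER(`text`), words.word, '')))
         DIV CHAR_LENGTH(words.word)
       ) AS count
FROM (
  SELECT 'hope' AS word
  UNION ALL
  SELECT 'afraid'
) AS words
CROSS JOIN content
GROUP BY _date, words.word;
```

Because there is no WHERE clause filtering rows out, days on which a word never appears are still listed, with a count of 0.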
To get 0 when there are no matches, join the previous result against the dates again:
SELECT DISTINCT content._date, COALESCE(result.word, 'no match') AS word, COALESCE(result.count, 0) AS count
FROM content
LEFT JOIN (
SELECT _date, words.word, COUNT(*) as count
FROM (
SELECT 'hope' AS word
UNION
SELECT 'afraid' AS word
) AS words
CROSS JOIN content
WHERE text LIKE CONCAT('%', words.word, '%')
GROUP BY _date, words.word ) AS result
ON content._date = result._date
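Note that this query labels a day without matches as 'no match' instead of listing each word with 0. If the output should contain one row per day and word, as in the question, one possible approach is to build every (day, word) combination first and then left-join the matching rows. This is a sketch; the aliases dates and words are my own.

```sql
SELECT dates._date, words.word,
       COUNT(content._date) AS count   -- counts only the matched rows
FROM (SELECT DISTINCT _date FROM content) AS dates
CROSS JOIN (
  SELECT 'hope' AS word
  UNION ALL
  SELECT 'afraid'
) AS words
LEFT JOIN content
  ON content._date = dates._date
 AND content.`text` LIKE CONCAT('%', words.word, '%')
GROUP BY dates._date, words.word
ORDER BY dates._date, words.word;
```

Only days that have at least one row in content are listed. Covering days with no rows at all would need a separate calendar table.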
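Answer 2 (score: 2)
Assuming you want to count all words and find the most frequently used ones (rather than looking up the counts for a few specific words), you might want to try something like the following stored procedure (string splitting compliments of this blog post):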
DROP PROCEDURE IF EXISTS wordsUsed;
DELIMITER //
CREATE PROCEDURE wordsUsed ()
BEGIN
DROP TEMPORARY TABLE IF EXISTS wordTmp;
CREATE TEMPORARY TABLE wordTmp (word VARCHAR(255));
SET @wordCt = 0;
SET @tokenCt = 1;
contentLoop: LOOP
SET @stmt = 'INSERT INTO wordTmp SELECT REPLACE(SUBSTRING(SUBSTRING_INDEX(`text`, " ", ?),
LENGTH(SUBSTRING_INDEX(`text`, " ", ? -1)) + 1),
" ", "") word
FROM content
WHERE LENGTH(SUBSTRING_INDEX(`text`, " ", ? - 1)) != LENGTH(`text`)';
PREPARE cmd FROM @stmt;
EXECUTE cmd USING @tokenCt, @tokenCt, @tokenCt;
SELECT ROW_COUNT() INTO @wordCt;
DEALLOCATE PREPARE cmd;
IF (@wordCt = 0) THEN
LEAVE contentLoop;
ELSE
SET @tokenCt = @tokenCt + 1;
END IF;
END LOOP;
SELECT word, count(*) usageCount FROM wordTmp GROUP BY word ORDER BY usageCount DESC;
END //
DELIMITER ;
CALL wordsUsed();
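You may want to write another query (or procedure), or add some nested REPLACE calls, to further strip punctuation from the resulting temporary table, but this should be a good start.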
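As a minimal sketch of the nested-REPLACE idea, the final SELECT could be run against the temporary table like this. It must run in the same session, after CALL wordsUsed(), because the temporary table only exists there. The set of stripped characters (. , ! ?) and the lowercasing are assumptions on my part; extend them as needed.

```sql
SELECT cleaned AS word, COUNT(*) AS usageCount
FROM (
  -- strip a few punctuation characters and normalize case
  SELECT LOWER(
           REPLACE(REPLACE(REPLACE(REPLACE(word, '.', ''), ',', ''), '!', ''), '?', '')
         ) AS cleaned
  FROM wordTmp
) AS w
WHERE cleaned <> ''          -- drop tokens that were only punctuation
GROUP BY cleaned
ORDER BY usageCount DESC;
```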