I want to find the frequency of each word in a column, using only MySQL if possible. For example:
Table:
id  message
1   I want to eat pizza
2   I wanted chocolates
3   He doesn't like me
Query: ???
Result:
word     frequency
I        2
want     1
to       1
eat      1
pizza    1
wanted   1
...and so on.
Is this possible? If so, please help. Thanks.
Answer 0 (score: 0)
You need to split each message into words, which is painful in MySQL:
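-- n.n is the word position; the inner substring_index keeps the first
-- n words and the outer one keeps the last of those, i.e. word n.
select substring_index(substring_index(message, ' ', n.n), ' ', -1) as word,
       count(*) as frequency
from (select 1 as n union all select 2 union all select 3 union all
      select 4 union all select 5
     ) n join
     t
     -- only join positions up to the message's word count (spaces + 1)
     on n.n <= 1 + length(message) - length(replace(message, ' ', ''))
group by word;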
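This assumes that every message has at most five words and that words are separated by single spaces. To handle longer messages, add more numbers to the n subquery.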
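To see why the nested substring_index call picks out word n, here is a small Python sketch that mimics MySQL's SUBSTRING_INDEX(str, delim, count) and replays the numbers-table join over the sample rows. The helper names substring_index and word_frequencies are hypothetical, for illustration only. It assumes ASCII text (MySQL's LENGTH() counts bytes, Python's len() counts characters) and counts case-sensitively, whereas MySQL's GROUP BY follows the column's collation, which is usually case-insensitive.

```python
from collections import Counter


def substring_index(s: str, delim: str, count: int) -> str:
    """Mimic MySQL SUBSTRING_INDEX: everything before the count-th delimiter
    (count > 0), or everything after the count-th delimiter from the right
    (count < 0)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


def word_frequencies(messages, max_words=5):
    """Replay the SQL: join each message with positions 1..max_words,
    keep positions <= word count, extract word n, then group and count."""
    counts = Counter()
    for message in messages:
        # Word count = number of spaces + 1, the same test as the ON clause.
        n_words = 1 + len(message) - len(message.replace(" ", ""))
        for n in range(1, max_words + 1):
            if n <= n_words:
                word = substring_index(substring_index(message, " ", n), " ", -1)
                counts[word] += 1
    return counts


messages = ["I want to eat pizza", "I wanted chocolates", "He doesn't like me"]
freq = word_frequencies(messages)
print(freq["I"], freq["want"], freq["wanted"])  # 2 1 1
```

Calling word_frequencies(messages, max_words=2) shows the limitation from the answer: only the first two words of each message are counted, which is why the numbers subquery has to be at least as long as the longest message.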
Answer 1 (score: 0)
Here is a PHP example. You may need to adjust it a little.
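Let's assume you have a word_frequency table with a unique column word and an integer column count. Each word is passed to the INSERT as a bound parameter, so apostrophes (as in "doesn't") can't break the query or open it up to SQL injection. This should get you started.
<?php
// Connection details are placeholders.
$con = mysqli_connect("localhost", "my_user", "my_password", "my_db");
if (mysqli_connect_errno()) {
    echo "Failed to connect to MySQL: " . mysqli_connect_error();
    exit(1);
}
// Upsert: insert the word with count 1, or increment count if the word
// already exists (relies on the UNIQUE index on `word`).
$stmt = mysqli_prepare($con,
    "INSERT INTO word_frequency (`word`, `count`) VALUES (?, 1)
     ON DUPLICATE KEY UPDATE `count` = `count` + 1");

$results = mysqli_query($con, "SELECT message FROM table1");
while ($row = mysqli_fetch_assoc($results)) {
    // Split on runs of whitespace, dropping empty pieces.
    $words = preg_split('/\s+/', $row['message'], -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // Bind the word as a parameter instead of interpolating it.
        mysqli_stmt_bind_param($stmt, "s", $word);
        mysqli_stmt_execute($stmt);
    }
}
mysqli_stmt_close($stmt);
mysqli_close($con);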
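The same split-and-upsert idea can be tried without a MySQL server. The sketch below is a stand-in, not the PHP answer itself: it uses Python's built-in sqlite3 module with an in-memory database, reuses the table names table1 and word_frequency from the answer, and swaps MySQL's ON DUPLICATE KEY UPDATE for SQLite's equivalent ON CONFLICT ... DO UPDATE clause (which needs SQLite 3.24 or newer). The function name build_word_frequency is hypothetical.

```python
import sqlite3


def build_word_frequency(messages):
    """Load messages into table1, then upsert one row per word into
    word_frequency, incrementing count on repeats."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, message TEXT)")
    # word is the primary key, so a repeated word hits the conflict branch.
    con.execute(
        "CREATE TABLE word_frequency (word TEXT PRIMARY KEY, count INTEGER NOT NULL)"
    )
    con.executemany("INSERT INTO table1 (message) VALUES (?)", [(m,) for m in messages])

    # fetchall() first so we are not iterating a cursor while writing.
    for (message,) in con.execute("SELECT message FROM table1").fetchall():
        for word in message.split():  # split on runs of whitespace
            # The word is bound with ?, so "doesn't" needs no escaping.
            con.execute(
                "INSERT INTO word_frequency (word, count) VALUES (?, 1) "
                "ON CONFLICT(word) DO UPDATE SET count = count + 1",
                (word,),
            )
    return con


con = build_word_frequency(
    ["I want to eat pizza", "I wanted chocolates", "He doesn't like me"]
)
rows = con.execute(
    "SELECT word, count FROM word_frequency ORDER BY count DESC, word"
).fetchall()
print(rows[0])  # ('I', 2)
```

Unlike the single-query approach in the first answer, this stores the counts in a table, so it has no fixed limit on words per message, at the cost of one statement per word.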