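MySQL: find the frequency of each string/word in a column

Time: 2018-06-26 02:12:50

Tags: mysql sql

I want to find the frequency of each word in a column using only MySQL, if possible. For example:

Table: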

id message
1  I want to eat pizza
2  I wanted chocolates
3  He doesn't like me

Query: ???

Result:

word   frequency

I      2
want   1
to     1
eat    1
pizza  1
wanted 1

and so on.

Is this possible? If so, please help. Thanks.

2 answers:

Answer 0 (score: 0)

You need to split the data into words. This is painful:

-- n.n picks the n-th space-separated word of each message
select substring_index(substring_index(message, ' ', n.n), ' ', -1) as word,
       count(*) as frequency
from (select 1 as n union all select 2 union all select 3 union all
      select 4 union all select 5
     ) n join
     t
     -- keep only values of n up to the number of words (spaces + 1)
     on n.n <= 1 + length(message) - length(replace(message, ' ', ''))
group by word;
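To see why the nested substring_index calls pick out a single word, here is a small worked sketch on one message from the question. The literal string and the choice of n = 3 are only for illustration:

```sql
-- Inner call: keep everything before the 3rd space.
SELECT SUBSTRING_INDEX('I want to eat pizza', ' ', 3);
-- returns 'I want to'

-- Outer call with -1: keep everything after the last space of that prefix,
-- which is the 3rd word.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('I want to eat pizza', ' ', 3), ' ', -1);
-- returns 'to'

-- Word count used in the join condition: number of spaces plus one.
SELECT 1 + LENGTH('I want to eat pizza')
         - LENGTH(REPLACE('I want to eat pizza', ' ', ''));
-- returns 5 (19 characters, 15 without spaces)
```

Joining each message to the numbers 1 through its word count therefore yields one row per word, and the group by then counts them.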

The query above assumes every message is at most 5 words long. To handle longer messages, extend the list of numbers in the first subquery.
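If you are on MySQL 8.0 or later, one way to avoid typing out the numbers by hand is a recursive common table expression. This is only a sketch under assumptions: the table is still called t as in the answer, and the upper bound of 50 words per message is an arbitrary value chosen for illustration.

```sql
-- Assumes MySQL 8.0+ (recursive CTEs are not available in earlier versions).
WITH RECURSIVE numbers (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 50   -- 50 is an assumed maximum word count
)
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.message, ' ', numbers.n), ' ', -1) AS word,
       COUNT(*) AS frequency
FROM numbers
JOIN t
  ON numbers.n <= 1 + LENGTH(t.message) - LENGTH(REPLACE(t.message, ' ', ''))
GROUP BY word
ORDER BY frequency DESC, word;
```

Raise the bound if your messages can be longer. Very large bounds may also need the cte_max_recursion_depth system variable increased, since its default is 1000.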

Answer 1 (score: 0)

Here is a PHP example. You may need to adjust it slightly.

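Assume you have a word_frequency table with a unique column word and an integer column count. Interpolating $word directly into the SQL string would leave the query open to SQL injection, and it would also fail on words containing an apostrophe (such as doesn't), so this version binds the word through a prepared statement. It should help you get started.

<?php
// Connect to MySQL and stop if the connection fails.
$con = mysqli_connect("localhost", "my_user", "my_password", "my_db");
if (mysqli_connect_errno()) {
    exit("Failed to connect to MySQL: " . mysqli_connect_error());
}

// Prepare the upsert once; $word is bound by reference and read on each execute.
$stmt = mysqli_prepare($con, "INSERT INTO word_frequency (`word`, `count`) VALUES (?, 1) ON DUPLICATE KEY UPDATE `count` = `count` + 1");
mysqli_stmt_bind_param($stmt, "s", $word);

$results = mysqli_query($con, "SELECT message FROM table1");
while ($row = mysqli_fetch_assoc($results)) {
    // Split the message on single spaces and count each word.
    $words = explode(" ", $row['message']);
    foreach ($words as $word) {
        mysqli_stmt_execute($stmt);
    }
}

mysqli_stmt_close($stmt);
mysqli_close($con);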