Counting word frequency in a text in PHP

Time: 2010-01-23 13:15:33

Tags: php

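In PHP, I need to load a file, extract all of its words, and echo each word along with the number of times it appears in the text. I also need the words displayed in descending order, with the most frequent first.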

4 answers:

Answer 0 (score: 11)

Here is an example:

$text = "A very nice únÌcÕdë text. Something nice to think about if you're into Unicode.";

// $words = str_word_count($text, 1); // use this function if you only want ASCII
$words = utf8_str_word_count($text, 1); // use this function if you care about i18n

$frequency = array_count_values($words);

arsort($frequency);

echo '<pre>';
print_r($frequency);
echo '</pre>';

Output:

Array
(
    [nice] => 2
    [if] => 1
    [about] => 1
    [you're] => 1
    [into] => 1
    [Unicode] => 1
    [think] => 1
    [to] => 1
    [very] => 1
    [únÌcÕdë] => 1
    [text] => 1
    [Something] => 1
    [A] => 1
)

The utf8_str_word_count() function, if you need it:

function utf8_str_word_count($string, $format = 0, $charlist = null)
{
    $result = array();

    if (preg_match_all('~[\p{L}\p{Mn}\p{Pd}\'\x{2019}' . preg_quote((string) $charlist, '~') . ']+~u', $string, $result) > 0)
    {
        if (array_key_exists(0, $result) === true)
        {
            $result = $result[0];
        }
    }

    if ($format == 0)
    {
        $result = count($result);
    }

    return $result;
}

Answer 1 (score: 3)

$words = str_word_count($text, 1);
$word_frequencies = array_count_values($words);
arsort($word_frequencies);
print_r($word_frequencies);
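
The question also asks for the text to be loaded from a file and for each word to be echoed with its count. A minimal sketch of that, assuming the file contains plain ASCII text; the helper name word_frequencies_from_file() and the sample file are illustrative and not part of PHP or of the answers here:

```php
<?php
// Hypothetical helper (the name is illustrative): returns word => count,
// sorted by count in descending order.
function word_frequencies_from_file($path)
{
    $text = file_get_contents($path);
    if ($text === false) {
        return array(); // the file could not be read
    }

    // str_word_count() with format 1 returns the list of words (ASCII only).
    $words = str_word_count($text, 1);
    $frequencies = array_count_values($words);
    arsort($frequencies); // highest counts first, keys preserved

    return $frequencies;
}

// Usage: write a small sample file, then echo each word with its count.
$path = tempnam(sys_get_temp_dir(), 'wf');
file_put_contents($path, "the cat and the dog and the bird");

foreach (word_frequencies_from_file($path) as $word => $count) {
    echo $word . ': ' . $count . "\n";
}

unlink($path); // remove the sample file
```

For non-ASCII text, the word-splitting step would need a Unicode-aware replacement such as the utf8_str_word_count() function from Answer 0.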

Answer 2 (score: 2)

This function uses a regular expression to find words (you may want to change it, depending on how you define a word):

function count_words($text)
{
    $output = $words = array();
    preg_match_all("/[A-Za-z'-]+/", $text, $words); // Find words in the text

    foreach ($words[0] as $word)
    {
        if (!array_key_exists($word, $output))
            $output[$word] = 0;

        $output[$word]++; // Every time we find this word, we add 1 to the count
    }

    return $output;
}

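It iterates over each word and builds an associative array keyed by word, where each value is the number of times that word occurs (for example, $output['hello'] = 3 means that hello appears 3 times in the text).

You may also want to change the function to be case-insensitive: as written, 'hello' and 'Hello' are counted as different words.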
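
One possible way to make the count case-insensitive is to lowercase each word before using it as a key. The sketch below is a variant of count_words() under that assumption; the name count_words_ci() is made up for illustration, and the final arsort() is an addition so that the most frequent words come first:

```php
<?php
// Case-insensitive variant of count_words(): 'Hello' and 'hello'
// are counted under the same lowercase key.
function count_words_ci($text)
{
    $output = $words = array();
    preg_match_all("/[A-Za-z'-]+/", $text, $words); // find words in the text

    foreach ($words[0] as $word) {
        $key = strtolower($word); // normalise case before counting

        if (!array_key_exists($key, $output)) {
            $output[$key] = 0;
        }

        $output[$key]++;
    }

    arsort($output); // most frequent words first
    return $output;
}

// 'Hello' and 'hello' end up under the single key 'hello'.
print_r(count_words_ci("Hello world, hello PHP"));
```

Note that strtolower() only affects ASCII letters here, which matches the ASCII-only character class in the regular expression.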

Answer 3 (score: 0)

// explode() returns one more piece than the number of occurrences, hence the - 1
echo count(explode('your_word', $your_text)) - 1;
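
explode() returns one more piece than the number of times the separator occurs, so the number of occurrences is the piece count minus one. It also matches substrings rather than whole words. A sketch of two built-in alternatives, with illustrative variable names and sample text: substr_count() counts raw substring matches, while preg_match_all() with \b word boundaries counts whole words only.

```php
<?php
$your_text = "the cat sat on the mat, then left";
$your_word = 'the';

// Substring matches: this also counts the 'the' inside 'then'.
$substring_hits = substr_count($your_text, $your_word);

// Whole-word matches: \b anchors the match at word boundaries.
$pattern = '/\b' . preg_quote($your_word, '/') . '\b/';
$word_hits = preg_match_all($pattern, $your_text);

echo $substring_hits . "\n"; // 3
echo $word_hits . "\n";      // 2
```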