Question

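I found it hard to come up with a good title for this question, so if you have a better one, feel free to edit!

Currently I'm retrieving a page with file_get_contents, then removing all JavaScript, converting everything to lowercase, and stripping all HTML tags.

After that, I build an array containing every word and count how often each one occurs: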

preg_match_all("/((?:\w'|\w|-)+)/", $contents, $words);

$frequency = array();

foreach ($words[0] as $word) {

    // Filter out the 'common words'
    if (in_array($word, $common_words)) continue;

    if (isset($frequency[$word])) {
        $frequency[$word] += 1;
    } else {
        $frequency[$word] = 1;
    }
}

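But this only works for single words. Suppose I retrieve an HTML page containing this text:

'This is a sample text. This is what a HTML text can look like'

With my code, that produces the following: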

this = 2 is = 2 a = 2 sample = 1 text = 2 what = 1 html = 1 can = 1 look = 1 like = 1

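Now I want something very similar, but for pairs of two consecutive words. How would I do that? Using the same sentence, the result should look like this: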

this is = 2

I've tried to give as many examples as I can and to be as clear as possible.

If you need any clarification, please ask!

Answer 1

Try using str_word_count() and array_count_values():

$total_words = array_count_values(str_word_count('your_string', 1));
print_r($total_words);
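That counts single words only. For the two-word case asked about, one possible approach is to walk the word list and count each adjacent pair. The following is a rough sketch rather than a definitive solution: count_word_pairs() is a hypothetical helper name, the optional $common_words parameter mirrors the filter in the question, and PHP 7 or later is assumed for the type hints and the ?? operator.

```php
<?php
// Hypothetical helper (not part of the original post): counts how often
// each pair of adjacent words occurs in $text.
function count_word_pairs(string $text, array $common_words = []): array
{
    // Same word pattern as in the question, applied to the lowercased text.
    preg_match_all("/((?:\w'|\w|-)+)/", strtolower($text), $matches);
    $words = $matches[0];

    $frequency = array();
    for ($i = 0; $i < count($words) - 1; $i++) {
        // Assumption: a pair is skipped if either of its words is a common word.
        if (in_array($words[$i], $common_words) || in_array($words[$i + 1], $common_words)) {
            continue;
        }
        $pair = $words[$i] . ' ' . $words[$i + 1];
        $frequency[$pair] = ($frequency[$pair] ?? 0) + 1;
    }

    // Most frequent pairs first.
    arsort($frequency);
    return $frequency;
}

$pairs = count_word_pairs('This is a sample text. This is what a HTML text can look like');
print_r($pairs);
```

With the sample sentence from the question, 'this is' ends up with a count of 2 and every other pair with 1. Note that this sketch ignores sentence boundaries, so 'text this' is counted as a pair too; if that is not wanted, the text would have to be split into sentences first.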

For more help, see: php: sort and count instances of words in a given string
