Automatically generate a title from the most repeated word in a text using PHP

Date: 2013-07-02 19:06:10

Tags: php

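I want to automatically generate a title with PHP, based on the most repeated word in a text. For example, if the word "PHP" is repeated most often in the text, the title would be "The text is about PHP", and so on. I don't know what to do or where to start.

Can anyone help me?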

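3 answers:

Answer 0 (score: 3)

If I have to do your homework for you, I expect full attribution in your paper, along with a link to this question.

I also ask that you actually read this code and try running it, so that you understand how it works.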

//get all the text from the file
$text_from_file = file_get_contents("filename.txt");

//get all the words within that text
$words = str_word_count($text_from_file , 1);

//count up all the unique words within the array
$unique = array_count_values($words);

//sort by most to least frequent
arsort($unique); //arsort required to keep keys and values together

//since we don't know the key values here, we need to use foreach
foreach($unique as $key => $val) {
  echo("The most common word is " . $key . " which occurs " . $val . " times");

  break; //always break after the first echo
}
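
The snippet above prints the most common word but stops short of building the title the question asks for. A minimal sketch of that last step follows, assuming the title format "The text is about ..." from the question. The function name titleFromText is hypothetical, and lowercasing the text first is an added assumption so that "PHP" and "php" count as the same word.

```php
<?php
// Hypothetical helper (not part of the original answer): builds a title
// from the most frequent word in $text.
function titleFromText(string $text): string
{
    // Lowercase first so "PHP", "php" and "pHp" are counted together.
    $words = str_word_count(strtolower($text), 1);
    if ($words === []) {
        return 'The text is empty';
    }

    $counts = array_count_values($words);
    arsort($counts); // highest count first, words kept as keys

    // array_key_first() requires PHP 7.3 or later.
    return 'The text is about ' . array_key_first($counts);
}

echo titleFromText('PHP foo Bar php foO pHp'), "\n"; // The text is about php
```

For real text you would probably also want to skip very common words such as "the" or "is", otherwise they are likely to win.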

Answer 1 (score: 1)

<?php
function mostRepeated($string = false, $words_num = 5) {
    $string = strtolower($string);
    // extend this array
    $omit_words = array('the', 'a', 'an', 'in', 'at', 'by', 'of', 'was', 'is', 'he', 'she');

    $words = explode(' ', $string);
    foreach($words as $k => $v) {
        // $v is the current word ($word was undefined here)
        if(in_array($v, $omit_words)) unset($words[$k]);
    }

    $count = array_count_values($words);
    arsort($count);
    $result = array();
    foreach($count as $k => $v) {
       $result[] = $k;
    }

    // return only the $words_num most frequent words
    return array_slice($result, 0, $words_num);
}

$text = 'PHP foo Bar php foO pHp';
$most_repeated_words_array = mostRepeated($text, 3);
print_r($most_repeated_words_array);
?>

Output:

Array
(
    [0] => php
    [1] => foo
    [2] => bar
)
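
Splitting on single spaces leaves punctuation attached to words ("PHP." and "PHP" would be counted separately). A possible variant is sketched below, assuming str_word_count for tokenizing and array_diff for dropping stop words. The function name topWords and the short stop-word list are illustrative only, not part of the answer above.

```php
<?php
// Hypothetical variant of mostRepeated(); the name and the default
// stop-word list are assumptions made for this example.
function topWords(string $text, int $limit = 5, array $stopWords = ['the', 'a', 'an', 'of', 'is']): array
{
    // str_word_count() splits on non-letters, so "reference," becomes "reference".
    $words = str_word_count(strtolower($text), 1);

    // Remove every occurrence of each stop word.
    $words = array_diff($words, $stopWords);

    $counts = array_count_values($words);
    arsort($counts); // most frequent first, words kept as keys

    return array_slice(array_keys($counts), 0, $limit);
}

print_r(topWords('The PHP manual is the best PHP reference, a reference of PHP.', 2));
// Array ( [0] => php [1] => reference )
```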

Answer 2 (score: 0)

Using

print_r( array_count_values(str_word_count($text, 1)) );

will give you the count of every word. You can then sort the result and pick the top entry.

rsort

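will give you an array sorted from highest to lowest. Note that rsort discards the keys (the words themselves), so use arsort if you need to keep each word together with its count.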
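
To see what each sort function does to the word counts, here is a small comparison. The sample string is made up for illustration.

```php
<?php
// Sample input, chosen for illustration only.
$counts = array_count_values(str_word_count('php foo php bar foo php', 1));
// $counts is ['php' => 3, 'foo' => 2, 'bar' => 1]

$byValue = $counts;
rsort($byValue);   // reindexes from 0: only the counts remain, the words are gone

$withKeys = $counts;
arsort($withKeys); // keeps each word attached to its count

print_r($byValue);              // counts only: 3, 2, 1
print_r(array_keys($withKeys)); // words in order: php, foo, bar
```

With rsort you learn that the highest count is 3 but not which word it belongs to, which is why arsort fits this task better.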