Question

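I have an idea for a fully dynamic keyword script: it would let me simply write content for my site (or let people post on it) and have keywords generated automatically from whatever is posted. I've explored the approach below, but I'm not sure how to proceed. Any help is much appreciated.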

<?php
     $content = "everything inside the body of the page";
     $common = array(' a ', ' the ', ' I ');
     $replaced = str_replace($common, ' ', strip_tags($content));
     $array = str_word_count($replaced, 1);
     $count = array_count_values( $array );
?>

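This code takes the content of the page, strips the HTML tags from it, and builds an array keyed by word, where each value is the number of times that word is used on the page.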

How can I filter this array for words that are used more than X times?

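Edit: Thanks to Jan for their solution. It was very helpful for what I needed to do, though I ended up changing it a little (don't hate me too much, but I merged it into one line to save space):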

if ( isset($page['content']) and $page['content'] != ' ' ) {
    foreach ( array_count_values(str_word_count(str_replace(array('nbsp', ' nbsp ', ' something ', ' that ', ' does ', 'that', ' that ', ' have ',' with', ' this ', ' from ', ' they ', ' will ', ' would ', ' there ', ' their ', ' what ', ' about ', ' which ', ' when ', ' make ', ' like ', ' time ', ' just ', ' know ', ' take ', ' person ', ' into ', ' year ', ' your ', ' good ', ' some ', ' could ', ' them ', ' other ', ' than ', ' then ', ' look ', ' only ', ' come ', ' over ', ' think ', ' also ', ' back ', ' after ', ' work ', ' first ', ' well ', ' even ', ' want ', ' because ', ' these ', ' give ', ' most '), ' ', strip_tags($page['content'])), 1)) as $keyword => $frequency ) {
        if ( $frequency >= '3' and strlen($keyword) >= '4' and strlen($keyword) <= '10' and strpos($keywords, $keyword) === false ) {
            $keywords .= strtolower($keyword).', ';
        }
    }
    echo '<meta name="keywords" content="'.trim($keywords, ", ").'"/>';
}

Answer 1

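Consider filtering by adding the words to a new array. Each time you are about to add a word from the old array, check whether it already exists in the new array, and use an if statement to skip it if it does.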
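A minimal sketch of that idea, assuming the words have already been split into a plain array (for example with str_word_count). The function name uniqueKeywords is made up for illustration, and lowercasing each word before the check is my own assumption so that "The" and "the" count as the same keyword:

```php
<?php
// Hypothetical helper: copies each word into a new array only once,
// preserving the order in which words are first seen.
function uniqueKeywords(array $words): array
{
    $keywords = array();
    foreach ($words as $word) {
        // Assumption: keywords are compared case-insensitively.
        $word = strtolower($word);
        // Skip the word if the new array already contains it.
        if (!in_array($word, $keywords, true)) {
            $keywords[] = $word;
        }
    }
    return $keywords;
}

$words = str_word_count("The cat saw the other cat", 1);
print_r(uniqueKeywords($words));
```

Note that in_array scans the whole array on every call, so for very long pages it may be cheaper to use the words as array keys instead.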

Answer 2

Add

 arsort($count);

and use the keys with the highest counts as needed.
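As a rough sketch of how that could look, assuming $count is the word => frequency array from the question. The helper name topKeywords and the $limit parameter are hypothetical, not part of the original answer:

```php
<?php
// Hypothetical helper: returns the $limit most frequent words.
function topKeywords(array $count, int $limit): array
{
    // Sort by frequency, highest first, keeping the word keys attached.
    arsort($count);
    // Take the first $limit entries and return only their keys (the words).
    return array_keys(array_slice($count, 0, $limit, true));
}

$count = array('page' => 2, 'test' => 4, 'body' => 1, 'inside' => 3);
echo implode(', ', topKeywords($count, 2));
```

This picks a fixed number of keywords rather than applying a minimum count, so it always yields something even on short pages.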

Answer 3

You can iterate over the array and test each key's value; if it is high enough, add the key as a value to a new array.

$min_count = 1; // Number of times the word should be found inside the content to be considered as a keyword
$keywords = array();
foreach ( $count as $keyword => $value ) {
    if ( $value >= $min_count ) {
        $keywords[] = $keyword;
    }
}

$keywords now holds the words you are interested in.

Answer 4

Example:

<?php

// exclude words appearing more than this many times
$limit = 3;

// exclude these words
$wordsToExclude = array('a', 'the');

// the content
$content = "everything inside the body of the page a a a test test test test don't feed the elephants inside";

// better way of splitting into words - http://stackoverflow.com/questions/790596/split-a-text-into-single-words
$words = preg_split('/((^\p{P}+)|(\p{P}*\s+\p{P}*)|(\p{P}+$))/', $content, -1, PREG_SPLIT_NO_EMPTY);

// count how many times each word appears. this will create an array with words as the keys, and counts as the values
$uniqueWords = array_count_values($words);

foreach($uniqueWords as $word => $count)
{
    // remove excluded words, and words appearing more times than the limit
    if (in_array($word, $wordsToExclude) || $count > $limit) {
        unset($uniqueWords[$word]);
    }
}

var_dump($uniqueWords);

This prints:

array (size=8)
  'everything' => int 1
  'inside' => int 2
  'body' => int 1
  'of' => int 1
  'page' => int 1
  'don't' => int 1
  'feed' => int 1
  'elephants' => int 1

You can use all of the words (the keys of this array) or use the counts as some form of weighting.
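One possible reading of "weighting", sketched under my own assumptions: divide each word's count by the total number of counted words, so every keyword gets a relative weight between 0 and 1. The function name keywordWeights is invented for this example, and the input is assumed to be a word => count array like $uniqueWords:

```php
<?php
// Hypothetical helper: converts raw counts into relative weights (0 to 1).
function keywordWeights(array $uniqueWords): array
{
    $total = array_sum($uniqueWords);
    // Nothing to weight if the array is empty.
    if ($total <= 0) {
        return array();
    }
    $weights = array();
    foreach ($uniqueWords as $word => $count) {
        // Share of all counted words that this word accounts for.
        $weights[$word] = $count / $total;
    }
    // Heaviest keywords first, keeping the word keys.
    arsort($weights);
    return $weights;
}

print_r(keywordWeights(array('inside' => 2, 'body' => 1, 'page' => 1)));
```

The weights could then decide, for instance, which keywords are listed first in the meta tag.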

Fully dynamic keywords using PHP

4 answers: