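PHP - Finding a set of keywords in an article

Time: 2014-03-12 12:28:58

Tags: php

The idea is that I have a set of keywords and a set of articles. I would like to know the best way to determine whether those keywords occur in the articles, taking performance and speed into account.

Basically, each keyword consists of 3 or more words, but no more than 10. The code should check whether each keyword exists in the article and return only the keywords that were found.

Suppose we have an article: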

$articles = "Maybe it’s less true than it used to be that people are made of 
       place--that the same elements that form coal and clay and bogs and ice form 
       faces, voices and characters. I wrote my first collection of short stories, 
       The Bostons, in homage to this book, hoping, as did Joyce’s young Stephen 
       Dedalus, to encounter for the millionth time the reality of experience and to 
       forge in the smithy of my soul the uncreated conscience of some island-dwellers
       I knew.";

Keywords:

$keywords = "less true than, people are made, smithy of my soul, uncreated 
             conscience, this is a test string";

The output must be:

"less true than, people are made, smithy of my soul, uncreated conscience"

I have programmed it using:

  $articles = mb_split( ' +', $articles );
  foreach ( $articles as $key => $word )
      $articles[$key] = trim($word);

  //Search for keywords     
  $keywords = str_replace(' ', '', $keywords);
  $keywords = mb_split( '[ ,]+', mb_strtolower( $keywords, 'utf-8' ) );

  $result = implode(',', array_intersect($keywords, $articles));

But this only works for single-word keywords. I don't know how to do it for keywords made up of several words.

4 Answers:

Answer 0: (score: 0)

strpos() is all you need. This works -

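$res = array();
foreach (explode(", ", $keywords) as $keyword) {
    // strpos() returns 0 for a match at the very start, so compare with false
    if (strpos($articles, $keyword) !== false) {
        $res[] = $keyword;
    }
}
// the separator goes first; the reversed argument order is not accepted by PHP 8
$matched = implode(", ", $res);
var_dump($matched);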
/** OUTPUT **/
string 'less true than, people are made, smithy of my soul, uncreated conscience' (length=72)
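One thing to keep in mind with a strpos() check: it returns 0 when the keyword sits at the very start of the text, and 0 is falsy. The following is a minimal sketch, assuming PHP 7 or later; the function name find_keywords and the shortened sample strings are made up for illustration and are not part of the answer above.

```php
<?php
// Hypothetical helper (not from the original answer): returns the keywords
// from a comma-separated list that occur in $article, in list order.
function find_keywords(string $article, string $keywordList): array
{
    $found = [];
    foreach (explode(',', $keywordList) as $keyword) {
        $keyword = trim($keyword);
        // strpos() returns 0 for a match at offset 0, so compare against
        // false explicitly instead of relying on truthiness.
        if ($keyword !== '' && strpos($article, $keyword) !== false) {
            $found[] = $keyword;
        }
    }
    return $found;
}

// Shortened sample data; the first keyword matches at offset 0.
$article  = "less true than it used to be that people are made of place";
$keywords = "less true than, people are made, this is a test string";

echo implode(', ', find_keywords($article, $keywords)), "\n";
// prints: less true than, people are made
```

If matching should ignore case, stripos() (or mb_stripos() for multibyte text) could be swapped in for strpos() in the same way.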

Answer 1: (score: 0)

Regular expressions can help you here. The code below works. Could your problem be the line break inside the keyword string?

$articles = "Maybe it’s less true than it used to be that people are made of 
   place--that the same elements that form coal and clay and bogs and ice form 
   faces, voices and characters. I wrote my first collection of short stories, 
   The Bostons, in homage to this book, hoping, as did Joyce’s young Stephen 
   Dedalus, to encounter for the millionth time the reality of experience and to 
   forge in the smithy of my soul the uncreated conscience of some island-dwellers
   I knew.";

$keywords = "less true than, people are made, smithy of my soul, uncreated conscience, this is a test string";

$keywordsArray = explode(', ',$keywords);

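// preg_quote() escapes any regex metacharacters that a keyword might contain
$pattern = '/'.implode('|', array_map('preg_quote', $keywordsArray)).'/';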
preg_match_all($pattern,$articles,$matches);

var_dump($matches);
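Since a line break inside a keyword or inside the article can stop a literal pattern from matching, one option is to collapse whitespace on both sides before building the alternation. This is only a sketch under assumptions: match_keywords is a hypothetical name, the sample strings are shortened, and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: builds one alternation pattern from the keyword list
// and returns the distinct keywords found in $article.
function match_keywords(string $article, string $keywordList): array
{
    // Collapse runs of whitespace (including line breaks) to single spaces,
    // so text that is wrapped across lines can still match.
    $normalize = function (string $s): string {
        return trim(preg_replace('/\s+/', ' ', $s));
    };

    $alternatives = [];
    foreach (explode(',', $keywordList) as $keyword) {
        $keyword = $normalize($keyword);
        if ($keyword !== '') {
            // Escape regex metacharacters and the "/" delimiter.
            $alternatives[] = preg_quote($keyword, '/');
        }
    }
    if ($alternatives === []) {
        return [];
    }

    $pattern = '/' . implode('|', $alternatives) . '/';
    preg_match_all($pattern, $normalize($article), $matches);

    // array_values() reindexes after array_unique() drops repeated matches.
    return array_values(array_unique($matches[0]));
}

// Both the article and the keyword list contain a line break mid-phrase.
$article  = "forge in the smithy of my\n   soul the uncreated conscience";
$keywords = "smithy of my soul, uncreated\n conscience, this is a test string";

echo implode(', ', match_keywords($article, $keywords)), "\n";
// prints: smithy of my soul, uncreated conscience
```

Note that preg_match_all() does not report overlapping matches, so if two keywords overlap in the text only the one that matches first is returned; a per-keyword strpos() loop does not have that limitation.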

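Answer 2: (score: 0)

// preg_match_all() returns the match count; the matches themselves land in $found
preg_match_all(
    '/'.implode('|', array_map('preg_quote', explode(', ', $keywords))).'/',
    $articles,
    $found
);
$matches = array_unique($found[0]);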

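Answer 3: (score: 0)

$articles = "Maybe it’s less true than it used to be that people are made of 
   place--that the same elements that form coal and clay and bogs and ice form 
   faces, voices and characters. I wrote my first collection of short stories, 
   The Bostons, in homage to this book, hoping, as did Joyce’s young Stephen 
   Dedalus, to encounter for the millionth time the reality of experience and to 
   forge in the smithy of my soul the uncreated conscience of some island-dwellers
   I knew.";

$keywords = "less true than, people are made, smithy of my soul, uncreated conscience, this is a test string";

$keyword = explode(',', $keywords);

// start with an empty string so that .= does not hit an undefined variable
$finalstring = '';

foreach ($keyword as $key => $value) {
    // strpos() returns 0 for a match at the very start, so compare with false
    if (strpos($articles, $value) !== false) {
        $finalstring .= $value.',';
    }
}

echo $finalstring;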