Top 10 keywords in a string (PHP)

Time: 2016-07-11 11:35:16

Tags: php arrays string keyword

I have built a complex array of keywords, and my goal is to display the top 10 words in the string.

b) I only want to include important words, not ones like "The, That, to, a ...".

Full code:

$str = $db_tag;
$tok = strtok($str, ", ");
$subStrStart = 0;

while ($tok !== false) {
    preg_match_all("/\b" . preg_quote($tok, "/") . "\b/", substr($str, $subStrStart), $m);
    if (count($m[0]) >= 10)
        echo "'" . $tok . "' found at least 10 times, exactly: " . count($m[0]) . "<br>";
    $subStrStart += strlen($tok);
    $tok = strtok(", ");
}

My string:

$db_tag="The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";

Thanks in advance.

4 answers:

Answer 0 (score: 2)

Try this:

$db_tag = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";

$stopWords = array(
    "the", "to", "in", "a", "of", "is", "that", "will", "and", "be"
);

// Convert to an array and filter out stop words. The comparison is
// case-insensitive because the words in $db_tag are capitalized.
$words = array_filter(explode(',', $db_tag), function ($value) use ($stopWords) {
    return !in_array(strtolower($value), $stopWords);
});

$counts = array_count_values($words);
asort($counts);
$topTen = array_reverse(array_slice($counts, -10, null, true));

var_dump($topTen);

On PHP 8 or later you should see the following (words with the same count may be listed in a different order on older versions, where sorting is not stable):

php > var_dump($topTen);
array(10) {
  ["England"]=>
  int(5)
  ["Bank"]=>
  int(5)
  ["Brexit"]=>
  int(5)
  ["Vote"]=>
  int(4)
  ["Economy"]=>
  int(4)
  ["Cut"]=>
  int(1)
  ["Mount"]=>
  int(1)
  ["Expectations"]=>
  int(1)
  ["As"]=>
  int(1)
  ["Week"]=>
  int(1)
}

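First, we split the string into an array with explode() and drop the stop words with array_filter(). Then array_count_values() returns an array whose keys are the unique words and whose values are the number of times each word occurs in the string.

Next, we sort the array by value with asort(). Then we take the last 10 elements (the ones with the highest counts) with array_slice() and, optionally, reverse them with array_reverse() so they are in descending order.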
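Because array_count_values() already returns word => count pairs, the ascending sort, negative slice and reverse can also be written as one descending sort followed by a slice from the front. A minimal sketch of that variant, wrapped in a hypothetical helper topWords() (the name and signature are illustrative only; the arrow function needs PHP 7.4 or later):

```php
<?php
// Hypothetical helper: return the $n most frequent words as word => count,
// skipping stop words case-insensitively.
function topWords(string $csv, array $stopWords, int $n = 10): array
{
    $words = array_filter(
        explode(',', $csv),
        fn ($w) => !in_array(strtolower($w), $stopWords, true)
    );
    $counts = array_count_values($words);
    arsort($counts);                          // highest count first
    return array_slice($counts, 0, $n, true); // first $n entries, keys kept
}

$db_tag = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";
$stopWords = ["the", "to", "in", "a", "of", "is", "that", "will", "and", "be"];

$top = topWords($db_tag, $stopWords);
print_r($top);
```

Both approaches order only by count, so which of the many words that occur once make it into the top 10 can differ between them.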

Answer 1 (score: 1)

If by "top 10" you mean the "10 most frequently used words" in a string separated by commas (,), you can do the following:

$string = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";

//Create array of words split by ","
$words = explode(",",$string);

//Create an empty array to hold data
$wordData = [];

foreach($words as $word){
    //Convert to lower case (for uniformity)
    $word = strtolower($word);

    //Add to an array if doesn't exist; if it does,
    //add to the number
    if(isset($wordData[$word])){
        $wordData[$word]++;
    } else $wordData[$word] = 1;
}

//Order $wordData array by number
arsort($wordData);

print_r($wordData);

This will output:

  

Array
(
    [england] => 5
    [bank] => 5
    [brexit] => 5
    [vote] => 4
    [economy] => 4
    [the] => 2
    [expectations] => 1
    [will] => 1
    [of] => 1
    [that] => 1
    [mount] => 1
    [this] => 1
    [as] => 1
    [week] => 1
    [boost] => 1
    [post] => 1
    [a] => 1
    [given] => 1
    [be] => 1
    [could] => 1
    [cut] => 1
)
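The foreach/isset() loop does the same counting that array_count_values() does. A shorter sketch that should give the same counts, keeping the lower-casing step through array_map():

```php
<?php
$string = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";

// Lower-case every word, then let array_count_values() build word => count
$wordData = array_count_values(array_map('strtolower', explode(',', $string)));

// Order by count, highest first
arsort($wordData);

print_r($wordData);
```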

To filter out specific words:

//Establish array of words to filter
$filterWords = ["the", "is", "are", "of", "that"];

//Remove those words from the array created earlier
foreach($filterWords as $fw){
    if(isset($wordData[$fw])) unset($wordData[$fw]);
}

print_r($wordData);

This will output:

  

Array
(
    [england] => 5
    [bank] => 5
    [brexit] => 5
    [vote] => 4
    [economy] => 4
    [expectations] => 1
    [will] => 1
    [mount] => 1
    [this] => 1
    [as] => 1
    [week] => 1
    [boost] => 1
    [post] => 1
    [a] => 1
    [given] => 1
    [be] => 1
    [could] => 1
    [cut] => 1
)
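Instead of looping over the filter list and calling unset(), the same removal can be done in one call: array_flip() turns the filter list into keys, and array_diff_key() drops every entry whose key appears there. A small sketch on an abbreviated, hypothetical subset of the counts:

```php
<?php
// Hypothetical subset of the word => count array built above
$wordData = ["england" => 5, "bank" => 5, "the" => 2, "of" => 1, "cut" => 1];
$filterWords = ["the", "is", "are", "of", "that"];

// Remove every key that appears in $filterWords; order and counts are kept
$wordData = array_diff_key($wordData, array_flip($filterWords));

print_r($wordData);
```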

Answer 2 (score: 1)

You can use explode() and an array:

$db_tag="The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";
$array = array();
foreach (explode(',', $db_tag) as $val) 
{
    if(!isset($array[$val]))
    {
        $array[$val] = 1;
    }
    else
    {
        $array[$val]++;
    }
}
arsort($array);
print_r($array);

This will output:

Array
(
    [England] => 5
    [Bank] => 5
    [Brexit] => 5
    [Vote] => 4
    [Economy] => 4
    [The] => 2
    [Expectations] => 1
    [Will] => 1
    [Of] => 1
    [That] => 1
    [Mount] => 1
    [This] => 1
    [As] => 1
    [Week] => 1
    [Boost] => 1
    [Post] => 1
    [A] => 1
    [Given] => 1
    [Be] => 1
    [Could] => 1
    [Cut] => 1
)
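arsort() orders only by count, so words with equal counts (all the 1s here) have no guaranteed order before PHP 8.0 and keep their insertion order from 8.0 on. If the cut-off at 10 should not depend on that, one option, assuming alphabetical order is an acceptable tie-breaker, is to sort with uksort() on count first and word second:

```php
<?php
$db_tag = "The,Economy,Could,Be,Given,A,Post,Brexit,Vote,Vote,Vote,Vote,Boost,This,Week,As,Expectations,Mount,That,The,Bank,Bank,Bank,Bank,Bank,Of,England,England,England,England,England,Will,Cut,Economy,Economy,Economy,Brexit,Brexit,Brexit,Brexit";

// word => count (array_count_values() gives the same result as the loop above)
$counts = array_count_values(explode(',', $db_tag));

// Sort a copy by count (highest first), then alphabetically for equal counts.
// The comparator reads counts from $counts, which is not modified.
$sorted = $counts;
uksort($sorted, fn ($a, $b) => [$counts[$b], $a] <=> [$counts[$a], $b]);

$topTen = array_slice($sorted, 0, 10, true);
print_r($topTen);
```

The alphabetical comparison is case-sensitive (byte order), which does not matter here because every word is capitalized.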

Answer 3 (score: 0)

Use the function below to extract search keywords from a string:

function getKeywords($string)
{
    // Common words to ignore (extend this list as needed)
    $ignore = ["the", "they", "she"];

    // Lower-case the text and split it on any run of non-word characters,
    // so punctuation and repeated spaces do not end up inside the words
    $words = preg_split('/\W+/', strtolower($string), -1, PREG_SPLIT_NO_EMPTY);

    // Create an empty array to hold word => count
    $wordData = [];

    foreach ($words as $word) {
        // Skip very short words and ignored words
        if (strlen($word) < 3 || in_array($word, $ignore)) {
            continue;
        }

        // Add to the array if it doesn't exist; if it does,
        // increment its count
        if (isset($wordData[$word])) {
            $wordData[$word]++;
        } else {
            $wordData[$word] = 1;
        }
    }

    // Order $wordData by count, highest first
    arsort($wordData);

    // Return the 10 most frequent words as a comma-separated string
    return implode(",", array_slice(array_keys($wordData), 0, 10));
}

// Example usage with the sample text from this answer
$string = "North Korea has recently introduced a sweeping new law which seeks to stamp out any kind of foreign influence - harshly punishing anyone caught with foreign films, clothing or even using slang. But why?Yoon Mi-so says she was 11 when she first saw a man executed for being caught with a South Korean drama.    His entire neighbourhood was ordered to watch. If you didn't, it would be classed as treason, she told the BBC from her home in Seoul.        The North Korean guards were making sure everyone knew the penalty for smuggling illicit videos was death. I have a strong memory of the man who was blindfolded, I can still see his tears flow down. That was traumatic for me. The blindfold was completely drenched in his tears. ";
echo getKeywords($string);
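Splitting the text on spaces alone leaves punctuation attached to words (for example "watch." in the sample), so "watch" and "watch." would be counted as different keywords. PHP's built-in str_word_count() can return the words directly; a short sketch on a fragment of the sample text:

```php
<?php
$text = "His entire neighbourhood was ordered to watch. If you didn't, it would be classed as treason.";

// Format 1 returns the words as a list. Letters count as word characters,
// and words may also contain (but not start with) ' and -.
$words = str_word_count(strtolower($text), 1);

print_r($words);
```

The resulting list can then be counted with array_count_values() and sorted with arsort() as in the other answers.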