php: count instances of words in a given string, then return the top 5 that match another array

Asked: 2013-05-30 08:51:07

Tags: php arrays

php: sort and count instances of words in a given string

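From the post above, I know how to count the instances of words in a given string and sort them by frequency. Now I want to go a step further: match the resulting words against another array ($keywords) and then get only the top 5 words. But I don't know how to do that, so I'm opening a question. Thanks.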

$txt = <<<EOT
The 2013 Monaco Grand Prix (formally known as the Grand Prix de Monaco 2013) was a Formula One motor race that took place on 26 May 2013 at the Circuit de Monaco, a street circuit that runs through the principality of Monaco. The race was won by Nico Rosberg for Mercedes AMG Petronas, repeating the feat of his father Keke Rosberg in the 1983 race. The race was the sixth round of the 2013 season, and marked the seventy-second time the Monaco Grand Prix has been held. Rosberg had started the race from pole.
Background
Mercedes protest
Just before the race, Red Bull and Ferrari filed an official protest against Mercedes, having learned on the night before the race of a three-day tyre test undertaken by Pirelli at the venue of the last grand prix using Mercedes' car driven by both Hamilton and Rosberg. They claimed this violated the rule against in-season testing and gave Mercedes a competitive advantage in both the Monaco race and the next race, which would both be using the tyre that was tested (with Pirelli having been criticised following some tyre failures earlier in the season, the tests had been conducted on an improved design planned to be introduced two races after Monaco). Mercedes stated the FIA had approved the test. Pirelli cited their contract with the FIA which allows limited testing, but Red Bull and Ferrari argued this must only be with a car at least two years old. It was the second test conducted by Pirelli in the season, the first having been between race 4 and 5, but using a 2011 Ferrari car.[4]
Tyres
Tyre supplier Pirelli brought its yellow-banded soft compound tyre as the harder "prime" tyre and the red-banded super-soft compound tyre as the softer "option" tyre, just as they did the previous two years. It was the second time in the season that the super-soft compound was used at a race weekend, as was the case with the soft tyre compound.
EOT;

$words = array_count_values(str_word_count($txt, 1));
arsort($words);
var_dump($words);

$keywords = array("Monaco","Prix","2013","season","Formula","race","motor","street","Ferrari","Mercedes","Hamilton","Rosberg","Tyre"); 
// Goal: keep only the entries of $words that also appear in $keywords, then get the top 5 words.

1 answer:

Answer 0: (score: 1)

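You already have $words as an associative array, indexed by word with the count as the value, so we use array_flip() to turn your $keywords array into an associative array that is also indexed by word. Then we can use array_intersect_key() to return only those entries in $words that have a matching key in the flipped $keywords array.

This gives a resulting $matchWords array, still keyed by word, but containing only those entries from the original $words array that match $keywords; it is also still sorted by frequency.

Then we simply use array_slice() to extract the first 5 entries from that array.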

$matchWords = array_intersect_key(
    $words,
    array_flip($keywords)
);

$matchWords = array_slice($matchWords, 0, 5);
var_dump($matchWords);

which gives

array(5) {
  'race' =>
  int(11)
  'Monaco' =>
  int(7)
  'Mercedes' =>
  int(5)
  'Rosberg' =>
  int(4)
  'season' =>
  int(4)
}
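
To make the intermediate step easier to see, here is a minimal sketch on toy data. The word counts and the $lookup variable name are made up for illustration and are not taken from the question's text.

```php
<?php
// Toy data (hypothetical): word => count, already sorted by count as arsort() would leave it.
$words = array('race' => 3, 'the' => 2, 'Monaco' => 2, 'a' => 1);
$keywords = array('Monaco', 'race', 'Ferrari');

// array_flip() swaps keys and values: array('Monaco' => 0, 'race' => 1, 'Ferrari' => 2).
// Only the keys matter here; the numeric values are ignored.
$lookup = array_flip($keywords);

// Keep the entries of $words whose key also exists in $lookup.
// The order of $words (the first argument) is preserved.
$matchWords = array_intersect_key($words, $lookup);

print_r($matchWords);
// Array
// (
//     [race] => 3
//     [Monaco] => 2
// )
```

'Ferrari' does not occur in the toy $words array, so it simply drops out of the result rather than appearing with a count of 0.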

Caveat: you may run into case-sensitivity problems. "Race" !== "race", so the line $words = array_count_values(str_word_count($txt, 1)); will treat these as two different words.
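
One possible way around this, sketched under a few assumptions, is to lowercase both the text and the keywords before counting. The sketch also covers a related pitfall: by default str_word_count() does not treat digits as word characters, so a keyword such as "2013" would never be counted unless the digits are passed in the optional third argument. The sample sentence and the $lookup name are made up for illustration.

```php
<?php
// Hypothetical sample text, much shorter than the one in the question.
$txt = "The 2013 Monaco race was a Race to remember; the 2013 season had another race.";
$keywords = array("Monaco", "2013", "race", "Ferrari");

// Lowercase the text so "Race" and "race" are counted together.
// The third argument adds the digits to the set of word characters.
$words = array_count_values(str_word_count(strtolower($txt), 1, '0123456789'));
arsort($words);

// Lowercase the keywords too, then flip them so the words become keys.
$lookup = array_flip(array_map('strtolower', $keywords));

// PHP stores a numeric string key such as "2013" as an integer key, and
// array_slice() renumbers integer keys unless preserve_keys (4th argument) is true.
$matchWords = array_slice(array_intersect_key($words, $lookup), 0, 5, true);

print_r($matchWords);
// Array
// (
//     [race] => 3
//     [2013] => 2
//     [monaco] => 1
// )
```

The trade-off is that the result keys come back lowercased ("monaco" instead of "Monaco"); if the original capitalisation matters, it would have to be mapped back from $keywords afterwards.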