Count the words in a string in PHP

Time: 2017-02-21 17:29:47

Tags: php curl

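I'm printing a string that contains the HTML content fetched from a given URL. What I want to do is find out which words the string contains and how many times each one occurs.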

For example:

today | 1
how | 1
hello | 1

Code:

$string = "Hello how are you today";
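For reference, here is a minimal sketch (not part of the original question) that produces the "word | count" format above for this sample string, using PHP's built-in str_word_count() (format 1 returns the words as an array) and array_count_values(). It assumes plain ASCII text: str_word_count() treats a word as letters plus the ' and - characters, so digits and multibyte UTF-8 text may not be split the way you expect.

```php
<?php
$string = "Hello how are you today";

// Format 1 makes str_word_count() return the words themselves as an array
$words = str_word_count($string, 1);

// Map each distinct word to the number of times it occurs
$counts = array_count_values($words);

foreach ($counts as $word => $count) {
    echo "$word | $count\n";
}
```

Each of the five words appears once here, so every line ends in "| 1"; the counting is case-sensitive, so "Hello" and "hello" would be counted separately.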

2 answers:

Answer 0 (score: 0)

Something like this:

  $s = "lorem ipsum dolor sit amet, consectetur adipiscing elit, sit sed do lorem eiusmod tempor";
  // Split on every non-word character; -1 means "no limit" and empty pieces are dropped
  $w = preg_split('=[^\w]=', $s, -1, PREG_SPLIT_NO_EMPTY);
  $words = [];

  // Count how many times each word occurs
  foreach ($w as $word) {
    if (!isset($words[$word])) $words[$word] = 0;
    $words[$word]++;
  }
  print_r($words);

Output:

Array
(
    [lorem] => 2
    [ipsum] => 1
    [dolor] => 1
    [sit] => 2
    [amet] => 1
    [consectetur] => 1
    [adipiscing] => 1
    [elit] => 1
    [sed] => 1
    [do] => 1
    [eiusmod] => 1
    [tempor] => 1
)

Is this what you're looking for?
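A shorter variant of the same approach, as a sketch for when "Lorem" and "lorem" should count as one word and the most frequent words should come first: it lower-cases the text, replaces the manual loop with array_count_values(), and sorts with arsort(). It assumes ASCII text; without the /u modifier, \W treats the bytes of multibyte UTF-8 characters as separators, and strtolower() is only reliable for ASCII letters.

```php
<?php
$s = "Lorem ipsum dolor sit amet, sit sed do lorem";

// Lower-case first so "Lorem" and "lorem" end up under the same key,
// then split on runs of non-word characters (-1 = no limit, empties dropped)
$w = preg_split('/\W+/', strtolower($s), -1, PREG_SPLIT_NO_EMPTY);

// Count occurrences, then sort by count, highest first (keys are preserved)
$words = array_count_values($w);
arsort($words);

print_r($words);
```

Here "lorem" and "sit" both end up with a count of 2 and are listed before the words that occur once.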

Answer 1 (score: 0)

With $cResult as the input (the HTML string):

$word_counts = [];

// Remove <script> and <style> blocks completely, then strip the remaining tags
$cResult = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $cResult);
$cResult = preg_replace('#<style(.*?)>(.*?)</style>#is', '', $cResult);
$cResult = strip_tags($cResult);

// Replace every character that is not a letter with a space, then split on spaces
$word_array_raw = explode(' ', preg_replace('/[^A-Za-z ]/', ' ', $cResult));

// Count each non-empty word
foreach ($word_array_raw as $word) {
    $word = trim($word);
    if ($word !== '') {
        $word_counts[$word] = ($word_counts[$word] ?? 0) + 1;
    }
}

// Sort by count, highest first (keys are preserved)
arsort($word_counts);

// Print in the "word | count" format from the question
foreach ($word_counts as $word => $count) {
    echo "$word | $count<br>";
}
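The same steps can be wrapped in a function so they are easier to reuse and test. This is a sketch: the name count_html_words() and the sample HTML are hypothetical, and splitting on runs of non-letters with preg_split() replaces the explode()/trim() pair. One caveat that also applies to the answer above: strip_tags() inserts no whitespace, so adjacent elements such as "</p><p>" can glue two words together; the sample HTML keeps newlines between elements for that reason.

```php
<?php
// Hypothetical helper: returns word => count for the visible text of an HTML string
function count_html_words(string $html): array
{
    // Drop <script> and <style> blocks including their contents, then all other tags
    $html = preg_replace('#<script\b[^>]*>.*?</script>#is', '', $html);
    $html = preg_replace('#<style\b[^>]*>.*?</style>#is', '', $html);
    $text = strip_tags($html);

    // Split on runs of non-letters; empty pieces are dropped
    $words = preg_split('/[^A-Za-z]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Count each word and sort by count, highest first
    $counts = array_count_values($words);
    arsort($counts);
    return $counts;
}

// Hypothetical sample input
$html = "<html><head><style>p { color: red; }</style></head>\n"
      . "<body><p>Hello world</p>\n<script>var hello = 1;</script>\n"
      . "<p>hello again, world</p></body></html>";

foreach (count_html_words($html) as $word => $count) {
    echo "$word | $count<br>\n";
}
```

"world" is counted twice, while the words inside the style and script blocks are ignored. As in the answer, counting is case-sensitive ("Hello" and "hello" are separate entries); pass the text through strtolower() first if they should be merged.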

Hope that helps.