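Counting word occurrences in a text (including similar words)

Date: 2018-09-08 09:24:04

Tags: php laravel

I'm trying to write a function that counts how many times certain words occur in a text. The catch is that I want similar words (and nicknames) to be grouped together.

I have this array of interesting words (which I defined by hand):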

$interesting_words = [
  'test' => [
    'number_of_occurances' => 0,
    'connected_words' => [
        'TEST',
        'TESTER',
        'TESTING'
      ]
    ],
  'foobar' => [
    'number_of_occurances' => 0,
    'connected_words' => [
        'FOO',
        'FOOBAR',
        'BAR'
      ]
    ]
];

Example text:

  

Lorem ipsum TEST sit amet, consectetur TESTER elit. Sed in turpis dui. Maecenas venenatis FOOBAR facilisis. Quisque dictum, diam consequat mollis TESTING, orci tellus aliquet nisl, BAR molestie FOO augue at est. In TESTING vehicula lectus. Curabitur ac varius ligula. Pellentesque orci urdna.

Desired output:

Number of occurrences for 'test': 4
Number of occurrences for 'foobar': 3

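Is there a clever way to do this without 1,000,000 for loops?

I'm writing the function in Laravel, if that helps.

3 answers:

Answer 0 (score: 1)

If performance matters and you only need the occurrence counts, you can use str_word_count and array_count_values to count every word in the text, with strtolower to make the lookup case-insensitive: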

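// $str holds the example text from the question.
$words = array_count_values(str_word_count(strtolower($str), 1));
foreach ($interesting_words as &$details) {
    foreach ($details['connected_words'] as $similar) {
        // ?? 0 avoids an undefined-index warning for words absent from the text
        $details['number_of_occurances'] += $words[strtolower($similar)] ?? 0;
    }
}
unset($details); // break the reference left over from the foreach
print_r($interesting_words);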

Output:

Array
(
    [test] => Array
        (
            [number_of_occurances] => 4
            [connected_words] => Array
                (
                    [0] => TEST
                    [1] => TESTER
                    [2] => TESTING
                )

        )

    [foobar] => Array
        (
            [number_of_occurances] => 3
            [connected_words] => Array
                (
                    [0] => FOO
                    [1] => FOOBAR
                    [2] => BAR
                )

        )

)
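
The same idea can be wrapped in a reusable function. The following is only a sketch, not part of the original answer: the function name count_grouped_words and the simplified group layout (group key => list of connected words) are assumptions made for illustration. It builds a reverse lookup from each connected word to its group, so the text is scanned once no matter how many groups there are.

```php
<?php
// Hypothetical helper (not from the original answers): counts occurrences per group.
function count_grouped_words(string $text, array $groups): array
{
    // Reverse lookup: lowercase connected word => group key.
    $lookup = [];
    foreach ($groups as $key => $connected) {
        foreach ($connected as $word) {
            $lookup[strtolower($word)] = $key;
        }
    }

    // Start every group at zero so groups with no matches still appear.
    $counts = array_fill_keys(array_keys($groups), 0);

    // Single pass over the words of the text.
    foreach (str_word_count(strtolower($text), 1) as $word) {
        if (isset($lookup[$word])) {
            $counts[$lookup[$word]]++;
        }
    }
    return $counts;
}

// Simplified group layout, assumed for this sketch.
$groups = [
    'test'   => ['TEST', 'TESTER', 'TESTING'],
    'foobar' => ['FOO', 'FOOBAR', 'BAR'],
];
$text = 'Lorem ipsum TEST sit amet, consectetur TESTER elit. Sed in turpis dui. '
      . 'Maecenas venenatis FOOBAR facilisis. Quisque dictum, diam consequat mollis TESTING, '
      . 'orci tellus aliquet nisl, BAR molestie FOO augue at est. In TESTING vehicula lectus. '
      . 'Curabitur ac varius ligula. Pellentesque orci urdna.';

$counts = count_grouped_words($text, $groups);
print_r($counts); // test => 4, foobar => 3
```

If a connected word were listed under two groups, the later group would win in this sketch, so each word should belong to exactly one group.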

Answer 1 (score: 0)

I think this can be done with explode and array_count_values. In the example below, I strip the . and , characters first:

<?php
$interesting_words = [
  'test' => [
    'number_of_occurances' => 0,
    'connected_words' => [
        'TEST',
        'TESTER',
        'TESTING'
      ]
    ],
  'foobar' => [
    'number_of_occurances' => 0,
    'connected_words' => [
        'FOO',
        'FOOBAR',
        'BAR'
      ]
    ]
];
$str = 'Lorem ipsum TEST sit amet, consectetur TESTER elit. Sed in turpis dui. Maecenas venenatis FOOBAR facilisis. Quisque dictum, diam consequat mollis TESTING, orci tellus aliquet nisl, BAR molestie FOO augue at est. In TESTING vehicula lectus. Curabitur ac varius ligula. Pellentesque orci urdna.';
$str = preg_replace('/[.,]/', '', $str); // strip periods and commas
$str = strtolower($str);
$str_arr = explode(" ", $str);
$str_occurance_counts = array_count_values($str_arr);
foreach ($interesting_words as &$v) {
  foreach ($v['connected_words'] as $cVal) {
    // ?? 0 guards against connected words that never appear in the text
    $v['number_of_occurances'] += $str_occurance_counts[strtolower($cVal)] ?? 0;
  }
}
unset($v); // release the reference left by the foreach
print_r($interesting_words);
?>
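
Note that splitting on a single space and stripping only periods and commas leaves tokens such as "test!" or "foo;" intact, and a line break keeps two words glued together. One possible way around this, sketched here under the assumption that words consist of ASCII letters only, is to split on every run of non-letter characters with preg_split. The sample string is made up for illustration.

```php
<?php
// Hypothetical sample containing assorted punctuation and a line break.
$str = "TEST! (tester) Foo;\nBAR? testing: foobar";

// Split on every run of non-letter characters;
// PREG_SPLIT_NO_EMPTY drops the empty pieces at the edges.
$tokens = preg_split('/[^a-z]+/', strtolower($str), -1, PREG_SPLIT_NO_EMPTY);

// Count how often each token occurs, exactly as in the answer above.
$token_counts = array_count_values($tokens);
print_r($token_counts);
```

The resulting $token_counts array can then be used in place of $str_occurance_counts in the loop above.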

Answer 2 (score: 0)

<?php


$interesting_words = [
  'test' => [
    'number_of_occurances' => 0,
    'connected_words' => [
        'TEST',
        'TESTER',
        'TESTING'
      ]
    ],
  'foobar' => [
    'number_of_occurances' => 0,
    'connected_words' => [
        'FOO',
        'FOOBAR',
        'BAR'
      ]
    ]
];

$testCount=$interesting_words['test']['number_of_occurances'];
$foobarCount=$interesting_words['foobar']['number_of_occurances'];

$text="Lorem ipsum TEST sit amet, consectetur TESTER elit. Sed in turpis dui. Maecenas venenatis 
FOOBAR facilisis. Quisque dictum, diam consequat mollis TESTING, orci tellus aliquet nisl, BAR 
molestie FOO augue at est. In TESTING vehicula lectus. Curabitur ac varius ligula. 
Pellentesque orci urdna.";

$arr = explode(" ", $text);
foreach ($arr as $word) {
    // 'TEST' is a substring of 'TESTER' and 'TESTING', so one check covers all three
    if (strpos($word, 'TEST') !== false) {
        $testCount++;
    }
    // 'FOO' also matches 'FOOBAR'; 'BAR' only needs its own check when 'FOO' is absent
    elseif (strpos($word, 'FOO') !== false || strpos($word, 'BAR') !== false) {
        $foobarCount++;
    }
}
echo "Number of occurrences for 'test': " . $testCount;
echo "<br/>";
echo "Number of occurrences for 'foobar': " . $foobarCount;
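
One caveat with strpos is that it matches substrings, so unrelated words such as CONTEST or BARN would also be counted. A possible alternative, shown here only as a sketch with a made-up sample sentence and a simplified group layout, is to build one case-insensitive pattern per group with \b word boundaries and let preg_match_all return the number of whole-word matches.

```php
<?php
// Hypothetical sample: CONTEST and BARN contain TEST and BAR as substrings.
$text = "A CONTEST in a BARN: TEST, TESTER and FOO.";

// Simplified group layout, assumed for this sketch.
$groups = [
    'test'   => ['TEST', 'TESTER', 'TESTING'],
    'foobar' => ['FOO', 'FOOBAR', 'BAR'],
];

$whole_word_counts = [];
foreach ($groups as $key => $connected) {
    // Escape each word and join them into an alternation, e.g. TEST|TESTER|TESTING.
    $escaped = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $connected);
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

    // preg_match_all returns the number of full pattern matches.
    $whole_word_counts[$key] = preg_match_all($pattern, $text);
}
print_r($whole_word_counts); // test => 2, foobar => 1
```

On this sample the substring checks above would report 3 and 2, because CONTEST and BARN are counted as well.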