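I'm trying to write a function that counts how many times different words occur in a text. The catch is that I want similar words (and nicknames) to be grouped together.
I have this array of interesting words (which I defined by hand):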
$interesting_words = [
'test' => [
'number_of_occurances' => 0,
'connected_words' => [
'TEST',
'TESTER',
'TESTING'
]
],
'foobar' => [
'number_of_occurances' => 0,
'connected_words' => [
'FOO',
'FOOBAR',
'BAR'
]
]
];
Sample text:
Lorem ipsum TEST sit amet, consectetur TESTER elit. Sed in turpis dui. Maecenas venenatis FOOBAR facilisis. Quisque dictum, diam consequat mollis TESTING, orci tellus aliquet nisl, BAR molestie FOO augue at est. In TESTING vehicula lectus. Curabitur ac varius ligula. Pellentesque orci urdna.
Desired output:
Number of occurances for 'test': 4
Number of occurances for 'foobar': 3
Is there a clever way to do this without 1,000,000 for loops?
I'm writing the function in Laravel, if that helps.
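Answer 0 (score: 1)
If performance matters and you only need the occurrence counts, you can use str_word_count and array_count_values to collect every word occurrence, and strtolower to make the lookup case-insensitive: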
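// Count every word in the text once, case-insensitively.
$words=array_count_values(str_word_count(strtolower($str),1));
foreach($interesting_words as $index=>&$details){
foreach($details['connected_words'] as $key=>$similar){
// ?? 0 avoids an undefined-index warning when a word never appears in the text.
$details['number_of_occurances'] += $words[strtolower($similar)] ?? 0;
}
}
unset($details); // break the reference left over from the foreach
print_r($interesting_words );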
Output:
Array
(
[test] => Array
(
[number_of_occurances] => 4
[connected_words] => Array
(
[0] => TEST
[1] => TESTER
[2] => TESTING
)
)
[foobar] => Array
(
[number_of_occurances] => 3
[connected_words] => Array
(
[0] => FOO
[1] => FOOBAR
[2] => BAR
)
)
)
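As a variation on the same idea, the nested loop over connected words can be replaced by a reverse lookup table, so the text is walked only once. This is a minimal sketch, not part of the original answer: the function name count_word_groups and the simplified $groups shape (group key => list of connected words) are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns [group key => number of occurrences].
function count_word_groups(string $text, array $groups): array
{
    // Build a reverse lookup: lowercase connected word => group key.
    $lookup = [];
    foreach ($groups as $key => $connected) {
        foreach ($connected as $word) {
            $lookup[strtolower($word)] = $key;
        }
    }

    // Start every group at zero so groups with no hits still appear.
    $counts = array_fill_keys(array_keys($groups), 0);

    // Single pass over the words found in the text.
    foreach (str_word_count(strtolower($text), 1) as $word) {
        if (isset($lookup[$word])) {
            $counts[$lookup[$word]]++;
        }
    }
    return $counts;
}

// Simplified version of $interesting_words (assumed shape).
$groups = [
    'test'   => ['TEST', 'TESTER', 'TESTING'],
    'foobar' => ['FOO', 'FOOBAR', 'BAR'],
];
$text = 'Lorem ipsum TEST sit amet, consectetur TESTER elit. Sed in turpis dui. '
      . 'Maecenas venenatis FOOBAR facilisis. Quisque dictum, diam consequat mollis TESTING, '
      . 'orci tellus aliquet nisl, BAR molestie FOO augue at est. In TESTING vehicula lectus.';

$counts = count_word_groups($text, $groups);
print_r($counts);
```

With the sample text from the question this should give 4 for 'test' and 3 for 'foobar'. If the same connected word is listed under two groups, the later group wins in this sketch.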
Answer 1 (score: 0)
I think this can be done with explode and array_count_values. In the example below, I strip the . and , characters from the text first:
<?php
$interesting_words = [
'test' => [
'number_of_occurances' => 0,
'connected_words' => [
'TEST',
'TESTER',
'TESTING'
]
],
'foobar' => [
'number_of_occurances' => 0,
'connected_words' => [
'FOO',
'FOOBAR',
'BAR'
]
]
];
$str = 'Lorem ipsum TEST sit amet, consectetur TESTER elit. Sed in turpis dui. Maecenas venenatis FOOBAR facilisis. Quisque dictum, diam consequat mollis TESTING, orci tellus aliquet nisl, BAR molestie FOO augue at est. In TESTING vehicula lectus. Curabitur ac varius ligula. Pellentesque orci urdna.';
$str = preg_replace('/[\.\,]/i','',$str);
$str = strtolower($str);
$str_arr = explode(" ",$str);
$str_occurance_counts = array_count_values($str_arr);
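foreach($interesting_words as $k=>&$v){
foreach($v['connected_words'] as $c=>$cVal){
// ?? 0 avoids an undefined-index warning when a word never appears in the text.
$v['number_of_occurances'] += $str_occurance_counts[strtolower($cVal)] ?? 0;
}
}
unset($v); // break the reference left over from the foreach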
print_r($interesting_words );
?>
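Stripping only . and , leaves other punctuation (;, !, quotes, parentheses) glued to the words, so those tokens would not match. One possible alternative, sketched here under the assumption that the text is plain ASCII, is to split on every run of non-letter characters with preg_split. The sample string below is made up for illustration.

```php
<?php
// Made-up sample containing punctuation the . and , replacement would miss.
$str = 'In TESTING vehicula lectus; BAR! (FOO) "TEST"?';

// Split on every run of non-letter characters. PREG_SPLIT_NO_EMPTY drops the
// empty strings that leading or trailing punctuation would otherwise produce.
// Assumption: ASCII letters only; accented characters would also act as separators.
$tokens = preg_split('/[^a-z]+/', strtolower($str), -1, PREG_SPLIT_NO_EMPTY);

$counts = array_count_values($tokens);
print_r($counts);
```

The resulting $counts array can be used in place of $str_occurance_counts in the loop above.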
Answer 2 (score: 0)
<?php
$interesting_words = [
'test' => [
'number_of_occurances' => 0,
'connected_words' => [
'TEST',
'TESTER',
'TESTING'
]
],
'foobar' => [
'number_of_occurances' => 0,
'connected_words' => [
'FOO',
'FOOBAR',
'BAR'
]
]
];
$testCount=$interesting_words['test']['number_of_occurances'];
$foobarCount=$interesting_words['foobar']['number_of_occurances'];
$text="Lorem ipsum TEST sit amet, consectetur TESTER elit. Sed in turpis dui. Maecenas venenatis
FOOBAR facilisis. Quisque dictum, diam consequat mollis TESTING, orci tellus aliquet nisl, BAR
molestie FOO augue at est. In TESTING vehicula lectus. Curabitur ac varius ligula.
Pellentesque orci urdna.";
$arr= explode(" ", $text);
$numberOfWords=count($arr);
for($i=0;$i<$numberOfWords;$i++)
{
// strpos does a substring match, so 'TEST' already covers 'TESTER' and 'TESTING'
// (separate branches for those could never be reached). It would also match
// unrelated words that merely contain 'TEST'.
if(strpos($arr[$i],'TEST') !== false){
$testCount++;
}
// 'FOO' also covers 'FOOBAR'; 'BAR' needs its own check.
elseif(strpos($arr[$i],'FOO') !== false || strpos($arr[$i],'BAR') !== false){
$foobarCount++;
}
}
echo "Number of occurances for 'test':".$testCount;
echo "<br/>";
echo "Number of occurances for 'foobar':".$foobarCount;
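
Because strpos matches substrings and is case-sensitive, the loop above would also count a word such as CONTEST and would miss lowercase spellings. A possible alternative, shown only as a sketch, is one case-insensitive regular expression per group with word boundaries. The helper name count_whole_words is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: counts whole-word, case-insensitive matches of any
// word in $words within $text.
function count_whole_words(string $text, array $words): int
{
    // preg_quote escapes regex metacharacters that a word might contain.
    $alternatives = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);

    // \b on both sides restricts matches to whole words; the i flag ignores case.
    $pattern = '/\b(?:' . implode('|', $alternatives) . ')\b/i';

    // preg_match_all returns the number of full pattern matches.
    return (int) preg_match_all($pattern, $text);
}

$sample = 'TEST TESTER CONTEST testing FOOBAR';
echo count_whole_words($sample, ['TEST', 'TESTER', 'TESTING']), "\n";
echo count_whole_words($sample, ['FOO', 'FOOBAR', 'BAR']), "\n";
```

Here CONTEST is not counted for the 'test' group, and FOOBAR counts once for the 'foobar' group rather than once each for FOO and BAR.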