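Following up on this question: Pattern for check single occurrency into preg_match_all
I understand that my pattern can contain only one word per iteration because, in the case described in that question, I have to find both "microsoft" and "microsoft exchange", and I cannot modify my regex because these two possibilities are supplied dynamically from a database.
So my question is: which is the better solution for checking whether a piece of text contains these words, more than 200 preg_match calls or the same number of strpos calls?
I tried to write possible code for both solutions: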
$array = array(/* 200+ values */);
foreach ($array as $word)
{
    // strpos() takes the haystack ($text) first, then the needle ($word)
    if (strpos($text, $word) > -1)
    {
        $fields['skill'][] = $word;
    }
}
The alternative is:
{{1}}
Answer 0 (score: 1)
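Regex-based functions are slower than most other string functions.
By the way, your test can also be done with a single regex such as $pattern='<\b(?:'.$word1.'|'.$word2.'|'.$word3.'|'.$word4.')\b>i';
How many words you can use at once depends on the maximum length of a regex. In my test I built a regex 12004 characters long, and that does not seem to be the limit.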
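Regex version (single call):
$array = array(/* 200+ values */);
// One alternation of all words, matched as whole words, case-insensitively
$pattern = '<\b(?:' . implode('|', $array) . ')\b>i';
preg_match_all($pattern, $text, $matches);
// $matches[0] holds every matched word, e.g.:
// $fields['skill'][] = $matches[0][0];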
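Because the words come from a database, they may contain characters that have a special meaning in a regex. The following is a minimal sketch of the single-call version that escapes each word with `preg_quote`. The helper name `findSkills`, the `/` delimiter, and the sample words and text are assumptions made for illustration, not part of the answer.

```php
<?php
// Hypothetical helper: returns the distinct words from $words found in $text.
function findSkills(array $words, string $text): array
{
    if ($words === []) {
        return [];
    }
    // Put longer entries first so "microsoft exchange" is tried before
    // "microsoft" when both could match at the same position.
    usort($words, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    // Escape regex metacharacters and the '/' delimiter in every word.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    preg_match_all($pattern, $text, $matches);
    // Normalise case and drop duplicate hits.
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

$text = 'We run Microsoft Exchange and other microsoft products.';
print_r(findSkills(['microsoft', 'microsoft exchange'], $text));
```

Regex matches do not overlap, so a text that contains only "Microsoft Exchange" would report the longer entry but not "microsoft" on its own. If both must be reported from the same occurrence, checking each word separately is the simpler route.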
strpos version (multiple calls):
$array = array(/* 200+ values */);
foreach ($array as $word) {
    // strpos(haystack, needle); compare with !== false, not > -1
    if (strpos($text, $word) !== false)
    {
        $fields['skill'][] = $word;
    }
}
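PHP's `strpos` takes the haystack first and the needle second, and returns either `false` or an integer offset that can be `0`. Below is a minimal sketch of the loop under those rules. `stripos` is swapped in to mirror the `i` flag of the regex version, and the sample text, the word list and the `$fields` layout are assumptions.

```php
<?php
$text  = 'Microsoft Exchange administrator with PHP experience';
$words = ['microsoft', 'microsoft exchange', 'oracle'];

$fields = ['skill' => []];
foreach ($words as $word) {
    // stripos(haystack, needle) is case-insensitive. Offset 0 is a valid
    // hit, so the result is compared strictly against false.
    if (stripos($text, $word) !== false) {
        $fields['skill'][] = $word;
    }
}
print_r($fields['skill']);
```

Each word is tested independently, so both "microsoft" and "microsoft exchange" are reported even though they overlap in the text.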
If you are looking for single words, note that strpos will match "Hello" inside "HelloWorld", so if you only want whole words you can do this:
$arrayOfWords = explode(' ', $string);
// and now you can check array against array
$array = array(/* 200+ values */);
foreach ($array as $word) {
    if (in_array($word, $arrayOfWords))
    {
        $fields['skill'][] = $word;
    }
}
// you can also make this faster if you array_flip $arrayOfWords
// and then check with isset() (faster than in_array())
Note that this only works if your word list contains no word combinations (such as "microsoft exchange") that you also want to match this way.
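The `array_flip` idea mentioned in the code comments could look roughly like the sketch below. It departs from the `explode(' ', ...)` call in one respect: the text is lowercased and split on any non-alphanumeric character, so that punctuation does not stick to the words. The sample strings and variable names are invented for illustration.

```php
<?php
$string = 'Senior PHP developer, worked with MySQL and Linux';
$words  = ['php', 'mysql', 'java'];

// Lowercase, split on anything that is not a letter or digit, then flip so
// that every word of the text becomes an array key (constant-time lookup).
$tokens = preg_split('/[^a-z0-9]+/', strtolower($string), -1, PREG_SPLIT_NO_EMPTY);
$lookup = array_flip($tokens);

$fields = ['skill' => []];
foreach ($words as $word) {
    // isset() on a key avoids the linear scan that in_array() performs.
    if (isset($lookup[$word])) {
        $fields['skill'][] = $word;
    }
}
print_r($fields['skill']);
```

As with the `in_array` version, multi-word entries such as "microsoft exchange" can never match here, because the text has been broken into single tokens.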
*Added comments
Answer 1 (score: 1)
strpos is much faster than preg_match; here is a benchmark:
$array = array();
for($i=0; $i<1000; $i++) $array[] = $i;
$nbloop = 10000;
$text = <<<EOD
I understand that my pattern must contain only a word per cycle because, in the case reported in that question, I must find "microsoft" and "microsoft exchange" and I can't modify my regexp because these two possibilities are given dinamically from a database!
So my question is: which is the better solution between over 200 preg_match and the same numbers of str_pos to check if a subset of char contains these words?
EOD;
$start = microtime(true);
for ($i=0; $i<$nbloop; $i++) {
foreach ($array as $word) {
$pattern='<\b(?:'.$word.')\b>i';
if (preg_match_all($pattern, $text, $matches)) {
$fields['skill'][] = $matches[0][0];
}
}
}
echo "Elapse regex: ", microtime(true)-$start,"\n";
$start = microtime(true);
for ($i=0; $i<$nbloop; $i++) {
foreach ($array as $word) {
// note: strpos() expects (haystack, needle); as written, this looks for $text inside $word
if(strpos($word, $text)>-1) {
$fields['skill'][] = $word;
}
}
}
echo "Elapse strpos: ", microtime(true)-$start,"\n";
Output:
Elapse regex: 7.9924139976501
Elapse strpos: 0.62015008926392
It is about 13 times faster.
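The `strpos` loop in the benchmark is called as `strpos($word, $text)`, whereas PHP's signature is `strpos(haystack, needle)`. Here is a hedged sketch of how the comparison could be re-run with the text as the haystack. The helper name `timeIt`, the word list and the loop count are assumptions, and no timings are claimed here.

```php
<?php
// Hypothetical helper: runs $fn $loops times and returns elapsed seconds.
function timeIt(callable $fn, int $loops): float
{
    $start = microtime(true);
    for ($i = 0; $i < $loops; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$text  = 'I must find "microsoft" and "microsoft exchange" in this text.';
$words = array_map('strval', range(0, 999)); // same shape as the benchmark's list
$words[] = 'microsoft';

// One preg_match call per word, as in the benchmark's regex loop.
$viaRegex = function () use ($words, $text) {
    $hits = 0;
    foreach ($words as $word) {
        $hits += preg_match('<\b(?:' . preg_quote($word) . ')\b>i', $text);
    }
    return $hits;
};
// One stripos call per word, with the text as the haystack.
$viaStrpos = function () use ($words, $text) {
    $hits = 0;
    foreach ($words as $word) {
        $hits += (stripos($text, $word) !== false) ? 1 : 0;
    }
    return $hits;
};

echo "regex:   ", timeIt($viaRegex, 100), "\n";
echo "stripos: ", timeIt($viaStrpos, 100), "\n";
```

Both closures return the number of words found, which gives a quick sanity check that the two approaches agree before their timings are compared.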