I have the following PHP code:
$words = array(
    'Art' => '1',
    'Sport' => '2',
    'Big Animals' => '3',
    'World Cup' => '4',
    'David Fincher' => '5',
    'Torrentino' => '6',
    'Shakes' => '7',
    'William Shakespeare' => '8'
);
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";
$all_keywords = $all_keys = array();
foreach ($words as $word => $key) {
    if (strpos(strtolower($text), strtolower($word)) !== false) {
        $all_keywords[] = $word;
        $all_keys[] = $key;
    }
}
echo $keywords_list = implode(',', $all_keywords) ."<br>";
echo $keys_list = implode(',', $all_keys) . "<br>";
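The code echoes Art,Sport,World Cup,Shakes,William Shakespeare and 1,2,4,7,8. However, it is very simple and not accurate enough to echo the right keywords. For example, it returns 'Shakes' => '7' because of the word Shakespeare in $text, but as you can see, "Shakes" is not a proper keyword for "Shakespeare".
Basically, I want to get Art,Sport,World Cup,William Shakespeare and 1,2,4,8 instead of Art,Sport,World Cup,Shakes,William Shakespeare and 1,2,4,7,8.
Could you please help me develop better code that extracts the keywords without running into this kind of problem? Thanks for your help.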
Answer 0 (score: 4)
You may want to look into regular expressions to weed out partial matches:
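// create regular expression by using alternation
// of all given words
$re = '/\b(?:' . join('|', array_map(function($keyword) {
    return preg_quote($keyword, '/');
}, array_keys($words))) . ')\b/i';
// the i flag makes matching case-insensitive, so a match may differ in
// case from the array key; look it up in a lower-cased copy of $words
$lookup = array_change_key_case($words, CASE_LOWER);
preg_match_all($re, $text, $matches);
foreach ($matches[0] as $keyword) {
    echo $keyword, " ", $lookup[strtolower($keyword)], "\n";
}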
The expression uses the \b assertion to match word boundaries, i.e. each keyword must stand on its own as a whole word.
Output:
World Cup 4
William Shakespeare 8
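The output above contains only World Cup and William Shakespeare, whereas the question also asked for Art and Sport (from "artists" and "sports"). If prefix matches like these are acceptable, one possible variation is sketched below. It assumes a word boundary is required only in front of the keyword, and it lets \w* consume the rest of the word. The function name extract_prefix_keywords is hypothetical and not part of the answer.

```php
<?php
// Hypothetical helper (not from the answer): prefix matching, longest keyword first.
function extract_prefix_keywords(array $words, $text)
{
    // Longer keywords first, so that when two keywords could start at the
    // same position the longer one is tried first.
    $keywords = array_keys($words);
    usort($keywords, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // \b only in front; \w* consumes the rest of the word ("artists", "sports").
    $re = '/\b(' . implode('|', array_map(function ($keyword) {
        return preg_quote($keyword, '/');
    }, $keywords)) . ')\w*/i';

    // Matching is case-insensitive, so index the keywords by lower-cased spelling.
    $ids = array();   // lower-cased keyword => id
    $names = array(); // lower-cased keyword => original spelling
    foreach ($words as $word => $id) {
        $ids[strtolower($word)] = $id;
        $names[strtolower($word)] = $word;
    }

    $found = array(); // id => keyword
    preg_match_all($re, $text, $matches);
    foreach ($matches[1] as $hit) {
        $hit = strtolower($hit);
        $found[$ids[$hit]] = $names[$hit];
    }
    ksort($found);
    return $found;
}

$words = array(
    'Art' => '1',
    'Sport' => '2',
    'Big Animals' => '3',
    'World Cup' => '4',
    'David Fincher' => '5',
    'Torrentino' => '6',
    'Shakes' => '7',
    'William Shakespeare' => '8',
);
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";

$found = extract_prefix_keywords($words, $text);
echo implode(',', $found), "\n";             // Art,Sport,World Cup,William Shakespeare
echo implode(',', array_keys($found)), "\n"; // 1,2,4,8
```

"Shakes" is not reported here only because the "William Shakespeare" match has already consumed the word "Shakespeare" (matches do not overlap). A text that contains "Shakespeare" without "William" in front would still report Shakes, so this sketch trades some precision for the prefix matches.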
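Answer 1 (score: 2)
If you want exact matches, you are better off using regular expressions.
I modified your original code to use them instead of strpos(), because strpos() produces partial matches, as your code shows.
There is room for improvement, but hopefully you get the basic gist.
Let me know if you have any questions.
The code has been adapted to run as a shell script, so save it as demo.php and run chmod +x demo.php && ./demo.php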
`#!/usr/bin/php
<?php
//array of regular expressions to match your words/phrases
$words = array(
'/\b[Aa]rt\b/',
'/\bI\b/',
'/\bSport\b/',
'/\bBig Animals\b/' ,
'/\bWorld Cup\b/' ,
'/\bDavid Fincher\b/',
'/\bTorrentino\b/' ,
'/\bShakes\b/' ,
'/\b[sS]port[s]{0,1}\b/' ,
'/\bWilliam Shakespeare\b/',
);
$text = "I like artists and art, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";
$all_keywords = array(); //changed formatting for clarity
$all_keys = array();
foreach ($words as $regex) {
    $m = array();
    // with PREG_OFFSET_CAPTURE, $m[0] holds one [matched text, offset] pair per match
    if (preg_match_all($regex, $text, $m, PREG_OFFSET_CAPTURE) >= 1) {
        foreach ($m[0] as $mm) {
            $key = $mm[1]; //key is the offset in $text where the match begins
            $word = $mm[0]; //the matched word/phrase
            $all_keywords[] = $word;
            $all_keys[] = $key;
        }
    }
}
echo "\$text = \"$text\"\n";
echo $keywords_list = implode(',', $all_keywords) ."<br>\n";
echo $keys_list = implode(',', $all_keys) . "<br>\n";
`
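Note that $all_keys in the script above holds match offsets, not the ids from the question. If the ids are what you need, one option is to use each pattern as the array key and store the id as the value. The sketch below is not part of the original answer: the function name match_ids, the lower-cased patterns, and the i flag are illustrative choices.

```php
<?php
// Hypothetical variation: pattern => id, so each hit reports the question's id.
function match_ids(array $patterns, $text)
{
    $hits = array(); // id => first matched text
    foreach ($patterns as $regex => $id) {
        // preg_match() returns 1 on a match, 0 on no match, false on error
        if (preg_match($regex, $text, $m) === 1) {
            $hits[$id] = $m[0];
        }
    }
    return $hits;
}

$patterns = array(
    '/\bart\b/i' => '1',
    '/\bsports?\b/i' => '2',
    '/\bbig animals?\b/i' => '3',
    '/\bworld cup\b/i' => '4',
    '/\bdavid fincher\b/i' => '5',
    '/\btorrentino\b/i' => '6',
    '/\bshakes\b/i' => '7',
    '/\bwilliam shakespeare\b/i' => '8',
);
$text = "I like artists and art, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";

$hits = match_ids($patterns, $text);
echo implode(',', $hits), "\n";             // art,sports,big animal,World Cup,William Shakespeare
echo implode(',', array_keys($hits)), "\n"; // 1,2,3,4,8
```

Writing the optional plural as s? keeps each keyword to a single pattern, at the cost of maintaining the patterns by hand.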
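Answer 2 (score: 0)
Replace the condition
strpos(strtolower($text), strtolower($word)) !== false
with
preg_match('/\b'.$word.'\b/',$text)
(drop the !== false comparison as well, because preg_match() returns 0 rather than false when nothing matches).
Or, since you don't seem to care about capitalization:
preg_match('/\b'.strtolower($word).'\b/', strtolower($text))
In that case, I suggest calling strtolower($text) once beforehand, e.g. before the foreach starts.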
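Applied to the loop from the question, the substitution might look like the sketch below. Two details are additions rather than part of this answer: preg_quote() escapes any regex metacharacters in a keyword, and the i flag stands in for the strtolower() calls. The function name whole_word_keywords is hypothetical.

```php
<?php
// Hypothetical helper: the question's loop with strpos() swapped for preg_match().
function whole_word_keywords(array $words, $text)
{
    $found = array(); // keyword => id
    foreach ($words as $word => $id) {
        // preg_quote() escapes regex metacharacters in the keyword;
        // the i flag makes the match case-insensitive.
        $re = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($re, $text) === 1) {
            $found[$word] = $id;
        }
    }
    return $found;
}

$words = array(
    'Art' => '1',
    'Sport' => '2',
    'Big Animals' => '3',
    'World Cup' => '4',
    'David Fincher' => '5',
    'Torrentino' => '6',
    'Shakes' => '7',
    'William Shakespeare' => '8',
);
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";

$found = whole_word_keywords($words, $text);
echo implode(',', array_keys($found)), "\n"; // World Cup,William Shakespeare
echo implode(',', $found), "\n";             // 4,8
```

With whole-word matching, "artists" and "sports" no longer count as hits for Art and Sport, so only two keywords are reported for this text.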
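Answer 3 (score: 0)
Off the top of my head, I think there are two additional steps that would make this function a bit more robust.
P.S. The SO app rocks! But it is still not easy to write code in it (bloody autocorrect!)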