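Keywords: all words longer than 3 characters
I want to compare the keywords of two strings under these conditions:
In fact, I have a robot that crawls two news sites every day and copies the news into my database. I then need an algorithm that compares news titles and identifies duplicate news. (As you know, the same news has different titles on different news sites. But usually, the titles of the same news include the same keywords.)
Example 1: Word order does not matter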
str1= 'hello petter'
str2= 'petter hello'
result: 0
Example 2: Words shorter than 3 characters are not counted
str1= 'hello !!'
str2= 'petter hello'
result: 0 // '!!' is shorter than 3 characters, so str1 is 'hello' and the result is 0
OR
str1= 'hello petter how are u?'
str2= 'petter hello how are you'
result: 0 // str1 is 'hello petter how are'
Example 3: The variables must be swapped
str1= 'hello petter how are you ?'
str2= 'petter hello how are you?'
// Then
str1= 'hello petter how are you?'
str2= 'petter hello how are you ?'
result:1 // 1 is for 'you' (in str1)
Example 4: Different words in str2 do not matter
str1= 'hello petter how are you?'
str2= 'petter hello how are you ?'
result: 1 // str2 is 'petter hello how are you', then 1 is for: 'you?' (in str1)
Note: 'you' (in str2) does not matter to me, because it does not match any word in str1.
Complete example: (more information)
str1= 'petter hello how are you pal?'
str2= 'petter hello how are... !!'
// First, swap str1 and str2
str1= 'petter hello how are... !!'
str2= 'petter hello how are you pal?'
// Then remove '!!' (in str1)
str1= 'petter hello how are...'
str2= 'petter hello how are you pal?'
result: 1 // 1 for 'are...' (in str1) - ['are','you','pal?' does not matter (in str2)]
Finally, I need a function that identifies duplicate news from the result and the number of keywords (all words longer than 3 characters).
$keywords_numb = 7;
$result = 2;
function identify_duplicate($keywords_numb, $result){
    if($keywords_numb / 3 >= $result){
        $Specified = 'this is a new news';
    }
    else $Specified = 'this is a duplicate news';
    return $Specified;
}
// $Specified only exists inside the function, so print the return value
echo identify_duplicate($keywords_numb, $result);
Output:
this is a new news
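Does anyone know how I can write this program? Regards
Answer 0: (score: 2)
You don't need regular expressions. You can use the following function and pass the strings in any order: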
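function identify_duplicate($var1, $var2){
    // Use the longer string as $str1, so the argument order does not matter
    if(strlen($var1)>=strlen($var2)){
        $str1 = $var1;
        $str2 = $var2;
    }
    else{
        $str1 = $var2;
        $str2 = $var1;
    }
    // Split both strings into words
    $str1 = explode(" ", $str1);
    $str2 = explode(" ", $str2);
    // Start from the word count of $str1 and subtract every word that
    // also appears in $str2 or has 3 characters or fewer
    $return = sizeof($str1);
    foreach($str1 as $val){
        if(in_array($val, $str2) || strlen($val) <= 3){
            $return = $return - 1;
        }
    }
    // Number of keywords in $str1 that have no match in $str2
    return $return;
}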
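The examples in the question appear to treat the shorter string as str1, whereas the function above picks the longer one. As a rough sketch of that variant, checked against a few of the question's examples (the name `count_unmatched_keywords` and the `$minLength` parameter are made up for illustration, and PHP 7.1+ is assumed):

```php
<?php
// Hypothetical helper, not part of the original answer: counts the keywords
// of the shorter string that do not appear in the other string.
function count_unmatched_keywords(string $a, string $b, int $minLength = 4): int
{
    // The shorter string plays the role of str1.
    if (strlen($a) > strlen($b)) {
        [$a, $b] = [$b, $a];
    }

    // Split on whitespace and drop empty pieces.
    $words1 = preg_split('/\s+/', $a, -1, PREG_SPLIT_NO_EMPTY);
    $words2 = preg_split('/\s+/', $b, -1, PREG_SPLIT_NO_EMPTY);

    $unmatched = 0;
    foreach ($words1 as $word) {
        if (strlen($word) < $minLength) {
            continue; // 3 characters or fewer: not a keyword
        }
        if (!in_array($word, $words2, true)) {
            $unmatched++;
        }
    }
    return $unmatched;
}

echo count_unmatched_keywords('hello petter', 'petter hello'), "\n"; // 0
echo count_unmatched_keywords('hello !!', 'petter hello'), "\n";     // 0
echo count_unmatched_keywords('petter hello how are you pal?', 'petter hello how are... !!'), "\n"; // 1
```

When both strings have the same length but different words, the result depends on the argument order, so this sketch is not fully symmetric.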
Answer 1: (score: 0)
With the help of @karthik manchala, I did it...
$str1='this news is about a player named Ronaldo';
$str2='The player who called Ronaldo';
function identify_duplicate($str1, $str2){
    // Make sure $str1 is the shorter string
    if(strlen($str1)>strlen($str2)){
        list($str1, $str2) = array($str2, $str1); // swap two variables
    }
    $str1 = explode(" ", $str1);
    $str2 = explode(" ", $str2);
    $words_numb = sizeof($str1);
    $result=$words_numb;
    // Subtract every word of $str1 that also appears in $str2
    // or has 3 characters or fewer
    foreach($str1 as $val){
        if(in_array($val, $str2) || strlen($val) <= 3){
            $result--;
        }
    }
    // Duplicate when at most a third of the words in $str1 are unmatched
    if($words_numb / 3 >=$result){
        $Specified = 'this is a duplicate news';
    }
    else $Specified = 'this is a new news';
    return $Specified;
}
$out=identify_duplicate($str1, $str2);
echo $out;
Output:
this is a duplicate news
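If differences in case and punctuation should not count (note that the question's examples do treat 'you?' and 'you' as different words), one possible variation is to normalize each title before comparing. This is only a sketch: `title_keywords`, `is_duplicate_title` and the `$divisor` parameter are hypothetical names, it assumes English (ASCII) titles and PHP 7.1+, and unlike the function above it compares against the number of keywords rather than the total number of words.

```php
<?php
// Hypothetical helper: lowercases a title, splits it on anything that is not
// an ASCII letter or digit, and returns its distinct keywords.
function title_keywords(string $title, int $minLength = 4): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    $keywords = [];
    foreach ($words as $word) {
        if (strlen($word) >= $minLength) {
            $keywords[$word] = true; // array keys keep the list distinct
        }
    }
    return array_map('strval', array_keys($keywords));
}

// Hypothetical decision rule: the titles count as duplicates when at most
// 1/$divisor of the keywords in the shorter keyword list are unmatched.
function is_duplicate_title(string $a, string $b, int $divisor = 3): bool
{
    $ka = title_keywords($a);
    $kb = title_keywords($b);
    if (count($ka) > count($kb)) {
        [$ka, $kb] = [$kb, $ka];
    }
    if (count($ka) === 0) {
        return false; // no keywords to compare
    }
    $missing = count(array_diff($ka, $kb));
    // Integer form of: count($ka) / $divisor >= $missing
    return $missing * $divisor <= count($ka);
}

var_dump(is_duplicate_title(
    'this news is about a player named Ronaldo',
    'The player who called Ronaldo'
)); // bool(true)
```

For titles in other languages, the multibyte functions (`mb_strtolower`, `mb_strlen`) and a Unicode-aware pattern would be needed instead.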