I have two database tables:
+----+---------------+
| id | tag           |
+----+---------------+
| 1  | Audi          |
| 2  | BMW           |
| 3  | Volkswagen    |
| 4  | Mercedes Benz |
+----+---------------+
and
+----+---------------------+------+
| id | title               | tag  |
+----+---------------------+------+
| 1  | Audi is a great car | NULL |
+----+---------------------+------+
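What I need to do:
1. Determine which tag is most relevant to the title.
2. Take the most relevant tag and store it in the database alongside the title.
What I have done so far: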
function compareStrings($s1, $s2) {
    // If either string is empty there is nothing to compare.
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return 0;
    }

    // Replace non-alphanumeric characters with spaces.
    // Hyphens are kept in case they are used to join words.
    $s1clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s1);
    $s2clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s2);

    // Collapse runs of two or more spaces into a single space.
    $s1clean = preg_replace('/ {2,}/', ' ', $s1clean);
    $s2clean = preg_replace('/ {2,}/', ' ', $s2clean);

    // Split both strings into word arrays.
    $ar1 = explode(" ", $s1clean);
    $ar2 = explode(" ", $s2clean);
    $l1 = count($ar1);
    $l2 = count($ar2);

    // Swap the arrays if needed so that $ar1 is always the longer one.
    if ($l2 > $l1) {
        $t = $ar2;
        $ar2 = $ar1;
        $ar1 = $t;
    }

    // Flip $ar2 so its words become keys, which makes lookups cheap.
    $ar2 = array_flip($ar2);
    $maxwords = max($l1, $l2);
    $matches = 0;

    // Count the words of $ar1 that also occur in $ar2.
    foreach ($ar1 as $word) {
        if (array_key_exists($word, $ar2)) {
            $matches++;
        }
    }

    // Percentage of matching words relative to the longer string.
    return ($matches / $maxwords) * 100;
}
$all_values = '';
$sql_object = "SELECT * FROM tag";
$result_object = mysql_query($sql_object);
while ($row_object = mysql_fetch_array($result_object))
{
    $tag = $row_object['tag'];
    $sql_subject = "SELECT * FROM title ORDER BY added";
    $result_subject = mysql_query($sql_subject);
    while ($row_subject = mysql_fetch_array($result_subject))
    {
        $title = $row_subject['title'];
        $all_values .= "Title($title) and Tag($tag) relevancy:" . compareStrings($tag, $title) . "%" . "<br/>";
    }
}
echo $all_values;
Output:
Title(Audi is a great car) and Tag(Audi) relevancy:20%
Title(Audi is a great car) and Tag(BMW) relevancy:0%
Title(Audi is a great car) and Tag(Volkswagen) relevancy:0%
Title(Audi is a great car) and Tag(Mercedes Benz) relevancy:0%
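The 20% for Audi follows from the scoring rule: one matching word divided by the larger word count (the title has five words). Below is a condensed sketch of that rule. The name `wordOverlapPercent` is hypothetical, and this is a simplified stand-in for illustration, not the exact function above (for example, it drops empty pieces instead of counting them as words).

```php
<?php
// Condensed, hypothetical stand-in for compareStrings(): same word-overlap idea.
function wordOverlapPercent(string $a, string $b): float
{
    // Split on anything that is not a letter, digit, or hyphen; drop empty pieces.
    $wordsA = preg_split('/[^A-Za-z0-9-]+/', $a, -1, PREG_SPLIT_NO_EMPTY);
    $wordsB = preg_split('/[^A-Za-z0-9-]+/', $b, -1, PREG_SPLIT_NO_EMPTY);
    if (!$wordsA || !$wordsB) {
        return 0.0;
    }

    // Make $wordsA the longer list, as the original function does.
    if (count($wordsB) > count($wordsA)) {
        [$wordsA, $wordsB] = [$wordsB, $wordsA];
    }

    // Words of the shorter list become keys for fast lookup.
    $lookup = array_flip($wordsB);
    $matches = 0;
    foreach ($wordsA as $word) {
        if (isset($lookup[$word])) {
            $matches++;
        }
    }

    // Share of matching words relative to the longer list, as a percentage.
    return $matches / count($wordsA) * 100;
}

echo wordOverlapPercent('Audi', 'Audi is a great car'), "%\n"; // 1 match out of 5 words
echo wordOverlapPercent('BMW', 'Audi is a great car'), "%\n";  // 0 matches out of 5 words
```

Note that the comparison is case-sensitive, so a title containing "audi" would not match the tag "Audi" unless both strings are lowercased first.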
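The question is: how do I extract the most relevant tag from $all_values and insert it into the database? This is where I am stuck. Perhaps there is a better solution altogether. I would appreciate any help.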
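One possible direction, sketched under assumptions: rather than parsing the concatenated $all_values string, keep the best score per title while looping and write only the winner. In the sketch below, `pickBestTag`, `saveTag`, and `$demoScore` are hypothetical names; `saveTag` assumes a PDO connection and that the `tag` column of the `title` table stores the tag text rather than a tag id. In real use the scorer passed in would be `'compareStrings'`.

```php
<?php
/**
 * Return the tag with the highest score for a title, or null if no tag scores above 0.
 * $score is any callable taking (tag, title), for example 'compareStrings'.
 */
function pickBestTag(string $title, array $tags, callable $score): ?string
{
    $bestTag = null;
    $bestScore = 0;
    foreach ($tags as $tag) {
        $s = $score($tag, $title);
        if ($s > $bestScore) { // strict ">" keeps the first tag when scores tie
            $bestScore = $s;
            $bestTag = $tag;
        }
    }
    return $bestTag;
}

/** Store the chosen tag next to its title (assumes title.tag holds the tag text). */
function saveTag(PDO $pdo, int $titleId, string $tag): void
{
    // A prepared statement avoids quoting problems with tag values.
    $stmt = $pdo->prepare('UPDATE title SET tag = :tag WHERE id = :id');
    $stmt->execute([':tag' => $tag, ':id' => $titleId]);
}

// Hypothetical stand-in scorer so the sketch runs on its own:
// 100 if the tag occurs in the title (case-insensitive), otherwise 0.
$demoScore = function (string $tag, string $title): float {
    return stripos($title, $tag) !== false ? 100.0 : 0.0;
};

$tags = ['Audi', 'BMW', 'Volkswagen', 'Mercedes Benz'];
$best = pickBestTag('Audi is a great car', $tags, $demoScore);
echo $best ?? '(no match)', "\n";

// With a live connection: if ($best !== null) { saveTag($pdo, 1, $best); }
```

Loading all tags into an array once and then looping over the titles also avoids re-running the title query for every tag, as the nested loops above currently do.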