PHP preg_match or str_replace to find relevant synonyms

Asked: 2012-01-28 00:49:36

Tags: php arrays string preg-replace str-replace

I have an array that looks like this when printed with print_r:

$synonyms =

Array (

[0] => a company|an organization|a business
[1] => auto|automobile|car|vehicle
[2] => aware|conscious|mindful|informed|knowledgeable
[3] => aware of|conscious of|mindful of
[4] => be aware|bear in mind|remember
[5] => be aware of|concentrate on|pay attention to|know about|be familiar with
[6] => carefully|cautiously|meticulously|very carefully|thoroughly|properly
[7] => cases|instances|circumstances|situations|scenarios|conditions
[8] => comes|arrives|will come|happens
[9] => company|business|organization|firm|corporation|provider
[10] => coverage|protection
[11] => in most|in many
[12] => in most cases|generally|usually|normally|typically|most often
[13] => in the|within the|inside the|while in the|from the|during the
[14] => increases|raises|will increase|boosts|improves
[15] => information|info|data|details|facts|information and facts
[16] => insurance|insurance coverage|insurance policy|insurance plan|insurance policies
[17] => on the|around the|within the|to the|about the|over the
[18] => once|as soon as|when|after|the moment|at the time
[19] => once you|when you|as soon as you|after you|when you finally
[20] => or a|or perhaps a|or possibly a|or even a|or maybe a|or simply a
[21] => package|package deal|bundle|deal|offer
[22] => payment|fee|cost
[23] => policy|coverage|plan
[24] => premium|top quality|high quality
[25] => question|query|concern|issue|problem
[26] => rates|prices|charges|premiums|costs|fees
[27] => receive|get|obtain|acquire|be given
[28] => receive a|get a
[29] => representative|consultant
[30] => review|evaluation|assessment|critique|overview|evaluate
[31] => the new|the brand new|the newest|the modern
[32] => time to|time for you to
[33] => with your|together with your|along with your|with the
[34] => you have|you've|you've got|you might have|you may have|you have got

);

Here is my sentence:

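$sentence = "When it comes time to renew your auto insurance policy, be aware of how your carrier handles renewals. In most cases, you will receive a renewal package in the mail with your new policy and rates. Review this information carefully, and question any increases in premium with your agent or a company representative. In most cases, once you submit payment on the new policy, you have renewed your coverage.";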

Here is sample code that replaces phrases with their relevant synonyms, in the format used by standard rewriter software:

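$searches = array();
$replaces = array();

// Search for the first phrase of each group; replace it with the whole group.
for ($i = 0; $i < count($synonyms); $i++) {
   $words = explode("|", $synonyms[$i], 3);
   $searches[$i] = $words[0];
   $replaces[$i] = $synonyms[$i];
}

// Try to place a phrase that contains another phrase before the shorter one.
function cmp($a, $b) {
   if ($a == $b) return 0;
   if (strpos($a, $b) !== false) return -1;
   if (strpos($b, $a) !== false) return 1;
   return 0;
}

uasort($searches, 'cmp');

// Rebuild the replacements in the sorted order, wrapped in braces.
$replaces_new = array();

$i = 0;

foreach ($searches as $k => $v) {
   $replaces_new[$i] = "{{$replaces[$k]}}";
   $i++;
}

// Apply the replacements to the input sentence.
$output = str_replace($searches, $replaces_new, $sentence);

echo $output;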

## This code was modified from here :
## http://stackoverflow.com/questions/9031199/php-preg-match-to-find-relevant-word

My output looks like this:

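When it {comes|arrives|will come|happens} {time to|time for you to} renew your {auto|automobile|car|vehicle} {insurance|insurance coverage|insurance policy|insurance plan|insurance policies} {policy|{coverage|protection}|plan}, {{be {aware|conscious|mindful|informed|knowledgeable}|bear in mind|remember} of|concentrate on|pay attention to|know about|be familiar with} how your carrier handles renewals. In most {cases|instances|circumstances|situations|scenarios|conditions}, you will {{receive|get|obtain|acquire|be given} a|get a} renewal {package|package deal|bundle|deal|offer} {in the|within the|inside the|while in the|from the|during the} mail {with your|together with your|along with your|with the} new {policy|{coverage|protection}|plan} and {rates|prices|charges|premiums|costs|fees}. Review this {information|info|data|details|facts|information and facts} {carefully|cautiously|meticulously|very carefully|thoroughly|properly}, {question|query|c{once|as soon as|when|after|the moment|at the time}rn|issue|problem} any {increases|raises|will increase|boosts|improves} in {premium|top quality|high quality} {with your|together with your|along with your|with the} agent {or a|or perhaps a|or possibly a|or even a|or maybe a|or simply a} {company|business|organization|firm|corporation|provider} {representative|consultant}. In most {cases|instances|circumstances|situations|scenarios|conditions}, {{once|as soon as|when|after|the moment|at the time} you|when you|as soon as you|after you|when you finally} submit {payment|fee|cost} on {the new|the brand new|the newest|the modern} {policy|{coverage|protection}|plan}, {you have|you've|you've got|you might have|you may have|you have got} renewed your {coverage|protection}.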

This is the output I want:

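$output = "When it {comes|arrives|will come|happens} {time to|time for you to} renew your {auto|automobile|car|vehicle} {insurance|insurance coverage|insurance policy|insurance plan|insurance policies} {policy|coverage|plan}, {be aware of|concentrate on|pay attention to|know about|be familiar with} how your carrier handles renewals. {In most cases|generally|usually|normally|typically|most often}, you will {receive a|get a} renewal {package|package deal|bundle|deal|offer} {in the|within the|inside the|while in the|from the|during the} mail {with your|together with your|along with your|with the} new {policy|coverage|plan} and {rates|prices|charges|premiums|costs|fees}. {Review|evaluation|assessment|critique|overview|evaluate} this {information|info|data|details|facts|information and facts} {carefully|cautiously|meticulously|very carefully|thoroughly|properly}, {question|query|concern|issue|problem} any {increases|raises|will increase|boosts|improves} in {premium|top quality|high quality} {with your|together with your|along with your|with the} agent {or a|or perhaps a|or possibly a|or even a|or maybe a|or simply a} {company|business|organization|firm|corporation|provider} {representative|consultant}. {In most cases|generally|usually|normally|typically|most often}, {once you|when you|as soon as you|after you|when you finally} submit {payment|fee|cost} {on the|around the|within the|to the|about the|over the} new {policy|coverage|plan}, {you have|you've|you've got|you might have|you may have|you have got} renewed your {coverage|protection}.";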

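Note: "be aware of" should be replaced by "{be aware of|concentrate on|pay attention to|know about|be familiar with}", not by the groups for "aware", "be aware", or "aware of". In other words, each phrase should get its best (most specific) matching synonym group.

How can I do this? Thanks for your help.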

2 answers:

Answer 0: (score: 1)

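The problem is the order of the replacements. Your code first finds "aware", then "be aware", and replaces both of them before it gets to "be aware of". If you want your approach to work, you will probably need to reorder the $synonyms array so that the most specific phrases are found first, followed by the more general ones ("be aware of" before "be aware").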
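The "most specific phrase first" idea can be sketched as follows. Note that this is not the asker's str_replace approach: it swaps in a single preg_replace_callback pass, on the assumption that the nesting comes from str_replace rescanning text it has already replaced. The helper name spin_synonyms and the short test sentence are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper (not from the question): wraps each known phrase
// in its {a|b|c} synonym group using one regex pass.
function spin_synonyms(string $sentence, array $synonyms): string
{
    // Map the lower-cased first phrase of each group to the full group.
    $groups = [];
    foreach ($synonyms as $group) {
        $first = explode('|', $group, 2)[0];
        $groups[strtolower($first)] = $group;
    }

    // Longest phrases first, so "be aware of" wins over "be aware" and "aware".
    $phrases = array_keys($groups);
    usort($phrases, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // One alternation with word boundaries: replaced text is never rescanned,
    // and "once" cannot match inside "concern".
    $alternatives = array_map(function ($p) {
        return preg_quote($p, '/');
    }, $phrases);
    $pattern = '/\b(?:' . implode('|', $alternatives) . ')\b/i';

    return preg_replace_callback($pattern, function ($m) use ($groups) {
        $group = $groups[strtolower($m[0])];
        // Keep the matched text's own capitalisation as the first option.
        return '{' . $m[0] . substr($group, strlen($m[0])) . '}';
    }, $sentence);
}

$synonyms = [
    'aware|conscious|mindful|informed|knowledgeable',
    'be aware|bear in mind|remember',
    'be aware of|concentrate on|pay attention to|know about|be familiar with',
    'once|as soon as|when|after|the moment|at the time',
    'question|query|concern|issue|problem',
];

echo spin_synonyms('Please be aware of this, and question it once.', $synonyms), "\n";
// Prints: Please {be aware of|concentrate on|pay attention to|know about|be familiar with} this, and {question|query|concern|issue|problem} it {once|as soon as|when|after|the moment|at the time}.
```

Because the regex engine moves past each match, a synonym group that has been inserted is never searched again, so no nested braces can appear. The /i flag is an extra assumption that lets a capitalised phrase such as "In most cases" match as well.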

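Answer 1: (score: 0)

If it is possible, the simple answer is to mark every candidate for replacement so that the replacements cannot overlap.

For example: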

$sentence = "When it ::comes:: ::time to:: renew your ::auto:: ::insurance:: ::policy::";

Now you need to change the top of your code to:

for ($i=0; $i < count($synonyms); $i++) {
   $words = explode("|", $synonyms[$i], 3);
   $searches[$i] = "::".$words[0]."::"; // Change this line
   $replaces[$i] = $synonyms[$i];
}
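Put together, the marker approach might look like the sketch below. It assumes the sentence has already been tagged by hand with "::" around each phrase, and it uses a shortened $synonyms list for brevity. The sorting step from the question is left out, because the markers already keep one replacement from matching inside another.

```php
<?php
// Shortened synonym list, for illustration only.
$synonyms = [
    'comes|arrives|will come|happens',
    'time to|time for you to',
    'auto|automobile|car|vehicle',
    'insurance|insurance coverage|insurance policy|insurance plan|insurance policies',
    'policy|coverage|plan',
    'coverage|protection',
];

// Each phrase to replace is wrapped in "::" markers beforehand.
$sentence = 'When it ::comes:: ::time to:: renew your ::auto:: ::insurance:: ::policy::';

$searches = [];
$replaces = [];
foreach ($synonyms as $group) {
    $first = explode('|', $group, 2)[0];
    $searches[] = '::' . $first . '::';   // search only for the marked phrase
    $replaces[] = '{' . $group . '}';     // replace it with the whole group
}

// The inserted groups contain no "::", so later searches cannot match inside them.
$output = str_replace($searches, $replaces, $sentence);

echo $output, "\n";
// Prints: When it {comes|arrives|will come|happens} {time to|time for you to} renew your {auto|automobile|car|vehicle} {insurance|insurance coverage|insurance policy|insurance plan|insurance policies} {policy|coverage|plan}
```

The trade-off is that the markers must be added to the input first, which only helps if the text can be pre-tagged.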