Find similar strings in an array

Time: 2015-02-25 07:29:13

Tags: php arrays similarity

I need to use similar_text() on an array of values that looks like this:

$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];

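What I am trying to do is find words that are almost identical, i.e. lawyer and lawyers in the array above, and add their counts together in a new array.

So lawyer would become 4, because lawyers (count 1) would be associated with the original string lawyer (count 3).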

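Keep in mind that this array will only ever contain single words, and its length is unspecified (it could exceed 99 entries).

I had no idea where to start with this, so I took a crack at it with the foreach loops you see below, but the output is not what I intended.

foreach ( $strings as $key_one => $count_one ) {
    foreach ( $strings as $key_two => $count_two ) {
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            if(!isset($counts[$key_one])) {
                $counts[$key_one] = $count_one;
            } else {
                $counts[$key_one] += $count_two;
            }
        }
    }
}

Note: the match threshold for this example is 80 percent (lawyer and lawyers are a ~92% match).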
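As a quick check of the ~92% figure, here is a minimal sketch of how similar_text() reports the percentage through its third, by-reference argument. The variable names $sim and $percent are only illustrative.

```php
<?php
// similar_text() returns the number of matching characters and,
// through the optional third argument, a similarity percentage (0..100).
$sim = similar_text("lawyers", "lawyer", $percent);

// "lawyer" (6 characters) is common to both words, so $sim is 6.
// The percentage is roughly 92.3, which is above the threshold of 80.
printf("%d matching characters, %.1f%% similar\n", $sim, $percent);
```

Unrelated words such as business and lawyer score far below 80, so they stay in separate groups.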

Which ends up giving me something like the following:

Array
(
    [lawyer] => 4
    [business] => 3
    [a] => 3
    [lawyers] => 2
)

Whereas I need it to be:

Array
(
    [lawyer] => 4
    [business] => 3
    [a] => 3
)

Notice that I need it to actually remove lawyers and add its count to lawyer.

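2 answers:

Answer 0: (score: 2)

Your difficulty is that lawyer is similar to lawyers, and lawyers is also similar to lawyer. So each of them gets its count bumped by the other.

Try this: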

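$counts = array();
foreach ( $strings as $key_one => &$count_one ) {
    if ($count_one == 0) continue; // skip it if we've already processed it
    if (!isset($counts[$key_one])) {
        $counts[$key_one] = $count_one;
        $count_one = 0; // zero it so the inner loop does not add it a second time
    }
    foreach ( $strings as $key_two => &$count_two ) {
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $count_two;
            $count_two = 0; // mark this string as consumed
        }
    }
}
unset($count_one, $count_two); // break the references left behind by foreach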

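The downside is that you modify the original $strings array, which may not be desirable. Here is another approach that keeps track of the already-processed strings in a separate hash: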

$already = $counts = array(); // not really necessary, but nice to init
foreach ( $strings as $key_one => $count_one ) {
    if (isset($already[$key_one])) continue; // skip if already processed
    $counts[$key_one] = 0; // the inner loop adds this key's own count (it matches itself 100%)
    foreach ( $strings as $key_two => $count_two ) {
        if (isset($already[$key_two])) continue; // never count a string twice
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $count_two;
            $already[$key_two] = true;
        }
    }
}

I would recommend the second solution.
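The tracking-hash idea can also be wrapped in a reusable function. This is only a sketch: the function name merge_similar_counts and its $threshold parameter are hypothetical and do not come from the answer. It assumes that the first key seen in each group of similar words should absorb the others.

```php
<?php
// Hypothetical helper: merges the counts of keys whose similar_text()
// percentage exceeds $threshold into the first such key encountered.
function merge_similar_counts(array $strings, float $threshold = 80.0): array
{
    $counts = [];
    $already = []; // keys that have already been assigned to a group

    foreach ($strings as $key_one => $count_one) {
        if (isset($already[$key_one])) {
            continue; // merged into an earlier key
        }
        $counts[$key_one] = $count_one;
        $already[$key_one] = true;

        foreach ($strings as $key_two => $count_two) {
            if (isset($already[$key_two])) {
                continue; // never count a string twice
            }
            similar_text((string) $key_two, (string) $key_one, $percent);
            if ($percent > $threshold) {
                $counts[$key_one] += $count_two;
                $already[$key_two] = true;
            }
        }
    }

    return $counts;
}

$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];
print_r(merge_similar_counts($strings)); // lawyer => 4, business => 3, a => 3
```

Because $strings is passed by value, the caller's array is left untouched. Raising the threshold (for example to 95) keeps lawyer and lawyers separate.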

Answer 1: (score: 1)

You can always use

unset( $counts[$key_two] ) ;
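The answer does not say where the unset() call should go, so the following is only one possible reading. It is a minimal sketch that assumes $counts starts as a copy of $strings and that the first-seen key absorbs later similar keys. The copy and the two guard conditions are assumptions, not part of the answer.

```php
<?php
$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];

$counts = $strings; // assumption: start from a copy and remove merged keys
foreach ($strings as $key_one => $count_one) {
    if (!isset($counts[$key_one])) {
        continue; // this key was already merged into an earlier one
    }
    foreach ($strings as $key_two => $count_two) {
        // skip the key itself and anything that has already been merged
        if ($key_one === $key_two || !isset($counts[$key_two])) {
            continue;
        }
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $counts[$key_two];
            unset($counts[$key_two]); // drop the merged key from the result
        }
    }
}

print_r($counts); // lawyer => 4, business => 3, a => 3
```

Without the two guards, a bare unset() inside the question's loop would also remove lawyer when the outer loop reaches lawyers.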