Find the matching part of two strings in PHP

Time: 2010-11-25 23:39:23

Tags: php uri string-matching

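I'm looking for a simple way to find the matching part of two strings in PHP (specifically in the context of URIs).

For example, consider the two strings:

http://2.2.2.2/~machinehost/deployment_folder/

/~machinehost/deployment_folder/users/bob/settings

What I need is to strip the part the two strings have in common from the second string, which gives:

users/bob/settings

before prepending the first string as a prefix to form an absolute URI.

Is there a simple way (in PHP) to compare two arbitrary strings for matching substrings within them?

Edit: as pointed out, I mean the longest matching substring common to both strings.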

6 answers:

Answer 0 (score: 4)

This is the answer. A ready-to-use PHP function.

Answer 1 (score: 2)

Assuming your strings are $a and $b respectively, you can use this:

$a = 'http://2.2.2.2/~machinehost/deployment_folder/';
$b = '/~machinehost/deployment_folder/users/bob/settings';

$len_a = strlen($a);
$len_b = strlen($b);

// Try overlaps from longest to shortest; ($len_b - $p) is the overlap length under test.
for ($p = max(0, $len_b - $len_a); $p < $len_b; $p++) {
    if (substr($a, $len_a - ($len_b - $p)) === substr($b, 0, $len_b - $p)) {
        break;
    }
}

$result = $a.substr($b, $len_b - $p);

echo $result;

This outputs http://2.2.2.2/~machinehost/deployment_folder/users/bob/settings
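If you want the same idea as a reusable function, here is a minimal sketch. The name join_with_overlap is my own, not something from the answer, and it assumes the overlap is always a suffix of the first string and a prefix of the second:

```php
<?php
// Hypothetical helper (name is an assumption): joins $base and $path,
// dropping the longest prefix of $path that is also a suffix of $base.
function join_with_overlap(string $base, string $path): string
{
    $max = min(strlen($base), strlen($path));
    // Try the longest possible overlap first and shrink until one fits.
    for ($len = $max; $len > 0; $len--) {
        if (substr($base, -$len) === substr($path, 0, $len)) {
            return $base . substr($path, $len);
        }
    }
    // No overlap at all: fall back to plain concatenation.
    return $base . $path;
}

echo join_with_overlap(
    'http://2.2.2.2/~machinehost/deployment_folder/',
    '/~machinehost/deployment_folder/users/bob/settings'
), "\n";
```

Checking the longest candidate first means the function returns as soon as the largest overlap is found, and strings with nothing in common are simply concatenated.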

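Answer 2 (score: 1)

The longest common match can also be found with a regular expression.

The function below takes two strings, builds a regular expression from one of them, and then runs it against the other.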

/**
 * Determine the longest common match within two strings
 *
 * @param string $str1
 * @param string $str2 Two strings in any order.
 * @param boolean $case_sensitive Set to true to force
 * case sensitivity. Default: false (case insensitive).
 * @return string The longest string - first match.
 */
function get_longest_common_subsequence( $str1, $str2, $case_sensitive = false ) {
    // First check to see if one string is the same as the other.
    if ( $str1 === $str2 ) return $str1;
    if ( ! $case_sensitive && strtolower( $str1 ) === strtolower( $str2 ) ) return $str1;

    // We'll use '#' as our regex delimiter. Any character can be used as we'll quote the string anyway.
    $delimiter = '#';

    // We'll find the shortest string and use that to check substrings and create our regex.
    $l1 = strlen( $str1 );
    $l2 = strlen( $str2 );
    $str = $l1 <= $l2 ? $str1 : $str2;
    $str2 = $l1 <= $l2 ? $str2 : $str1;
    $l = min( $l1, $l2 );

    // Next check to see if one string is a substring of the other.
    if ( $case_sensitive ) {
        if ( strpos( $str2, $str ) !== false ) {
            return $str;
        }
    }
    else {
        if ( stripos( $str2, $str ) !== false ) {
            return $str;
        }
    }

    // Regex for each character will be of the format (?:a(?=b))?
    // We also need to capture the last character, but this prevents us from matching strings with a single character. (?:.|c)?
    $reg = $delimiter;
    for ( $i = 0; $i < $l; $i++ ) {
        $a = preg_quote( $str[ $i ], $delimiter );
        $b = $i + 1 < $l ? preg_quote( $str[ $i + 1 ], $delimiter ) : false;
        $reg .= sprintf( $b !== false ? '(?:%s(?=%s))?' : '(?:.|%s)?', $a, $b );
    }
    $reg .= $delimiter;
    if ( ! $case_sensitive ) {
        $reg .= 'i';
    }
    // Resulting example regex from a string 'abbc':
    // '#(?:a(?=b))?(?:b(?=b))?(?:b(?=c))?(?:.|c)?#i';

    // Perform our regex on the remaining string
    $str = $l1 <= $l2 ? $str2 : $str1;
    if ( preg_match_all( $reg, $str, $matches ) ) {
        // $matches is an array with a single array with all the matches.
        return array_reduce( $matches[0], function( $a, $b ) {
            $al = strlen( $a );
            $bl = strlen( $b );
            // Return the longest string, as long as it's not a single character.
            return $al >= $bl || $bl <= 1 ? $a : $b;
        }, '' );
    }

    // No match - Return an empty string.
    return '';
}

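It builds the regular expression from the shorter of the two strings, although performance is most likely the same either way. It may incorrectly match strings that contain repeated substrings, and it is limited to matches of two or more characters unless the strings are equal or one is a substring of the other. For example: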

// Works as intended.
get_longest_common_subsequence( 'abbc', 'abc' ) === 'ab';

// Returns incorrect substring based on string length and recurring substrings.
get_longest_common_subsequence( 'abbc', 'abcdef' ) === 'abc';

// Does not return any matches, as all recurring strings are only a single character long.
get_longest_common_subsequence( 'abc', 'ace' ) === '';

// One of the strings is a substring of the other.
get_longest_common_subsequence( 'abc', 'a' ) === 'a';

In any case, it works as an alternative approach, and the regular expression could be refined to handle the other cases.

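Answer 3 (score: 0)

I'm not sure I understand your full requirement, but the idea is:

Let A be your URL and B be "/~machinehost/deployment_folder/users/bob/settings"

  • Search for B in A -> you get an index i (where i is the position in A of the first / of B)
  • Let l = length(A)
  • You need to cut B from (l - i) to length(B) to get the last part of B (/users/bob/settings)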
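The steps above can be sketched roughly as follows. This is only one reading of the idea: the function name strip_shared_prefix is made up for illustration, and it assumes the shared part always starts at a "/" in A and runs to the end of A:

```php
<?php
// Hypothetical helper (name is an assumption) illustrating the three steps.
function strip_shared_prefix(string $a, string $b): string
{
    $l = strlen($a);
    $i = strpos($a, '/');
    // Visit every "/" in A and test whether the tail of A from there opens B.
    while ($i !== false) {
        $tail = substr($a, $i);
        if (strncmp($b, $tail, strlen($tail)) === 0) {
            // The overlap is ($l - $i) characters long; keep the rest of B.
            return substr($b, $l - $i);
        }
        $i = strpos($a, '/', $i + 1);
    }
    // Nothing shared: return B unchanged.
    return $b;
}

echo strip_shared_prefix(
    'http://2.2.2.2/~machinehost/deployment_folder/',
    '/~machinehost/deployment_folder/users/bob/settings'
), "\n";
```

Because the slashes are visited from left to right, the first hit is also the longest overlap.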

I haven't tested it, but if you really need it I can help you make this brilliant (ironic) solution work.

Note that this may also be possible with a regular expression such as:

$pattern = '#' . preg_quote($B, '#') . '(.*?)#';
$res = array();
preg_match_all($pattern, $A, $res);

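Edit: I think your last comment invalidates my answer. But what you want is to find substrings. So you could start with a brute-force algorithm that tries to find B[1:i] in A for i in {2, length(B)}, and then use some dynamic programming.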
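For reference, a dynamic programming version of "longest common substring" could look like the sketch below. It is the textbook O(n*m) table reduced to two rows, not code from this answer, and the function name is an assumption:

```php
<?php
// Hypothetical name; standard longest-common-substring table, two rows at a time.
function longest_common_substring(string $s, string $t): string
{
    $n = strlen($s);
    $m = strlen($t);
    $best = 0; // length of the longest run found so far
    $end = 0;  // index in $s just past that run
    $prev = array_fill(0, $m + 1, 0);
    for ($i = 1; $i <= $n; $i++) {
        $curr = array_fill(0, $m + 1, 0);
        for ($j = 1; $j <= $m; $j++) {
            if ($s[$i - 1] === $t[$j - 1]) {
                // Extend the run that ended at the previous pair of characters.
                $curr[$j] = $prev[$j - 1] + 1;
                if ($curr[$j] > $best) {
                    $best = $curr[$j];
                    $end = $i;
                }
            }
        }
        $prev = $curr;
    }
    // On ties the first run found in $s wins, because of the strict ">" above.
    return $best === 0 ? '' : substr($s, $end - $best, $best);
}

echo longest_common_substring(
    'http://2.2.2.2/~machinehost/deployment_folder/',
    '/~machinehost/deployment_folder/users/bob/settings'
), "\n";
```

Unlike the prefix/suffix approaches, this finds a common run anywhere in either string, which is closer to what the question's edit asks for, at the cost of quadratic time.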

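Answer 4 (score: 0)

Based on your requirements, there doesn't seem to be any out-of-the-box code for this. So let's look for a simple approach.

For this exercise I used two methods, one to find the longest match and another to cut off the matching part.

The FindLongestMatch() method splits one path and looks, segment by segment, for a match in the other path, keeping only a single match, the longest one (no arrays, no sorting). The RemoveLongestMatch() method takes the suffix, or "remainder", that follows the position where the longest match was found.

Here is the full source code: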

<?php

function FindLongestMatch($relativePath, $absolutePath)
{
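    static $_separator = '/';
    $splitted = array_reverse(explode($_separator, $absolutePath));
    // Initialise both accumulators so the first iteration does not read undefined variables.
    $match = '';
    $longestMatch = '';

    foreach ($splitted as $value)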
    {
        $matchTest = $value.$_separator.$match;
        if(IsSubstring($relativePath, $matchTest))
            $match = $matchTest;

        if (!empty($value) && IsNewMatchLonger($match, $longestMatch))
            $longestMatch = $match;
    }

    return $longestMatch;
}

//Removes from the first string the longest match.
function RemoveLongestMatch($relativePath, $absolutePath)
{
    $match = findLongestMatch($relativePath, $absolutePath);
    $positionFound = strpos($relativePath, $match);     
    $suffix = substr($relativePath, $positionFound + strlen($match));

    return $suffix;
}

function IsNewMatchLonger($match, $longestMatch)
{
    return strlen($match) > strlen($longestMatch);
}

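function IsSubstring($string, $subString)
{
    // Note: strpos() returns 0 for a match at the very start, so "> 0" treats a
    // leading match as "not found". FindLongestMatch() relies on this to ignore
    // the bare "/" candidate produced by a trailing slash.
    return strpos($string, $subString) > 0;
}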

Here is a representative subset of the test cases:

//TEST CASES
echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://2.2.2.2/~machinehost/deployment_folder/';
echo "<br>".$relativePath = '/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://1.1.1.1/root/~machinehost/deployment_folder/';
echo "<br>".$relativePath = '/root/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://2.2.2.2/~machinehost/deployment_folder/users/';
echo "<br>".$relativePath = '/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://3.3.3.3/~machinehost/~machinehost/subDirectory/deployment_folder/';
echo "<br>".$relativePath = '/~machinehost/subDirectory/deployment_folderX/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

Running the test cases above produces the following output:

http://2.2.2.2/~machinehost/deployment_folder/
/~machinehost/deployment_folder/users/bob/settings
Longest match: ~machinehost/deployment_folder/
Suffix: users/bob/settings

http://1.1.1.1/root/~machinehost/deployment_folder/
/root/~machinehost/deployment_folder/users/bob/settings
Longest match: root/~machinehost/deployment_folder/
Suffix: users/bob/settings

http://2.2.2.2/~machinehost/deployment_folder/users/
/~machinehost/deployment_folder/users/bob/settings
Longest match: ~machinehost/deployment_folder/users/
Suffix: bob/settings

http://3.3.3.3/~machinehost/~machinehost/subDirectory/deployment_folder/
/~machinehost/subDirectory/deployment_folderX/users/bob/settings
Longest match: ~machinehost/subDirectory/
Suffix: deployment_folderX/users/bob/settings

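Maybe you can make sense of this code and turn it into something useful for your current project. Let me know if it works for you too. By the way, Mr. oreX's answer looks good as well.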

Answer 5 (score: -1)