Find the common prefix of an array of strings

Time: 2009-08-26 17:04:04

Tags: php algorithm string

I have an array like this:

$sports = array(
'Softball - Counties',
'Softball - Eastern',
'Softball - North Harbour',
'Softball - South',
'Softball - Western'
);

I want to find the longest common prefix of the strings. In this case, it would be 'Softball - '.

I was thinking that I would follow this process:

$i = 1;

// loop to the length of the first string
while ($i < strlen($sports[0])) {

  // grab the left most part up to i in length
  $match = substr($sports[0], 0, $i);

  // loop through all the values in array, and compare if they match
  foreach ($sports as $sport) {

     if ($match != substr($sport, 0, $i)) {
         // didn't match, return the part that did match
         return substr($sport, 0, $i-1);
     }

  } // foreach

   // increase string length
   $i++;
} // while

// if you got to here, then all of them must be identical

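Questions

  1. Is there a built-in function or a much simpler way of doing this?

  2. For my 5-line array that is probably fine, but for an array of several thousand lines there would be a lot of overhead, so I would have to be more calculated with my starting value of $i. For example, start with $i = half the string length; if that fails, try $i/2 until it works, then increment $i by 1 until we succeed. That way we do the fewest comparisons to get a result.

  3. Is there a formula/algorithm out there already for this sort of problem?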
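
The halving idea in question 2 can be sketched as a binary search over the prefix length. This is only a sketch, not code from the original post: the function name commonPrefixByBisection is made up for illustration, and it assumes PHP 7 or later (for intdiv() and scalar type declarations).

```php
<?php
// Hypothetical helper, not part of the original post:
// binary search on the length of the common prefix.
function commonPrefixByBisection(array $strings): string
{
    if (count($strings) === 0) {
        return '';
    }
    $first = reset($strings);
    // The prefix can never be longer than the shortest string.
    $high = min(array_map('strlen', $strings));
    $low = 0; // a prefix of length $low is known to be common

    while ($low < $high) {
        // Take the upper middle so the loop always makes progress.
        $mid = intdiv($low + $high + 1, 2);
        $candidate = substr($first, 0, $mid);
        $allMatch = true;
        foreach ($strings as $s) {
            if (strncmp($s, $candidate, $mid) !== 0) {
                $allMatch = false;
                break;
            }
        }
        if ($allMatch) {
            $low = $mid;      // length $mid is common, try a longer prefix
        } else {
            $high = $mid - 1; // too long, try a shorter prefix
        }
    }
    return substr($first, 0, $low);
}
```

Bisection cuts the number of rounds to roughly log2 of the shortest string length, but each round still compares whole prefixes, so it does not necessarily reduce the total number of characters compared.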

18 Answers:

Answer 0 (score: 14)

I would use this:

$prefix = array_shift($array);  // take the first item as initial prefix
$length = strlen($prefix);
// compare the current prefix with the prefix of the same length of the other items
foreach ($array as $item) {
    // check if there is a match; if not, decrease the prefix by one character at a time
    while ($length && substr($item, 0, $length) !== $prefix) {
        $length--;
        $prefix = substr($prefix, 0, -1);
    }
    if (!$length) {
        break;
    }
}

Update: Here's another solution that iteratively compares the n-th character of each string until a mismatch is found:

$pl = 0; // common prefix length
$n = count($array);
$l = strlen($array[0]);
while ($pl < $l) {
    $c = $array[0][$pl];
    for ($i=1; $i<$n; $i++) {
        if (!isset($array[$i][$pl]) || $array[$i][$pl] !== $c) break 2; // also stop when a string is shorter than the first one
    }
    $pl++;
}
$prefix = substr($array[0], 0, $pl);

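This is even more efficient, as there are at most numberOfStrings · commonPrefixLength atomic comparisons.

Answer 1 (score: 10)

I implemented @diogoriba's algorithm in code, with this result:

  • Finding the common prefix of the first two strings, then comparing it with all following strings starting from the 3rd and trimming the common string whenever a string no longer matches, wins in situations where the strings have more in common in their prefixes than they differ.
  • But bumperbox's original algorithm (apart from the bug fixes) wins where the strings have less in common in their prefix than they differ. Details are in the code comments!

Another idea I implemented:

Checking first for the shortest string in the array, and using this for comparison rather than simply the first string. In the code, this is implemented with the custom-written function arrayStrLenMin().

  • It can bring down the number of iterations dramatically, but the function arrayStrLenMin() may itself cause (more or less) iterations.
  • Simply starting with the length of the first string in the array seems quite clumsy, but may turn out to be effective if arrayStrLenMin() needs many iterations.

Get the maximum common prefix of strings in an array with as few iterations as possible (PHP)

Code + Extensive Testing + Remarks: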

function arrayStrLenMin ($arr, $strictMode = false, $forLoop = false) {
    $errArrZeroLength = -1; // Return value for error: Array is empty
    $errOtherType = -2;     // Return value for error: Found other type (than string in array)
    $errStrNone = -3;       // Return value for error: No strings found (in array)

    $arrLength = count($arr);
    if ($arrLength <= 0 ) { return $errArrZeroLength; }
    $cur = 0;

    foreach ($arr as $key => $val) {
        if (is_string($val)) {
            $min = strlen($val);
            $strFirstFound = $key;
            // echo("Key\tLength / Notification / Error\n");
            // echo("$key\tFound first string member at key with length: $min!\n");
            break;
        }
        else if ($strictMode) { return $errOtherType; } // At least 1 type other than string was found.
    }
    if (! isset($min)) { return $errStrNone; } // No string was found in array.

    // SpeedRatio of foreach/for is approximately 2/1 as discussed at:
    // http://juliusbeckmann.de/blog/php-foreach-vs-while-vs-for-the-loop-battle.html

    // If $strFirstFound is found within the first 1/SpeedRatio (=0.5) of the array, "foreach" is faster!

    if (! $forLoop) {
        foreach ($arr as $key => $val) {
            if (is_string($val)) {
                $cur = strlen($val);
                // echo("$key\t$cur\n");
                if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
                if ($cur < $min) { $min = $cur; }
            }
        // else { echo("$key\tNo string!\n"); }
        }
    }

    // If $strFirstFound is found after the first 1/SpeedRatio (=0.5) of the array, "for" is faster!

    else {
        for ($i = $strFirstFound + 1; $i < $arrLength; $i++) {
            if (is_string($arr[$i])) {
                $cur = strlen($arr[$i]);
                // echo("$i\t$cur\n");
                if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
                if ($cur < $min) { $min = $cur; }
            }
            // else { echo("$i\tNo string!\n"); }
        }
    }

    return $min;
}

function strCommonPrefixByStr($arr, $strFindShortestFirst = false) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    // Determine loop length
    /// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
    if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
    /// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
    else { $end = strlen($arr[0]); }

    for ($i = 1; $i <= $end + 1; $i++) {
        // Grab the part from 0 up to $i
        $commonStrMax = substr($arr[0], 0, $i);
        echo("Match: $i\t$commonStrMax\n");
        // Loop through all the values in array, and compare if they match
        foreach ($arr as $key => $str) {
            echo("  Str: $key\t$str\n");
            // Didn't match, return the part that did match
            if ($commonStrMax != substr($str, 0, $i)) {
                    return substr($commonStrMax, 0, $i-1);
            }
        }
    }
    // Special case: No mismatch (hence no return) happened until loop end!
    return $commonStrMax; // Thus entire first common string is the common prefix!
}

function strCommonPrefixByChar($arr, $strFindShortestFirst = false) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    // Determine loop length
    /// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
    if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
    /// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
    else { $end = strlen($arr[0]); }

    for ($i = 0 ; $i <= $end + 1; $i++) {
        // Grab char $i
        $char = substr($arr[0], $i, 1);
        echo("Match: $i\t"); echo(str_pad($char, $i+1, " ", STR_PAD_LEFT)); echo("\n");
        // Loop through all the values in array, and compare if they match
        foreach ($arr as $key => $str) {
            echo("  Str: $key\t$str\n");
            // Didn't match, return the part that did match
            if ($char != substr($str, $i, 1)) { // substr() instead of $str[$i]: no warning when $i is past the end of $str
                    return substr($arr[0], 0, $i);
            }
        }
    }
    // Special case: No mismatch (hence no return) happened until loop end!
    return substr($arr[0], 0, $end); // Thus entire first common string is the common prefix!
}


function strCommonPrefixByNeighbour($arr) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    /// Get the common string prefix of the first 2 strings
    $strCommonMax = strCommonPrefixByChar(array($arr[0], $arr[1]));
    if ($strCommonMax === false) { return false; }
    if ($strCommonMax == "") { return ""; }
    $strCommonMaxLength = strlen($strCommonMax);

    /// Now start looping from the 3rd string
    echo("-----\n");
    for ($i = 2; ($i < $arrLength) && ($strCommonMaxLength >= 1); $i++ ) {
        echo("  STR: $i\t{$arr[$i]}\n");

        /// Compare the maximum common string with the next neighbour

        /*
        //// Compare by char: Method unsuitable!

        // Iterate from string end to string beginning
        for ($ii = $strCommonMaxLength - 1; $ii >= 0; $ii--) {
            echo("Match: $ii\t"); echo(str_pad($arr[$i][$ii], $ii+1, " ", STR_PAD_LEFT)); echo("\n");
            // If you find the first mismatch from the end, break.
            if ($arr[$i][$ii] != $strCommonMax[$ii]) {
                $strCommonMaxLength = $ii - 1; break;
                // BUT!!! We may falsely assume that the string from the first mismatch until the begining match! This new string neighbour string is completely "unexplored land", there might be differing chars closer to the beginning. This method is not suitable. Better use string comparison than char comparison.
            }
        }
        */

        //// Compare by string

        for ($ii = $strCommonMaxLength; $ii > 0; $ii--) {
            echo("MATCH: $ii\t$strCommonMax\n");
            if (substr($arr[$i],0,$ii) == $strCommonMax) {
                break;
            }
            else {
                $strCommonMax = substr($strCommonMax,0,$ii - 1);
                $strCommonMaxLength--;
            }
        }
    }
    return substr($arr[0], 0, $strCommonMaxLength);
}





// Tests for finding the common prefix

/// Scenarios

$filesLeastInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);

$filesLessInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);

$filesMoreInCommon = array (
"/Voluuuuuuuuuuuuuumes/1/a/a/1",
"/Voluuuuuuuuuuuuuumes/1/a/a/2",
"/Voluuuuuuuuuuuuuumes/1/a/b/1",
"/Voluuuuuuuuuuuuuumes/1/a/b/2",
"/Voluuuuuuuuuuuuuumes/2/a/b/c/1",
"/Voluuuuuuuuuuuuuumes/2/a/a/1",
);

$sameDir = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
);

$sameFile = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/1",
);

$noCommonPrefix = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
"Net/1/a/a/aaaaa/2",
);

$longestLast = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/aaaaa/2",
);

$longestFirst = array (
"/Volumes/1/a/a/aaaaa/1",
"/Volumes/1/a/a/2",
);

$one = array ("/Volumes/1/a/a/aaaaa/1");

$empty = array ( );


// Test Results for finding  the common prefix

/*

I tested my functions in many possible scenarios.
The results, the common prefixes, were always correct in all scenarios!
Just try a function call with your individual array!

Considering iteration efficiency, I also performed tests:

I put echo functions into the functions where iterations occur, and measured the number of CLI line output via:
php <script with strCommonPrefixByStr or strCommonPrefixByChar> | egrep "^  Str:" | wc -l   GIVES TOTAL ITERATION SUM.
php <Script with strCommonPrefixByNeighbour> | egrep "^  Str:" | wc -l   PLUS   | egrep "^MATCH:" | wc -l   GIVES TOTAL ITERATION SUM.

My hypothesis was proven:
strCommonPrefixByChar wins in situations where the strings have less in common in their beginning (=prefix).
strCommonPrefixByNeighbour wins where there is more in common in the prefixes.

*/

// Test Results Table
// Used Functions | Iteration amount | Remarks

// $result = (strCommonPrefixByStr($filesLessInCommon)); // 35
// $result = (strCommonPrefixByChar($filesLessInCommon)); // 35 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLessInCommon)); // 88 + 42 = 130 // Loses in this category!

// $result = (strCommonPrefixByStr($filesMoreInCommon)); // 137
// $result = (strCommonPrefixByChar($filesMoreInCommon)); // 137 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLeastInCommon)); // 12 + 4 = 16 // Far the winner in this category!

echo("Common prefix of all members:\n");
var_dump($result);





// Tests for finding the shortest string in array

/// Arrays

// $empty = array ();
// $noStrings = array (0,1,2,3.0001,4,false,true,77);
// $stringsOnly = array ("one","two","three","four");
// $mixed = array (0,1,2,3.0001,"four",false,true,"seven", 8888);

/// Scenarios

// I list them from fewest to most iterations, which is not necessarily equivalent to slowest to fastest!
// For speed consider the remarks in the code considering the Speed ratio of foreach/for!

//// Fewest iterations (immediate abort on "Found other type", use "for" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, true, true) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: Found other type!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Result: Found other type!

*/

//// Fewer iterations (immediate abort on "Found other type", use "foreach" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, true, false) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: Found other type!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    0   3
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Result: Found other type!

*/

//// More iterations (No immediate abort on "Found other type", use "for" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, false, true) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: No strings found!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Key Length / Notification / Error
    4   Found first string member at key with length: 4!
    5   No string!
    6   No string!
    7   5
    8   No string!
    Result: 4

*/


//// Most iterations (No immediate abort on "Found other type", use "foreach" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, false, false) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: No strings found!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    0   3
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Key Length / Notification / Error
    4   Found first string member at key with length: 4!
    0   No string!
    1   No string!
    2   No string!
    3   No string!
    4   4
    5   No string!
    6   No string!
    7   5
    8   No string!
    Result: 4

*/

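Answer 2 (score: 9)

If you could sort your array, then there is a simple and very fast solution.

Simply compare the first item to the last one.

If the strings are sorted, any prefix common to all strings will be common to the sorted first and last strings.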

sort($sport);

$s1 = $sport[0];               // First string
$s2 = $sport[count($sport)-1]; // Last string
$len = min(strlen($s1), strlen($s2));

// While we still have string to compare,
// if the indexed character is the same in both strings,
// increment the index. 
for ($i=0; $i<$len && $s1[$i]==$s2[$i]; $i++); 

$prefix = substr($s1, 0, $i);

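Answer 3 (score: 7)

I think you're on the right track. But instead of incrementing i when all of the strings pass, you could do this:

1) Compare the first two strings in the array and find out how many common characters they have. Save the common characters in a separate string called maxCommon, for example.

2) Compare the third string with maxCommon. If the number of common characters is smaller, trim maxCommon to the characters that match.

3) Rinse and repeat for the rest of the array. At the end of the process, maxCommon will hold the string that is common to all of the array elements.

This adds some overhead, since you need to compare each string with maxCommon, but it drastically reduces the number of iterations needed to get your result.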
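
The three steps above can be sketched in PHP roughly as follows. This is an illustration rather than the answerer's own code: the function names commonPrefixOfTwo and commonPrefixPairwise are hypothetical, only the variable name maxCommon comes from the answer, and PHP 7 or later is assumed for the type declarations.

```php
<?php
// Hypothetical helper: common prefix of exactly two strings.
function commonPrefixOfTwo(string $a, string $b): string
{
    $limit = min(strlen($a), strlen($b));
    $i = 0;
    while ($i < $limit && $a[$i] === $b[$i]) {
        $i++;
    }
    return substr($a, 0, $i);
}

// Hypothetical helper: fold the two-string comparison over the whole array.
function commonPrefixPairwise(array $strings): string
{
    $strings = array_values($strings); // normalise keys to 0..n-1
    $count = count($strings);
    if ($count === 0) {
        return '';
    }
    // Steps 1 to 3: start with the first string and trim against each next one.
    $maxCommon = $strings[0];
    for ($k = 1; $k < $count && $maxCommon !== ''; $k++) {
        $maxCommon = commonPrefixOfTwo($maxCommon, $strings[$k]);
    }
    return $maxCommon;
}
```

The loop stops early once maxCommon is empty, since nothing can be trimmed further.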

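Answer 4 (score: 7)

There is probably some terribly well-regarded algorithm for this, but just off the top of my head: if you know your commonality is going to be on the left-hand side, as in your example, you could do much better than your posted method by first finding the commonality of the first two strings, and then iterating down the rest of the list, trimming the common string as necessary to achieve commonality, or terminating with failure if you trim all the way to nothing.

Answer 5 (score: 3)

I assume that by "common part" you mean "longest common prefix". That is much simpler to compute than any common substring.

This cannot be done without reading (n+1) * m characters in the worst case and n * m + 1 characters in the best case, where n is the length of the longest common prefix and m is the number of strings.

Comparing one letter at a time achieves that efficiency (Big Theta(n * m)).

Your proposed algorithm runs in Big Theta(n^2 * m), which is much, much slower for large inputs.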
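
One way to see the Big Theta(n^2 * m) figure, as a rough sketch: assume the question's loop compares a length-i prefix of each of the m strings in round i, and that each such comparison costs on the order of i character comparisons. Summing over the n rounds then gives:

```latex
\sum_{i=1}^{n} i \cdot m \;=\; m \cdot \frac{n(n+1)}{2} \;=\; \Theta(n^{2} m)
```

By contrast, checking a single character per string per round costs about m comparisons in each of the n rounds, which gives the Big Theta(n * m) bound mentioned above.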

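The third proposed algorithm, which finds the longest prefix of the first two strings and then compares it with the third, fourth, etc., also has a running time in Big Theta(n * m), but with a higher constant factor. It will probably be only slightly slower in practice.

Overall, I would recommend just rolling your own function, since the first algorithm is too slow and the other two are about equally complicated to write anyway.

Check out WikiPedia for a description of Big Theta notation.

Answer 6 (score: 2)

Here's an elegant, recursive implementation in JavaScript: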

function prefix(strings) {
    switch (strings.length) {

      case 0:
        return "";

      case 1:
        return strings[0];

      case 2:
        // compute the prefix between the two strings
        var a = strings[0],
            b = strings[1],
            n = Math.min(a.length, b.length),
            i = 0;
        while (i < n && a.charAt(i) === b.charAt(i))
            ++i;
        return a.substring(0, i);

      default:
        // return the common prefix of the first string,
        // and the common prefix of the rest of the strings
        return prefix([ strings[0], prefix(strings.slice(1)) ]);
    }
}

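Answer 7 (score: 1)

  1. Not that I know of.

  2. Yes: instead of comparing the substring from 0 to length i, you can simply check the i-th character (you already know that characters 0 to i-1 match).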
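
Point 2 might look something like this in PHP. It is a minimal sketch with a hypothetical function name (commonPrefixByColumn), assuming PHP 7 or later; each round inspects only character $i of every string.

```php
<?php
// Hypothetical function name; checks only character $i per round,
// because characters 0 .. $i-1 are already known to match.
function commonPrefixByColumn(array $strings): string
{
    $strings = array_values($strings);
    if (count($strings) === 0) {
        return '';
    }
    $first = $strings[0];
    $length = strlen($first);
    for ($i = 0; $i < $length; $i++) {
        foreach ($strings as $s) {
            // isset() is false once $s has fewer than $i + 1 characters.
            if (!isset($s[$i]) || $s[$i] !== $first[$i]) {
                return substr($first, 0, $i);
            }
        }
    }
    return $first; // every character of the first string matched everywhere
}
```

The isset() guard covers the case where a later string is shorter than the first one, so no out-of-range offset is read.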

Answer 8 (score: 1)

Short and sweet version, perhaps not the most efficient:

/// Return length of longest common prefix in an array of strings.
function _commonPrefix($array) {
    if(count($array) < 2) {
        if(count($array) == 0)
            return false; // empty array: undefined prefix
        else
            return strlen($array[0]); // 1 element: trivial case
    }
    $len = min(array_map('strlen',$array)); // initial upper limit: the prefix cannot be longer than the shortest string.
    $prevval = reset($array);
    while(($newval = next($array)) !== FALSE) {
        for($j = 0 ; $j < $len ; $j += 1)
            if($newval[$j] != $prevval[$j])
                $len = $j;
        $prevval = $newval;
    }
    return $len;
}

// TEST CASE:
$arr = array('/var/yam/yamyam/','/var/yam/bloorg','/var/yar/sdoo');
print_r($arr);
$plen = _commonPrefix($arr);
$pstr = substr($arr[0],0,$plen);
echo "Res: $plen\n";
echo "==> ".$pstr."\n";
echo "dir: ".dirname($pstr.'aaaa')."\n";

Output of the test case:

Array
(
    [0] => /var/yam/yamyam/
    [1] => /var/yam/bloorg
    [2] => /var/yar/sdoo
)
Res: 7
==> /var/ya
dir: /var

Answer 9 (score: 0)

Sharing a TypeScript solution for this question. I split it into 2 methods, just to keep it clean.

function longestCommonPrefix(strs: string[]): string {
    let output = '';
    if(strs.length > 0) {
        output = strs[0];
        if(strs.length > 1) {
            for(let i=1; i <strs.length; i++) {
                output = checkCommonPrefix(output, strs[i]);
            }
        }
    }  
    return output;
};
    
function checkCommonPrefix(str1: string, str2: string): string {
    let output = '';
    let len = Math.min(str1.length, str2.length);
    let i = 0;
    while(i < len) {
        if(str1[i] === str2[i]) {
            output += str1[i];
        } else {
            i = len;
        }
        i++;
    }
    return output;
}

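Answer 10 (score: 0)

This is an addition to @Gumbo's answer. If you want to ensure that the chosen common prefix does not break words, use this. It just looks for a blank space at the end of the chosen string. If there is one, we know that all of the phrases continue past it, so we truncate it.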

function product_name_intersection($array){

    $pl = 0; // common prefix length
    $n = count($array);
    $l = strlen($array[0]);
    $first = current($array);

    while ($pl < $l) {
        $c = $array[0][$pl];
        for ($i=1; $i<$n; $i++) {
            if (!isset($array[$i][$pl]) || $array[$i][$pl] !== $c) break 2;
        }
        $pl++;
    }
    $prefix = substr($array[0], 0, $pl);

    if ($pl < strlen($first) && substr($prefix, -1, 1) != ' ') {

        $prefix = preg_replace('/\W\w+\s*(\W*)$/', '$1', $prefix);
    }

    $prefix =  preg_replace('/^\W*(.+?)\W*$/', '$1', $prefix);

    return $prefix;
}

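Answer 11 (score: 0)

For what it's worth, here's another alternative I came up with.

I used this to find the common prefix for a list of product codes (i.e. where multiple product SKUs share a common series of characters at the start):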

/**
 * Try to find a common prefix for a list of strings
 * 
 * @param array $strings
 * @return string
 */
function findCommonPrefix(array $strings)
{
    $prefix = '';
    $chars = array_map("str_split", $strings);
    $matches = call_user_func_array("array_intersect_assoc", $chars);
    if ($matches) {
        $i = 0;
        foreach ($matches as $key => $value) {
            if ($key != $i) {
                unset($matches[$key]);
            }
            $i++;
        }
        $prefix = join('', $matches);
    }

    return $prefix;
}

Answer 12 (score: 0)

The top answer seemed a bit long, so here's a concise solution with a runtime of O(n^2).

function findLongestPrefix($arr) {
  return array_reduce($arr, function($prefix, $item) {
    $length = min(strlen($prefix), strlen($item));
    while (substr($prefix, 0, $length) !== substr($item, 0, $length)) {
      $length--;
    }
    return substr($prefix, 0, $length);
  }, $arr[0]);
}

print findLongestPrefix($sports); // Softball -

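Answer 13 (score: 0)

The solutions here only work for finding commonalities at the beginning of strings. Here is a function that looks for the longest common substring anywhere in an array of strings.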

http://www.christopherbloom.com/2011/02/24/find-the-longest-common-substring-using-php/

Answer 14 (score: 0)


    // Common prefix
    $common = '';

    $sports = array(
    'Softball T - Counties',
    'Softball T - Eastern',
    'Softball T - North Harbour',
    'Softball T - South',
    'Softball T - Western'
    );

    // find the length of the shortest string
    $minLen = strlen($sports[0]);
    foreach ($sports as $s){
        if($minLen > strlen($s))
            $minLen = strlen($s);
    }


    // flag to break out of inner loop
    $flag = false;

    // The possible common string length does not exceed the minimum string length.
    // The following solution is O(n^2); this can be improved.
    for ($i = 0 ; $i < $minLen; $i++){
        $tmp = $sports[0][$i];

        foreach ($sports as $s){
            if($s[$i] != $tmp)
                $flag = true;
        }
        if($flag)
            break;
        else
            $common .= $sports[0][$i];
    }

    print $common;

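Answer 15 (score: 0)

I would use a recursive algorithm like this:

1 - Get the first string in the array
2 - Call the recursive prefix method with the first string as a parameter
3 - If the prefix is empty, return no prefix
4 - Loop through all the strings in the array
4.1 - If any of the strings does not start with the prefix
4.1.1 - Call the recursive prefix method with the prefix minus its last character as a parameter
4.2 - Return the prefix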
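
The outline above can be sketched in PHP as follows. The function names shrinkUntilCommon and commonPrefixRecursive are made up for illustration, and the sketch assumes PHP 8 (where substr() always returns a string); it is one possible reading of the steps, not the answerer's code.

```php
<?php
// Hypothetical names; a sketch of the recursive outline above.
function shrinkUntilCommon(array $strings, string $prefix): string
{
    if ($prefix === '') {
        return ''; // step 3: nothing left, so there is no common prefix
    }
    foreach ($strings as $s) {
        // Step 4.1: $s does not start with the candidate prefix.
        if (strncmp($s, $prefix, strlen($prefix)) !== 0) {
            // Step 4.1.1: retry with the candidate shortened by one character.
            return shrinkUntilCommon($strings, substr($prefix, 0, -1));
        }
    }
    return $prefix; // step 4.2: every string starts with the candidate
}

function commonPrefixRecursive(array $strings): string
{
    if (count($strings) === 0) {
        return '';
    }
    // Steps 1 and 2: start with the whole first string as the candidate.
    return shrinkUntilCommon($strings, reset($strings));
}
```

The recursion depth can reach the length of the first string, so for very long strings an iterative loop may be the safer choice.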

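Answer 16 (score: 0)

How about something like this? It could be optimised further by not having to check the lengths of the strings if we could rely on a null terminating character (but I am assuming the length of a Python string is cached somewhere?)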

def find_common_prefix_len(strings):
    """
    Given a list of strings, finds the length of the common prefix in all of them.
    So
    apple
    applet
    application
    would return 4
    """
    if not strings:
        return 0  # no strings, so no common prefix (and no endless loop below)
    prefix          = 0
    curr_index      = -1
    num_strings     = len(strings)
    string_lengths  = [len(s) for s in strings]
    while True:
        curr_index  += 1
        ch_in_si    = None
        for si in range(0, num_strings):  # range() replaces Python 2's xrange()
            if curr_index >= string_lengths[si]:
                return prefix
            else:
                if si == 0:
                    ch_in_si = strings[0][curr_index]
                elif strings[si][curr_index] != ch_in_si:
                    return prefix
        prefix += 1

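Answer 17 (score: 0)

@bumperbox

  1. Your basic code needs some correction to work in ALL scenarios!

    • Your loop only compares up to one character before the last character!
    • The mismatch can occur 1 loop cycle after the last common character.
    • Hence you have to check at least 1 character past the last character of your first string.
    • So your loop condition must be "<= strlen($sports[0]) + 1" (or "< strlen($sports[0]) + 2").
  2. Currently your algorithm fails

    • if the first string is completely contained in all other strings,
    • or completely contained in all other strings except for its last character.
  3. In my next answer/post, I will attach iteration-optimized code!

    Original Bumperbox code PLUS correction (PHP):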

    function shortest($sports) {
     $i = 1;
    
     // loop to the length of the first string
     while ($i < strlen($sports[0])) {
    
      // grab the left most part up to i in length
      // REMARK: Culturally biased towards LTR writing systems. Better say: Grab from beginning...
      $match = substr($sports[0], 0, $i);
    
      // loop through all the values in array, and compare if they match
      foreach ($sports as $sport) {
       if ($match != substr($sport, 0, $i)) {
        // didn't match, return the part that did match
        return substr($sport, 0, $i-1);
       }
      }
     $i++; // increase string length
     }
    }
    
    function shortestCorrect($sports) {
     $i = 1;
     while ($i <= strlen($sports[0]) + 1) {
      // Grab the string from its beginning with length $i
      $match = substr($sports[0], 0, $i);
      foreach ($sports as $sport) {
       if ($match != substr($sport, 0, $i)) {
        return substr($sport, 0, $i-1);
       }
      }
      $i++;
     }
     // Special case: No mismatch happened until loop end! Thus entire str1 is common prefix!
     return $sports[0];
    }
    
    $sports1 = array(
    'Softball',
    'Softball - Eastern',
    'Softball - North Harbour');
    
    $sports2 = array(
    'Softball - Wester',
    'Softball - Western',
    );
    
    $sports3 = array(
    'Softball - Western',
    'Softball - Western',
    );
    
    $sports4 = array(
    'Softball - Westerner',
    'Softball - Western',
    );
    
    echo("Output of the original function:\n"); // Failure scenarios
    
    var_dump(shortest($sports1)); // NULL rather than the correct 'Softball'
    var_dump(shortest($sports2)); // NULL rather than the correct 'Softball - Wester'
    var_dump(shortest($sports3)); // NULL rather than the correct 'Softball - Western'
    var_dump(shortest($sports4)); // Works only because the first string is at least two characters longer than the common prefix!
    
    echo("\nOutput of the corrected function:\n"); // All scenarios work
    var_dump(shortestCorrect($sports1));
    var_dump(shortestCorrect($sports2));
    var_dump(shortestCorrect($sports3));
    var_dump(shortestCorrect($sports4));