PHP - How to find groups of repeated values in an array

Time: 2014-01-22 22:29:38

Tags: php arrays pattern-matching

I have an array of string values that sometimes form repeating patterns ('a', 'b', 'c', 'd'):

$array = array(
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd',
);

I want to find the repeating patterns based on the array order and group them in that same order (so that the order is preserved).

$patterns = array(
    array('number' => 2, 'values' => array('a', 'b', 'c', 'd')),
    array('number' => 1, 'values' => array('c')),
    array('number' => 1, 'values' => array('d'))
);

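Note that [a,b], [b,c], and [c,d] are not patterns by themselves because they sit inside the larger [a,b,c,d] pattern, and the final [c,d] only appears once, so it is not a pattern either - just the individual values 'c' and 'd'.

Another example: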

$array = array(
    'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a'
  //[.......] [.] [[......]  [......]] [.]
);

which produces

$patterns = array(
    array('number' => 2, 'values' => array('x')),
    array('number' => 1, 'values' => array('y')),
    array('number' => 2, 'values' => array('x', 'b')),
    array('number' => 1, 'values' => array('a'))
);

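How can I do this?

10 answers:

Answer 0 (score: 7)

An array of characters is just a string. Regular expressions are the king of string pattern matching. Add recursion and the solution is quite elegant, even with the conversion back and forth from character arrays: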

function findPattern($str){
    $results = array();
    if(is_array($str)){
        $str = implode($str);
    }
    if(strlen($str) == 0){ //reached the end
        return $results;
    }
    if(preg_match_all('/^(.+)\1+(.*?)$/',$str,$matches)){ //pattern found
        $results[] = array('number' => (strlen($str) - strlen($matches[2][0])) / strlen($matches[1][0]), 'values' => str_split($matches[1][0]));
        return array_merge($results,findPattern($matches[2][0]));
    }
    //no pattern found
    $results[] = array('number' => 1, 'values' => array(substr($str, 0, 1)));
    return array_merge($results,findPattern(substr($str, 1)));
}

You can test it here: https://eval.in/507818 and https://eval.in/507815
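One caveat: implode() drops the boundaries between elements, so the function above treats every character as a separate value. A minimal sketch of one possible workaround, assuming string values and no more than 255 distinct ones, is to map each distinct value to a single byte before matching and to map the matched bytes back afterwards. The helpers encodeTokens and decodeTokens are hypothetical and not part of the answer. The regex uses the s modifier because one of the generated bytes may be a newline.

```php
<?php
// Hypothetical helper: replace each distinct value with one unique byte.
// Assumes string values and at most 255 distinct values.
function encodeTokens(array $values): array
{
    $toByte = [];   // value => single-byte symbol
    $toValue = [];  // byte code => original value
    $encoded = '';
    foreach ($values as $value) {
        if (!isset($toByte[$value])) {
            $code = count($toByte) + 1;   // byte codes 1..255
            $toByte[$value] = chr($code);
            $toValue[$code] = $value;
        }
        $encoded .= $toByte[$value];
    }
    return [$encoded, $toValue];
}

// Hypothetical helper: turn a run of symbols back into the original values.
function decodeTokens(string $symbols, array $toValue): array
{
    $values = [];
    foreach (str_split($symbols) as $symbol) {
        $values[] = $toValue[ord($symbol)];
    }
    return $values;
}

list($encoded, $toValue) = encodeTokens(['one', 'two', 'one', 'two', 'three']);

// Same idea as the regex in the answer: a group followed by repeats of itself.
$base = [];
$number = 1;
if (preg_match('/^(.+)\1+/s', $encoded, $m)) {
    $base = decodeTokens($m[1], $toValue);           // ['one', 'two']
    $number = intdiv(strlen($m[0]), strlen($m[1]));  // 2
}
```

With this encoding the matching itself is unchanged; only the conversion to and from a string differs.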

Answer 1 (score: 5)

If c and d can be grouped together, this is my code:

<?php
$array = array(
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd',
);

$res = array();

foreach ($array AS $value) {
    if (!isset($res[$value])) {
        $res[$value] = 0;
    }
    $res[$value]++;
}

foreach ($res AS $key => $value) {
    $fArray[$value][] = $key;
    for ($i = $value - 1; $i > 0; $i--) {
        $fArray[$i][] = $key;
    }
}

$res = array();
foreach($fArray AS $key => $value) {
    if (!isset($res[serialize($value)])) {
        $res[serialize($value)] = 0;
    }
    $res[serialize($value)]++;
}
$fArray = array();
foreach($res AS $key => $value) {
    $fArray[] = array('number' => $value, 'values' => unserialize($key));
}

echo '<pre>';
var_dump($fArray);
echo '</pre>';

The final result is:

array (size=2)
  0 => 
    array (size=2)
      'number' => int 2
      'values' => 
        array (size=4)
          0 => string 'a' (length=1)
          1 => string 'b' (length=1)
          2 => string 'c' (length=1)
          3 => string 'd' (length=1)
  1 => 
    array (size=2)
      'number' => int 1
      'values' => 
        array (size=2)
          0 => string 'c' (length=1)
          1 => string 'd' (length=1)

Answer 2 (score: 5)

The following code returns the expected result, finding the longest portions with repeated values:

function pepito($array) {
  $sz=count($array);
  $patterns=Array();
  for ($pos=0;$pos<$sz;$pos+=$len) {
    $nb=1;
    for ($len=floor($sz/2);$len>0;$len--) {
      while (array_slice($array, $pos, $len)==array_slice($array, $pos+$len, $len)) {
        $pos+=$len;
        $nb++;
      }
      if ($nb>1) break;
    }
    if (!$len) $len=1;
    $patterns[]=Array('number'=>$nb, 'values'=>array_slice($array, $pos, $len));
  }
  return $patterns;
}

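This matches your examples:

{['a','b','c','d'],['a','b','c','d']},['c','d']

or {['x'],['x']},['y'],{['x','b'],['x','b']},['a']

The difficult part is more about examples like the following:

{['one','one','two'],['one','one','two']}

Or the most difficult choice:

one, two, one, two, one, two, one, two

because we can group it in two ways:

[one, two], [one, two], [one, two], [one, two]

[one, two, one, two], [one, two, one, two]

There is no "obvious" choice. My algorithm above always takes the longest match, because that is the simplest implementation that considers any combination.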
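The ambiguity can be made concrete with a small sketch. The helper countAdjacentRepeats is hypothetical (it is not part of the answer) and assumes $len is at least 1. It counts how many times the base of a given length, starting at a given position, occurs back to back.

```php
<?php
// Hypothetical helper: how many times does the base of length $len,
// starting at $pos, occur back to back? Assumes $len >= 1.
function countAdjacentRepeats(array $array, int $pos, int $len): int
{
    $base = array_slice($array, $pos, $len);
    if ($base === []) {
        return 0; // $pos is past the end of the array
    }
    $count = 1;
    // array_slice() re-indexes integer keys, so === compares values in order.
    while (array_slice($array, $pos + $count * $len, $len) === $base) {
        $count++;
    }
    return $count;
}

$array = ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'];

$short = countAdjacentRepeats($array, 0, 2); // 4 repeats of [one, two]
$long  = countAdjacentRepeats($array, 0, 4); // 2 repeats of [one, two, one, two]
```

Both groupings cover all eight elements (4 x 2 and 2 x 4), so a tie-breaking rule such as "longest base first" is needed to pick one.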

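Edit: You should also consider cases where the longest match comes after a shorter one.

Example:

'one','two','one','two','three','four','one','two','three','four'

If you start from left to right, you might want to group it as:

{['one','two'],['one','two']},'three','four','one','two','three','four'

when you could group it as:

'one','two',{['one','two','three','four'],['one','two','three','four']}

This case has to be resolved with recursive calls to get a better solution, but that results in a longer execution time: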

function pepito($array) {
  if (($sz=count($array))<1) return Array();
  $pos=0;
  $nb=1;
  for ($len=floor($sz/2);$len>0;$len--) {
    while (array_slice($array, $pos, $len)==array_slice($array, $pos+$len, $len)) {
      $pos+=$len;
      $nb++;
    }
    if ($nb>1) break;
  }

  if (!$len) $len=1;
  $rec1=pepito(array_slice($array, $pos+$len));
  $rec2=pepito(array_slice($array, 1));

  if (count($rec1)<count($rec2)+1) {
    return array_merge(Array(Array('number'=>$nb, 'values'=>array_slice($array, $pos, $len))), $rec1);
  }
  return array_merge(Array(Array('number'=>1, 'values'=>array_slice($array, 0, 1))), $rec2);
}

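Answer 3 (score: 4)

Explanation

Pattern base: the sequence of elements that repeats within a pattern (e.g., for [a,b,a,b,c], [a,b] is the pattern base and [a,b,a,b] is the pattern).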
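To make the relation between a base and its pattern concrete, here is a minimal sketch that expands a list of ('number', 'values') entries back into the flat sequence they describe. The helper expandPatterns is hypothetical and not part of the answer. It can also serve as a sanity check, since expanding a result should reproduce the input array.

```php
<?php
// Hypothetical helper: repeat each base 'number' times and concatenate.
function expandPatterns(array $patterns): array
{
    $out = [];
    foreach ($patterns as $pattern) {
        for ($i = 0; $i < $pattern['number']; $i++) {
            foreach ($pattern['values'] as $value) {
                $out[] = $value;
            }
        }
    }
    return $out;
}

$patterns = [
    ['number' => 2, 'values' => ['a', 'b']], // base [a,b], pattern [a,b,a,b]
    ['number' => 1, 'values' => ['c']],
];

$flat = expandPatterns($patterns); // ['a', 'b', 'a', 'b', 'c']
```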

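We want to start by searching for the longest pattern base, followed by the next longest, and so on. It is important to understand that if we find a pattern, we do not need to check within it for the start of another pattern with a base of the same length.

Here is the proof.

Assume that A is a pattern base and that we have encountered the pattern AA. Assume that B is another pattern base of the same length that forms a pattern starting within A. Let Y be the overlapping elements. If A = XY, then AA = XYXY. Since B has the same length, it must be the case that B = YX, because to complete B we must use the remaining elements of A. Moreover, since B forms a pattern, we must have BB, which is YXYX. Since A starts before B, we have XYXYX = AAX = XBB. If B repeats again, we would have XBBB = XYXYXYX = AAAX. Therefore, B cannot repeat an additional time without A repeating an additional time. Thus, we do not need to check for longer patterns within the pattern generated by A.

The longest possible pattern base consists of half the elements in the whole list, because the simplest pattern occurs exactly twice. Thus, we can start checking for patterns with a base of half the length and work our way down to bases of size 2.

Assuming we search the array from left to right, once a pattern is found we only need to search on either side of it for additional patterns. To the left, there are no patterns with bases of the same length, or they would have been found beforehand. Thus, we search the left side using the next smallest base size. The elements to the right of the pattern have not been searched yet, so we continue searching them using a base of the same size.

The function that does this is as follows: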

function get_patterns($arr, $len = null) {
    // The smallest pattern base length for which a pattern can be found
    $minlen = 2;

    // No pattern base length was specified
    if ($len === null) {
        // Use the longest pattern base length possible
        $maxlen = floor(count($arr) / 2);
        return get_patterns($arr, $maxlen);

    // Base length is too small to find any patterns
    } else if ($len < $minlen) {
        // Compile elements into lists consisting of one element

        $results = array();

        $num = 1;
        $elem = $arr[0];

        for ($i=1; $i < count($arr); $i++) {
            if ($elem === $arr[$i]) {
                $num++;
            } else {
                array_push($results, array(
                    'number' => $num,
                    'values' => array( $elem )
                ));

                $num = 1;
                $elem = $arr[$i];
            }
        }

        array_push($results, array(
            'number' => $num,
            'values' => array( $elem )
        ));

        return $results;
    }

    // Cycle through elements until there aren't enough elements to fit
    //  another repetition.
    for ($i=0; $i < count($arr) - $len * 2 + 1; $i++) {
        // Number of times pattern base occurred
        $num_read = 1; // One means there is no pattern yet

        // Current pattern base we are attempting to match against
        $base = array_slice($arr, $i, $len);

        // Check for matches using segments of the same length for the elements
        //  following the current pattern base
        for ($j = $i + $len; $j < count($arr) - $len + 1; $j += $len) {
            // Elements being compared to pattern base
            $potential_match = array_slice($arr, $j, $len);

            // Match found
            if (has_same_elements($base, $potential_match)) {
                $num_read++;

            // NO match found
            } else {
                // Do not check again using currently selected elements
                break;
            }
        }

        // Patterns were encountered
        if ($num_read > 1) {
            // The total number of elements that make up the pattern
            $pattern_len = $num_read * $len;

            // The elements before the pattern
            $before = array_slice($arr, 0, $i);

            // The elements after the pattern
            $after = array_slice(
                $arr, $i + $pattern_len, count($arr) - $pattern_len - $i
            );

            $results = array_merge(
                // Patterns of a SMALLER length may exist beforehand
                count($before) > 0 ? get_patterns($before, $len-1) : array(),

                // Patterns that were found
                array(
                    array(
                        'number' => $num_read,
                        'values' => $base
                    )
                ),

                // Patterns of the SAME length may exist afterward
                count($after) > 0 ? get_patterns($after, $len) : array()
            );

            return $results;
        }
    }

    // No matches were encountered

    // Search for SMALLER patterns
    return get_patterns($arr, $len-1);
}

The function has_same_elements, which is used to check whether two arrays of primitive values are identical, is as follows:

// Returns true if two arrays have the same elements.
//
// Precondition: Elements must be primitive data types (ie. int, string, etc)
function has_same_elements($a1, $a2) {
    // There are a different number of elements
    if (count($a1) != count($a2)) {
        return false;
    }

    for ($i=0; $i < count($a1); $i++) {
        if ($a1[$i] !== $a2[$i]) {
            return false;
        }
    }

    return true;
}
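For arrays produced by array_slice(), which re-indexes integer keys from 0 by default, PHP's strict array comparison === should give the same answer as has_same_elements in a single expression: it is true only when both arrays hold the same key/value pairs in the same order and with the same types. A small sketch (the variable names are illustrative only):

```php
<?php
$arr = ['x', 'a', 'b', 'a', 'b'];

$first  = array_slice($arr, 1, 2); // ['a', 'b'], keys 0 and 1
$second = array_slice($arr, 3, 2); // ['a', 'b'], keys 0 and 1
$other  = array_slice($arr, 0, 2); // ['x', 'a']

$sameStrict  = ($first === $second); // true: same values in the same order
$differs     = ($first === $other);  // false: different values
$typeMatters = (['1'] === [1]);      // false: '1' and 1 differ in type
```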

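There are a few things you can do to speed up the code. Instead of slicing the array, you can pass the function the array together with the start and end indexes to examine. Also, working with strings may be slow, so you can create an array that maps strings to numbers and vice versa. You can then convert the array of strings into an array of numbers and work with that. Once you have the result, you can convert the arrays of numbers back into strings.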
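A minimal sketch of both suggestions, under the assumption that the values are strings (or integers) usable as array keys. The helpers internValues and segmentsEqual are hypothetical names and not part of the answer, and no measurements are claimed here.

```php
<?php
// Hypothetical helper: map each distinct value to an integer id.
// Assumes the values are strings (or ints) usable as array keys.
function internValues(array $values): array
{
    $ids = [];      // value => id
    $names = [];    // id => value
    $encoded = [];
    foreach ($values as $value) {
        if (!isset($ids[$value])) {
            $ids[$value] = count($names);
            $names[] = $value;
        }
        $encoded[] = $ids[$value];
    }
    return [$encoded, $names];
}

// Hypothetical helper: compare two segments of the same list by index,
// without creating slices.
function segmentsEqual(array $arr, int $i, int $j, int $len): bool
{
    $n = count($arr);
    if ($i + $len > $n || $j + $len > $n) {
        return false; // one of the segments would run past the end
    }
    for ($k = 0; $k < $len; $k++) {
        if ($arr[$i + $k] !== $arr[$j + $k]) {
            return false;
        }
    }
    return true;
}

list($encoded, $names) = internValues(['x', 'b', 'x', 'b', 'a']);
// $encoded is [0, 1, 0, 1, 2]; $names is ['x', 'b', 'a']

$repeats = segmentsEqual($encoded, 0, 2, 2); // true: [x, b] occurs again at index 2

// Convert ids back to the original strings once the work is done.
$decoded = array_map(function ($id) use ($names) {
    return $names[$id];
}, $encoded);
```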

I tested the function using the following code:

$tests = array(
    'a,b,c,d',
    'a',
    'a,a,a,a',
    'a,a,a,a,a',
    'a,a,a,a,a,a',
    'b,a,a,a,a,c',
    'b,b,a,a,a,a,c,c',
    'b,b,a,a,d,a,a,c,c',
    'a,b,c,d,a,b,c,d,c,d',
    'x,x,y,x,b,x,b,a'
);

echo '<pre>';
foreach ($tests as $test) {
    echo '<div>';
    $arr = explode(',',$test);
    echo "$test<br /><br />";
    pretty_print(get_patterns($arr));
    echo '</div><br />';
}
echo '</pre>';

The pretty_print function I used to print the output is as follows:

function pretty_print($results) {
    foreach ($results as $result) {
        $a = "array('" . implode("','", $result['values']) . "')";
        // {$...} interpolation; the ${...} form is deprecated as of PHP 8.2
        echo "array('number' => {$result['number']}, 'values' => $a)<br />";
    }
}

The output from the test code is as follows:

a,b,c,d

array('number' => 1, 'values' => array('a'))
array('number' => 1, 'values' => array('b'))
array('number' => 1, 'values' => array('c'))
array('number' => 1, 'values' => array('d'))

a

array('number' => 1, 'values' => array('a'))

a,a,a,a

array('number' => 2, 'values' => array('a','a'))

a,a,a,a,a

array('number' => 2, 'values' => array('a','a'))
array('number' => 1, 'values' => array('a'))

a,a,a,a,a,a

array('number' => 2, 'values' => array('a','a','a'))

b,a,a,a,a,c

array('number' => 1, 'values' => array('b'))
array('number' => 2, 'values' => array('a','a'))
array('number' => 1, 'values' => array('c'))

b,b,a,a,a,a,c,c

array('number' => 2, 'values' => array('b'))
array('number' => 2, 'values' => array('a','a'))
array('number' => 2, 'values' => array('c'))

b,b,a,a,d,a,a,c,c

array('number' => 2, 'values' => array('b'))
array('number' => 2, 'values' => array('a'))
array('number' => 1, 'values' => array('d'))
array('number' => 2, 'values' => array('a'))
array('number' => 2, 'values' => array('c'))

a,b,c,d,a,b,c,d,c,d

array('number' => 2, 'values' => array('a','b','c','d'))
array('number' => 1, 'values' => array('c'))
array('number' => 1, 'values' => array('d'))

x,x,y,x,b,x,b,a

array('number' => 2, 'values' => array('x'))
array('number' => 1, 'values' => array('y'))
array('number' => 2, 'values' => array('x','b'))
array('number' => 1, 'values' => array('a'))

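Answer 4 (score: 3)

OK, here is my take. The code below splits the whole original array into the longest adjacent non-overlapping chunks.

So in situations like these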

one, two, one, two, one, two, one, two
[one two one two], [one two one two]

'one' 'two' 'one' 'two' 'three' 'four' 'one' 'two' 'three' 'four'    
['one'] ['two'] ['one' 'two' 'three' 'four'] ['one' 'two' 'three' 'four']

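it prefers 2 long groups over 4 short ones.

Update: I also tested it with the examples from the other answers, and it works for those cases too: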

<?php

/*
 * Splits an $array into chunks of $chunk_size.
 * Returns number of repeats, start index and chunk which has
 * max number of adjacent repeats.
 */
function getRepeatCount($array, $chunk_size) {
    $parts = array_chunk($array, $chunk_size);
    $maxRepeats = 1;
    $maxIdx = 0;
    $repeats = 1;
    $len = count($parts);
    for ($i = 0; $i < $len-1; $i++) {
        if ($parts[$i] === $parts[$i+1]) {
            $repeats += 1;
            if ($repeats > $maxRepeats) {
                $maxRepeats = $repeats;
                $maxIdx = $i - ($repeats-2);
            }
        } else {
            $repeats = 1;
        }
    }
    return array($maxRepeats, $maxIdx*$chunk_size, $parts[$maxIdx]);
}

/*
 * Finds longest pattern in the $array.
 * Returns number of repeats, start index and pattern itself.
 */
function findLongestPattern($array) {
    $len = count($array);
    for ($window = floor($len/2); $window >= 1; $window--) {
      $num_chunks = ceil($len/$window);
      for ($i = 0; $i < $num_chunks; $i++) {
        list($repeats, $idx, $pattern) = getRepeatCount(
          array_slice($array, $i), $window
        );
        if ($repeats > 1) {
            return array($repeats, $idx+$i, $pattern);
        }
      }
    }
    return array(1, 0, [$array[0]]);
}

/*
 * Splits $array into longest adjacent non-overlapping parts.
 */
function splitToPatterns($array) {
    if (count($array) < 1) {
        return $array;
    }
    list($repeats, $start, $pattern) = findLongestPattern($array);
    $end = $start + count($pattern) * $repeats;
    return array_merge(
            splitToPatterns(array_slice($array, 0, $start)),
            array(
                array('number'=>$repeats, 'values' => $pattern)
            ),
            splitToPatterns(array_slice($array, $end))
    );
}

Here are the tests for the code above:

function isEquals($expected, $actual) {
    $exp_str = json_encode($expected);
    $act_str = json_encode($actual);
    $equals = $exp_str === $act_str;
    if (!$equals) {
        echo 'Equals check failed'.PHP_EOL;
        echo 'expected: '.$exp_str.PHP_EOL;
        echo 'actual  : '.$act_str.PHP_EOL;
    }
    return $equals;
}

assert(isEquals(
    array(1, 0, ['a']), getRepeatCount(['a','b','c'], 1)
));
assert(isEquals(
    array(1, 0, ['a']), getRepeatCount(['a','b','a','b','c'], 1)
));
assert(isEquals(
    array(2, 0, ['a','b']), getRepeatCount(['a','b','a','b','c'], 2)
));
assert(isEquals(
    array(1, 0, ['a','b','a']), getRepeatCount(['a','b','a','b','c'], 3)
));
assert(isEquals(
    array(3, 0, ['a','b']), getRepeatCount(['a','b','a','b','a','b','a'], 2)
));
assert(isEquals(
    array(2, 2, ['a','c']), getRepeatCount(['x','c','a','c','a','c'], 2)
));
assert(isEquals(
    array(1, 0, ['x','c','a']), getRepeatCount(['x','c','a','c','a','c'], 3)
));
assert(isEquals(
    array(2, 0, ['a','b','c','d']),
    getRepeatCount(['a','b','c','d','a','b','c','d','c','d'],4)
));

assert(isEquals(
    array(2, 2, ['a','c']), findLongestPattern(['x','c','a','c','a','c'])
));
assert(isEquals(
    array(1, 0, ['a']), findLongestPattern(['a','b','c'])
));
assert(isEquals(
    array(2, 2, ['c','a']),
    findLongestPattern(['a','b','c','a','c','a'])
));
assert(isEquals(
    array(2, 0, ['a','b','c','d']),
    findLongestPattern(['a','b','c','d','a','b','c','d','c','d'])
));


// Find longest adjacent non-overlapping patterns
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('a')),
        array('number'=>1, 'values'=>array('b')),
        array('number'=>1, 'values'=>array('c')),
    ),
    splitToPatterns(['a','b','c'])
));
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('a')),
        array('number'=>1, 'values'=>array('b')),
        array('number'=>2, 'values'=>array('c','a')),
    ),
    splitToPatterns(['a','b','c','a','c','a'])
));
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('a','b','c','d')),
        array('number'=>1, 'values'=>array('c')),
        array('number'=>1, 'values'=>array('d')),
    ),
    splitToPatterns(['a','b','c','d','a','b','c','d','c','d'])
));
/*     'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', 'd', */
/*     [                 ] [                 ] [ ]  [  ] */
/* NOT [      ] [        ] [      ]  [       ] [ ]  [  ] */
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('a','b','a','b')),
        array('number'=>1, 'values'=>array('c')),
        array('number'=>1, 'values'=>array('d')),
    ),
    splitToPatterns(['a','b','a','b','a','b','a','b','c','d'])
));

/*     'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a' */
/* //  [  ] [  ] [ ]  [       ] [      ]  [ ] */
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('x')),
        array('number'=>1, 'values'=>array('y')),
        array('number'=>2, 'values'=>array('x','b')),
        array('number'=>1, 'values'=>array('a')),
    ),
    splitToPatterns(['x','x','y','x','b','x','b','a'])
));
// one, two, one, two, one, two, one, two
// [                ] [                 ]
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('one', 'two', 'one', 'two')),
    ),
    splitToPatterns(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])
));
// 'one', 'two', 'one', 'two', 'three', 'four', 'one', 'two', 'three', 'four'
// [   ]  [   ]  [                           ]  [                           ]
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('one')),
        array('number'=>1, 'values'=>array('two')),
        array('number'=>2, 'values'=>array('one','two','three','four')),
    ),
    splitToPatterns(['one', 'two', 'one', 'two', 'three', 'four', 'one', 'two', 'three','four'])
));

/*     'a', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', */
/*     [  ] [                 ] [                 ] [ ]  */
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('a')),
        array('number'=>2, 'values'=>array('a','b','a','b')),
        array('number'=>1, 'values'=>array('c')),
    ),
    splitToPatterns(['a','a','b','a','b','a','b','a','b','c'])
));

/* 'a', 'b', 'a', 'b', 'c', 'd', 'a', 'b', 'a', 'b', 'a', 'b' */
// [      ]  [      ]  [ ]  [ ]  [      ] [       ]  [      ]
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('a', 'b')),
        array('number'=>1, 'values'=>array('c')),
        array('number'=>1, 'values'=>array('d')),
        array('number'=>3, 'values'=>array('a','b')),
    ),
    splitToPatterns(['a', 'b', 'a', 'b', 'c', 'd', 'a', 'b', 'a', 'b', 'a', 'b'])
));
/* 'a', 'c', 'd', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', */
/* [  ] [  ] [  ] [                 ] [                 ] [ ]  */
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('a')),
        array('number'=>2, 'values'=>array('a','b','a','b')),
        array('number'=>1, 'values'=>array('c')),
    ),
    splitToPatterns(['a','a','b','a','b','a','b','a','b','c'])
));

Answer 5 (score: 2)

I started with this, but in the end my brain burned out and I don't know where to begin comparing the arrays... Enjoy!

$array = array(
    'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a'
    //[.......] [.] [[......]  [......]] [.]
);

$arrayCount = count($array);

$res = array();
for($i = 0; $i < $arrayCount; $i++) {
    for($j = 1; $j < $arrayCount; $j++) {
        $res[$i][] = array_slice($array, $i, $j);
    }
}

//echo '<pre>';
//var_dump($res);
//echo '</pre>';
//
//die;


$resCount = count($res);
$oneResCount = count($res[0]);

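Answer 6 (score: 2)

First, create a function that finds the possible matches of a given group array in the array, starting from a specific index, and returns the number of matches found.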

function findGroupMatch($group, $array, $startFrom) {
    $match = 0;
    while($group == array_slice($array, $startFrom, count($group))) {
        $match++;
        $startFrom += count($group);
    }
    return $match;
}

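Now we need to iterate through each item to find possible groups, and then send each one to the findGroupMatch() function to check whether a match for that group exists in the following items. The trick to finding a possible group is to look for an item that matches any of the previous items. If there is one, we have a possible group made up of all the previous items starting from the matched item. Otherwise, we just grow the list of unmatched items, and at the end we add all unmatched items as single-item groups. (In the given example we have a, b, c, d, a.... When we reach the second a in the array, it matches the earlier a, so we consider a, b, c, d a possible group and send it to findGroupMatch() to check how many more of that group we can find in the following items.)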

$array = array(
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd',
);

$unMatchedItems = array();
$totalCount = count($array);
$groupsArray = array();

for($i=0; $i < $totalCount; $i++) {
    $item = $array[$i];

    if(in_array($item, $unMatchedItems)) {
        $matched_keys = array_keys($unMatchedItems, $item);
        foreach($matched_keys as $key) {
            $possibleGroup = array_slice($unMatchedItems, $key);

            $matches = findGroupMatch($possibleGroup, $array, $i);

            if ($matches) {
                //Insert the items before group as single item group
                if ($key > 0) {
                    for ($j = 0; $j < $key; $j++) {
                        $groupsArray[] = array('number' => 1, 'values' => array($unMatchedItems[$j]));
                    }
                }
                //Insert the group array
                $groupsArray[] = array('number' => $matches + 1, 'values' => $possibleGroup); //number includes initial group also so $matches + 1
                $i += (count($possibleGroup) * $matches) - 1; //skip the matched items from next iteration
                //Empty the unmatched array to start with a new group search
                $unMatchedItems = array();
                break;
            }
        }
        //If there was no matches, add the item to the unMatched group
        if(!$matches) $unMatchedItems[] = $item;
    } else {
        $unMatchedItems[] = $item;
    }

}

//Insert the remaining items as single item group
for($k=0; $k<count($unMatchedItems); $k++) {
    $groupsArray[] = array('number' => 1, 'values' => array($unMatchedItems[$k]));
}

print_r($groupsArray);

The result is shown below. (See this PHP Fiddle to test it, and https://eval.in/507333 for a test with another input.)

Array
(
    [0] => Array
    (
        [number] => 2
        [values] => Array
        (
            [0] => a
            [1] => b
            [2] => c
            [3] => d
        )

    )

    [1] => Array
    (
        [number] => 1
        [values] => Array
        (
            [0] => c
        )

    )

    [2] => Array
    (
        [number] => 1
        [values] => Array
        (
            [0] => d
        )

    )

)

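Answer 7 (score: 2)

With recursion, the first example is pretty simple. The second example... not so easy.

The following only works for the first example, and assumes that no pattern should ever contain two of the same element. It also handles all the single-element patterns at the end of the original array and keeps the pattern order (by first occurrence of each pattern).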

function find_pattern($input, &$result) {
    $values = []; // currently processed elements
    $pattern = ''; // the current element pattern
    $dupe_found = false; // did we find a duplicate element?

    // search the values for the first that matches a previous value
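    // array_shift() returns null once the input is empty; comparing against
    // null keeps falsy values such as '0' from ending the loop early
    while (($next = array_shift($input)) !== null) {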
        // check if the element was already found
        if (in_array($next, $values)) {
            // re-add the value back into the input, since the next call needs it
            array_unshift($input, $next);

            // add the resulting pattern
            add_pattern($pattern, $values, $result);

            // find the next pattern with a recursive call
            find_pattern($input, $result);

            // a duplicate element was found!
            $dupe_found = true;

            // the rest of the values are handled by recursion, break the while loop
            break;
        } else {
            // not already found, so store the element and keep going
            $values[] = $next;

            // use the element to produce a key for the result set
            $pattern .= $next;
        }
    }

    // if no duplicate was found, then each value should be an individual pattern
    if (!$dupe_found) {
        foreach ($values as $value) {
            add_pattern($value, [$value], $result);
        }
    }
}

function add_pattern($pattern, $values, &$result) {
    // increment the pattern count
    $result[$pattern]['number'] = isset($result[$pattern]['number']) ?
        $result[$pattern]['number']+1 : 1;

    // add the current pattern to the result, if not already done
    if (!isset($result[$pattern]['values'])) {
        $result[$pattern]['values'] = $values;
    }
}

Example usage:

$input = [
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd'
];

$result = [];
find_pattern($input, $result);

echo "<pre>";
print_r($result);
echo "</pre>";

Example output:

Array
(
    [abcd] => Array
    (
        [number] => 2
        [values] => Array
        (
            [0] => a
            [1] => b
            [2] => c
            [3] => d
        )
    )

    [c] => Array
    (
        [number] => 1
        [values] => Array
        (
            [0] => c
        )
    )

    [d] => Array
    (
        [number] => 1
        [values] => Array
        (
            [0] => d
        )
    )
)

Answer 8 (score: 2)

You can do it like this:

<?php
$array = array(
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd'
);

// Call this function to get your patterns
function patternMatching(array $array) {
    $patterns = array();
    $belongsToPattern = array_fill(0, count($array), false);

    // Find biggest patterns first
    for ($size = (int) (count($array) / 2); $size > 0; $size--) {

        // for each pattern: start at every possible point in the array
        for($start=0; $start <= count($array) - $size; $start++) {

            $p = findPattern($array, $start, $size);

            if($p != null) {

                /* Before we can save the pattern we need to check, if we've found a
                 * pattern that does not collide with patterns we've already found */
                $hasConflict = false;
                foreach($p["positions"] as $idx => $pos) {
                    $PatternConflicts = array_slice($belongsToPattern, $pos, $p["size"]);
                    $hasConflict = $hasConflict || in_array(true, $PatternConflicts);
                }

                if(!$hasConflict) {

                    /* Since we have found a pattern, we don't want to find more 
                     * patterns for these positions */
                    foreach($p["positions"] as $idx => $pos) {
                        $replace = array_fill($pos, $p["size"], true);
                        $belongsToPattern = array_replace($belongsToPattern, $replace);
                    }

                    $patterns[] = $p;
                    // or only return number and values:
                    // $patterns[] = [ "number" => $p["number"], "values" => $p["values"]];
                }
            }
        }
    }

    return $patterns;
}


function findPattern(array $haystack, $patternStart, $patternSize ) {

    $size = count($haystack);
    $patternCandidate = array_slice($haystack, $patternStart, $patternSize);

    $patternCount = 1;
    $patternPositions = [$patternStart];

    for($i = $patternStart + $patternSize; $i <= $size - $patternSize; $i++) {

        $patternCheck = array_slice($haystack, $i, $patternSize);

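        // Compare the slices element by element, in order; array_diff() would
        // ignore both order and multiplicity (both slices are re-indexed from 0)
        if($patternCandidate === $patternCheck) {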
            $patternCount++;
            $patternPositions[] = $i;
        }

    }

    if($patternCount > 1 || $patternSize <= 1) {

        return [
            "number"    => $patternCount,
            "values"    => $patternCandidate,

            // Additional information needed for filtering, sorting, etc.
            "positions" => $patternPositions,
            "size"      => $patternSize
        ];
    } else {
        return null;
    }

}

$patterns = patternMatching($array);

print "<pre>";
print_r($patterns);
print "</pre>";

?>

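The code is probably far from optimal in terms of speed, but it should do what you want for any sequence of strings in an array. patternMatching() returns the patterns ordered descending by pattern size and ascending by first occurrence (you can use ['positions'][0] as a sort criterion to get a different order).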
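As a sketch of that last remark, the result could be reordered by first occurrence with usort(). The helper sortByFirstPosition is hypothetical, and the sample entries are hand-written in the shape patternMatching() returns; they are not actual output of the code above.

```php
<?php
// Hypothetical helper: order patterns by where they first occur.
// Assumes each entry has a non-empty 'positions' list.
function sortByFirstPosition(array $patterns): array
{
    usort($patterns, function (array $left, array $right): int {
        return $left['positions'][0] <=> $right['positions'][0];
    });
    return $patterns;
}

// Hand-written sample entries in the shape the answer returns.
$sample = [
    ['number' => 2, 'values' => ['a', 'b'], 'positions' => [3, 5], 'size' => 2],
    ['number' => 1, 'values' => ['x'], 'positions' => [0], 'size' => 1],
    ['number' => 1, 'values' => ['y'], 'positions' => [2], 'size' => 1],
];

$ordered = sortByFirstPosition($sample);
$firstValues = array_column($ordered, 'values'); // [['x'], ['y'], ['a', 'b']]
```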

Answer 9 (score: 1)

This should do it:

<?php

$array = array(
  'x', 'y', 'x', 'y', 'a',
  'ab', 'c', 'd',
  'a', 'b', 'c', 'd',
  'c', 'd', 'x', 'y', 'b',
  'x', 'y', 'b', 'c', 'd'
);


// convert the array to a string
$string = '';
foreach ($array as $a) {
  $l = strlen($a)-1;
  $string .= ($l) ? str_replace('::',':',$a[0] . ':' . substr($a,1,$l-1) . ':' . $a[$l]) . '-' : $a . '-';
}

// find patterns
preg_match_all('/(?=((.+)(?:.*?\2)+))/s', $string, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
  $temp = str_replace('--','-',$m[2].'-');
  $patterns[] = ($temp[0]==='-') ? substr($temp,1) : $temp;
}

// remove empty values and duplicates
$patterns = array_keys(array_flip(array_filter($patterns)));

// sort patterns
foreach ($patterns as $p) {
  $sorted[$p] = strlen($p);
}
arsort($sorted);

// find double or more occurrences
$stringClone = $string;
foreach ($sorted as $s=>$n) {
  $nA = substr_count($stringClone,':'.$s);
  $nZ = substr_count($stringClone,$s.':');
  $number = substr_count($stringClone,$s);
  $sub = explode('-',substr($stringClone,strpos($stringClone,$s),$n-1));
  $values = $sub;
  if($nA>0 || $nZ>0){
    $numberAdjusted = $number - $nA - $nZ;
    if($numberAdjusted > 1) {
      $temp = '';
      while($n--){
        $temp .= '#';
      }
      $position = strpos(str_replace(':'.$s,':'.$temp,str_replace($s.':',$temp.':',$string)),$s);
      $stringClone = str_replace(':'.$s,':'.$temp,$stringClone);
      $stringClone = str_replace($s.':',$temp.':',$stringClone);
      $result['p'.sprintf('%09d', $position)] = array('number'=>$numberAdjusted,'values'=>$values);
      $stringClone = str_replace($s,'',$stringClone);
      $stringClone = str_replace($temp,$s,$stringClone);
    }
  } else if($number>1){
    $position = strpos($string,$s);
    $result['p'.sprintf('%09d', $position)] = array('number'=>$number,'values'=>$values);
    $stringClone = str_replace($s,'',$stringClone);
  }
}

// add the remaining items
$remaining = array_flip(explode('-',substr($stringClone,0,-1)));
foreach ($remaining as $r=>$n) {
    $position = strpos($string,$r);
    $result['p'.sprintf('%09d', $position)] = array('number'=>1,'values'=>str_replace(':','',$r));
}

// sort results
ksort($result);
$result = array_values($result);

print_r($result);
?>

Working example here