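I have an array of string values that sometimes form repeating patterns ('a', 'b', 'c', 'd'):
$array = array(
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd',
);
I would like to find the repeated patterns based on the array order and group them in that same order (to maintain it):
$patterns = array(
    array('number' => 2, 'values' => array('a', 'b', 'c', 'd')),
    array('number' => 1, 'values' => array('c')),
    array('number' => 1, 'values' => array('d'))
);
Note that [a,b], [b,c] and [c,d] are not patterns by themselves, because they sit inside the larger [a,b,c,d] pattern. The final [c,d] occurs only once, so it is not a pattern either; it is just the individual values 'c' and 'd'.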
Another example:
$array = array(
    'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a'
    //[.......] [.] [[......] [......]] [.]
);
which produces
$patterns = array(
    array('number' => 2, 'values' => array('x')),
    array('number' => 1, 'values' => array('y')),
    array('number' => 2, 'values' => array('x', 'b')),
    array('number' => 1, 'values' => array('a'))
);
How can I do this?
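Answer 0 (score: 7)
An array of characters is just a string, and regular expressions are the king of string pattern matching. Add recursion, and the solution is quite elegant, even with the conversion back and forth from a character array: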
function findPattern($str){
    $results = array();
    if(is_array($str)){
        $str = implode($str);
    }
    if(strlen($str) == 0){ //reached the end
        return $results;
    }
    if(preg_match_all('/^(.+)\1+(.*?)$/',$str,$matches)){ //pattern found
        $results[] = array('number' => (strlen($str) - strlen($matches[2][0])) / strlen($matches[1][0]), 'values' => str_split($matches[1][0]));
        return array_merge($results,findPattern($matches[2][0]));
    }
    //no pattern found
    $results[] = array('number' => 1, 'values' => array(substr($str, 0, 1)));
    return array_merge($results,findPattern(substr($str, 1)));
}
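Note that the function joins the values with implode() and splits them again with str_split(), so it only behaves as intended when every value is a single character. Below is a minimal sketch of one way around that, assuming at most 255 distinct values: map each distinct value to one byte, apply the same regular-expression idea to the encoded string, and decode each group afterwards. The name findPatternEncoded is hypothetical and not part of the answer.

```php
<?php
// Hypothetical variant of the regex approach that tolerates multi-character values.
// Assumption: the input holds at most 255 distinct string values.
function findPatternEncoded(array $values)
{
    $alphabet = array(); // position => original value
    $codes = array();    // original value => one-byte code
    $encoded = '';
    foreach ($values as $value) {
        if (!isset($codes[$value])) {
            if (count($alphabet) >= 255) {
                throw new InvalidArgumentException('More than 255 distinct values');
            }
            $alphabet[] = $value;
            $codes[$value] = chr(count($alphabet)); // chr(1), chr(2), ...
        }
        $encoded .= $codes[$value];
    }

    $results = array();
    while ($encoded !== '') {
        // Greedy (.+) picks the longest unit that repeats at the start of the string;
        // the /s flag lets "." match every byte, including newline codes.
        if (preg_match('/^(.+)\1+/s', $encoded, $m)) {
            $unit = $m[1];
            $number = (int) (strlen($m[0]) / strlen($unit));
        } else {
            $unit = $encoded[0];
            $number = 1;
        }
        // Translate the one-byte codes back into the original values
        $decoded = array();
        foreach (str_split($unit) as $byte) {
            $decoded[] = $alphabet[ord($byte) - 1];
        }
        $results[] = array('number' => $number, 'values' => $decoded);
        // Drop everything that was just consumed
        $encoded = (string) substr($encoded, strlen($unit) * $number);
    }
    return $results;
}

print_r(findPatternEncoded(array('one', 'two', 'one', 'two', 'three')));
```

The loop replaces the recursion of the original purely to keep the sketch short; the matching rule (longest repeating prefix first) is meant to be the same.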
Answer 1 (score: 5)
If c and d can be grouped together, this is my code:
<?php
$array = array(
'a', 'b', 'c', 'd',
'a', 'b', 'c', 'd',
'c', 'd',
);
$res = array();
foreach ($array AS $value) {
if (!isset($res[$value])) {
$res[$value] = 0;
}
$res[$value]++;
}
foreach ($res AS $key => $value) {
$fArray[$value][] = $key;
for ($i = $value - 1; $i > 0; $i--) {
$fArray[$i][] = $key;
}
}
$res = array();
foreach($fArray AS $key => $value) {
if (!isset($res[serialize($value)])) {
$res[serialize($value)] = 0;
}
$res[serialize($value)]++;
}
$fArray = array();
foreach($res AS $key => $value) {
$fArray[] = array('number' => $value, 'values' => unserialize($key));
}
echo '<pre>';
var_dump($fArray);
echo '</pre>';
The final result is:
array (size=2)
0 =>
array (size=2)
'number' => int 2
'values' =>
array (size=4)
0 => string 'a' (length=1)
1 => string 'b' (length=1)
2 => string 'c' (length=1)
3 => string 'd' (length=1)
1 =>
array (size=2)
'number' => int 1
'values' =>
array (size=2)
0 => string 'c' (length=1)
1 => string 'd' (length=1)
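As an aside, the first loop of that script (the isset()/++ counting) does the same job as PHP's built-in array_count_values(). A small sketch, with the variable name $counts chosen here for illustration:

```php
<?php
$array = array(
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd',
);

// Built-in equivalent of the manual counting loop: value => number of occurrences
$counts = array_count_values($array);
print_r($counts);
```

Because this approach only looks at how often each value occurs, not where it occurs, it groups 'c' and 'd' together rather than listing them separately.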
Answer 2 (score: 5)
The following code returns the expected result, finding the longest portions with repeated values:
function pepito($array) {
    $sz=count($array);
    $patterns=Array();
    for ($pos=0;$pos<$sz;$pos+=$len) {
        $nb=1;
        for ($len=floor($sz/2);$len>0;$len--) {
            while (array_slice($array, $pos, $len)==array_slice($array, $pos+$len, $len)) {
                $pos+=$len;
                $nb++;
            }
            if ($nb>1) break;
        }
        if (!$len) $len=1;
        $patterns[]=Array('number'=>$nb, 'values'=>array_slice($array, $pos, $len));
    }
    return $patterns;
}
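This matches your examples:
{['a','b','c','d'],['a','b','c','d']},['c','d']
or {['x'],['x']},['y'],{['x','b'],['x','b']},['a']
The harder part is examples like the following:
{['one','one','two'],['one','one','two']}
or the hardest choice to make:
one, two, one, two, one, two, one, two
because it can be grouped in two ways:
[one, two], [one, two], [one, two], [one, two]
[one, two, one, two], [one, two, one, two]
There is no "obvious" choice. My algorithm above always takes the longest match, because that is the simplest implementation that covers any combination.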
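To make the ambiguity concrete, here is a small sketch that counts how many times the first $len values repeat back to back. The helper name countLeadingRepeats is hypothetical; it is not part of the answer's code.

```php
<?php
// Hypothetical helper: how many times do the first $len values repeat consecutively?
function countLeadingRepeats(array $values, $len)
{
    if ($len < 1 || $len > count($values)) {
        return 0; // no usable base of that length
    }
    $base = array_slice($values, 0, $len);
    $count = 1;
    // array_slice() returns a shorter (or empty) array past the end, which ends the loop
    while (array_slice($values, $count * $len, $len) === $base) {
        $count++;
    }
    return $count;
}

$values = array('one', 'two', 'one', 'two', 'one', 'two', 'one', 'two');
echo countLeadingRepeats($values, 2), "\n"; // 4
echo countLeadingRepeats($values, 4), "\n"; // 2
```

Both base lengths cover the whole array, which is why a tie-breaking rule (here: longest base first) has to be chosen.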
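Edit: you should also consider cases where the longest match comes after a shorter one.
Example:
'one','two','one','two','three','four','one','two','three','four'
Scanning from left to right, you might group it as:
{['one','two'],['one','two']},'three','four','one','two','three','four'
when you could instead group it as:
'one','two',{['one','two','three','four'],['one','two','three','four']}
This situation has to be resolved with recursive calls to get the better solution, at the cost of a longer execution time: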
function pepito($array) {
    if (($sz=count($array))<1) return Array();
    $pos=0;
    $nb=1;
    for ($len=floor($sz/2);$len>0;$len--) {
        while (array_slice($array, $pos, $len)==array_slice($array, $pos+$len, $len)) {
            $pos+=$len;
            $nb++;
        }
        if ($nb>1) break;
    }
    if (!$len) $len=1;
    $rec1=pepito(array_slice($array, $pos+$len));
    $rec2=pepito(array_slice($array, 1));
    if (count($rec1)<count($rec2)+1) {
        return array_merge(Array(Array('number'=>$nb, 'values'=>array_slice($array, $pos, $len))), $rec1);
    }
    return array_merge(Array(Array('number'=>1, 'values'=>array_slice($array, 0, 1))), $rec2);
}
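Answer 3 (score: 4)
Explanation:
Pattern base: the sequence of elements that repeats within a pattern (e.g. for [a,b,a,b,c], [a,b] is the pattern base and [a,b,a,b] is the pattern).
We want to start by searching for the longest pattern base, then the next longest, and so on. It is important to understand that once we find a pattern, we do not need to check inside it for the start of another pattern whose base has the same length.
Here is the proof.
Assume A is a pattern base and that we have encountered the pattern AA. Assume B is another pattern base of the same length that forms a pattern starting inside A. Let Y be the overlapping elements. If A = XY, then AA = XYXY. Because B has the same length, it must be that B = YX, since to complete B we have to use the remaining elements of A. Moreover, since B forms a pattern, we must have BB, which is YXYX. Since A starts before B, we have XYXYX = AAX = XBB. If B repeats once more, we get XBBB = XYXYXYX = AAAX. Therefore B cannot repeat an additional time without A also repeating an additional time, and we do not need to check for longer patterns inside the pattern generated by A.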
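Written out as concatenation identities, with the same symbols as in the argument ($A = XY$ and $B = YX$), the steps are:

```latex
\begin{align*}
AA &= XY\,XY, & BB &= YX\,YX,\\
X\,BB &= X\,YX\,YX = XY\,XY\,X = AA\,X,\\
X\,BBB &= X\,YX\,YX\,YX = XY\,XY\,XY\,X = AAA\,X.
\end{align*}
```

So a third copy of B can only appear where a third copy of A appears as well.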
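The longest possible pattern base consists of half of the elements in the whole list, because the simplest pattern occurs exactly twice. We can therefore start by checking pattern bases of half the list length and work our way down to bases of size 2.
Assuming we search the array from left to right, when a pattern is found we only need to search for further patterns on either side of it. To the left, there are no patterns with a base of the same length, or they would have been found already, so we search the left side using the next smaller base size. The elements to the right of the pattern have not been searched yet, so we keep searching them using the same base size.
The function that does this is as follows: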
function get_patterns($arr, $len = null) {
// The smallest pattern base length for which a pattern can be found
$minlen = 2;
// No pattern base length was specified
if ($len === null) {
// Use the longest pattern base length possible
$maxlen = floor(count($arr) / 2);
return get_patterns($arr, $maxlen);
// Base length is too small to find any patterns
} else if ($len < $minlen) {
// Compile elements into lists consisting of one element
$results = array();
$num = 1;
$elem = $arr[0];
for ($i=1; $i < count($arr); $i++) {
if ($elem === $arr[$i]) {
$num++;
} else {
array_push($results, array(
'number' => $num,
'values' => array( $elem )
));
$num = 1;
$elem = $arr[$i];
}
}
array_push($results, array(
'number' => $num,
'values' => array( $elem )
));
return $results;
}
// Cycle through elements until there aren't enough elements to fit
// another repetition.
for ($i=0; $i < count($arr) - $len * 2 + 1; $i++) {
// Number of times pattern base occurred
$num_read = 1; // One means there is no pattern yet
// Current pattern base we are attempting to match against
$base = array_slice($arr, $i, $len);
// Check for matches using segments of the same length for the elements
// following the current pattern base
for ($j = $i + $len; $j < count($arr) - $len + 1; $j += $len) {
// Elements being compared to pattern base
$potential_match = array_slice($arr, $j, $len);
// Match found
if (has_same_elements($base, $potential_match)) {
$num_read++;
// NO match found
} else {
// Do not check again using currently selected elements
break;
}
}
// Patterns were encountered
if ($num_read > 1) {
// The total number of elements that make up the pattern
$pattern_len = $num_read * $len;
// The elements before the pattern
$before = array_slice($arr, 0, $i);
// The elements after the pattern
$after = array_slice(
$arr, $i + $pattern_len, count($arr) - $pattern_len - $i
);
$results = array_merge(
// Patterns of a SMALLER length may exist beforehand
count($before) > 0 ? get_patterns($before, $len-1) : array(),
// Patterns that were found
array(
array(
'number' => $num_read,
'values' => $base
)
),
// Patterns of the SAME length may exist afterward
count($after) > 0 ? get_patterns($after, $len) : array()
);
return $results;
}
}
// No matches were encountered
// Search for SMALLER patterns
return get_patterns($arr, $len-1);
}
The function has_same_elements, which checks whether two arrays of primitive values are identical, is as follows:
// Returns true if two arrays have the same elements.
//
// Precondition: Elements must be primitive data types (ie. int, string, etc)
function has_same_elements($a1, $a2) {
// There are a different number of elements
if (count($a1) != count($a2)) {
return false;
}
for ($i=0; $i < count($a1); $i++) {
if ($a1[$i] !== $a2[$i]) {
return false;
}
}
return true;
}
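There are a few things you can do to speed the code up. Instead of slicing the array, you can pass the function the array together with the start and end indexes to check. Also, working with strings can be slow, so you can build an array that maps strings to numbers and vice versa, convert the array of strings into an array of numbers, and work on that. Once you have the result, convert the arrays of numbers back into strings.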
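A minimal sketch of those two speed-ups, under the assumption that the values are strings. The helper names encode_values, decode_values and segments_equal are hypothetical; they are not part of the code above and would still have to be wired into get_patterns.

```php
<?php
// Replace each distinct string with a small integer id.
// Returns array($encoded, $dictionary), where $dictionary maps id => string.
function encode_values(array $values)
{
    $ids = array();        // string => id
    $dictionary = array(); // id => string
    $encoded = array();
    foreach ($values as $value) {
        if (!isset($ids[$value])) {
            $ids[$value] = count($dictionary);
            $dictionary[] = $value;
        }
        $encoded[] = $ids[$value];
    }
    return array($encoded, $dictionary);
}

// Turn ids back into the original strings.
function decode_values(array $encoded, array $dictionary)
{
    $values = array();
    foreach ($encoded as $id) {
        $values[] = $dictionary[$id];
    }
    return $values;
}

// Compare $len elements starting at $i with $len elements starting at $j,
// working on indexes only instead of building two slices.
function segments_equal(array $arr, $i, $j, $len)
{
    $n = count($arr);
    if ($i + $len > $n || $j + $len > $n) {
        return false; // one of the segments would run past the end
    }
    for ($k = 0; $k < $len; $k++) {
        if ($arr[$i + $k] !== $arr[$j + $k]) {
            return false;
        }
    }
    return true;
}

list($encoded, $dictionary) = encode_values(array('x', 'b', 'x', 'b', 'a'));
var_dump(segments_equal($encoded, 0, 2, 2)); // bool(true): [x, b] repeats at index 2
```

Whether this is actually faster depends on the input size and PHP version, so it is worth measuring before adopting it.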
I tested the function using the following code:
$tests = array(
'a,b,c,d',
'a',
'a,a,a,a',
'a,a,a,a,a',
'a,a,a,a,a,a',
'b,a,a,a,a,c',
'b,b,a,a,a,a,c,c',
'b,b,a,a,d,a,a,c,c',
'a,b,c,d,a,b,c,d,c,d',
'x,x,y,x,b,x,b,a'
);
echo '<pre>';
foreach ($tests as $test) {
echo '<div>';
$arr = explode(',',$test);
echo "$test<br /><br />";
pretty_print(get_patterns($arr));
echo '</div><br />';
}
echo '</pre>';
The pretty_print function I used to print the output is as follows:
function pretty_print($results) {
foreach ($results as $result) {
$a = "array('" . implode("','", $result['values']) . "')";
echo "array('number' => {$result['number']}, 'values' => $a)<br />";
}
}
The output of the test code is as follows:
a,b,c,d
array('number' => 1, 'values' => array('a'))
array('number' => 1, 'values' => array('b'))
array('number' => 1, 'values' => array('c'))
array('number' => 1, 'values' => array('d'))
a
array('number' => 1, 'values' => array('a'))
a,a,a,a
array('number' => 2, 'values' => array('a','a'))
a,a,a,a,a
array('number' => 2, 'values' => array('a','a'))
array('number' => 1, 'values' => array('a'))
a,a,a,a,a,a
array('number' => 2, 'values' => array('a','a','a'))
b,a,a,a,a,c
array('number' => 1, 'values' => array('b'))
array('number' => 2, 'values' => array('a','a'))
array('number' => 1, 'values' => array('c'))
b,b,a,a,a,a,c,c
array('number' => 2, 'values' => array('b'))
array('number' => 2, 'values' => array('a','a'))
array('number' => 2, 'values' => array('c'))
b,b,a,a,d,a,a,c,c
array('number' => 2, 'values' => array('b'))
array('number' => 2, 'values' => array('a'))
array('number' => 1, 'values' => array('d'))
array('number' => 2, 'values' => array('a'))
array('number' => 2, 'values' => array('c'))
a,b,c,d,a,b,c,d,c,d
array('number' => 2, 'values' => array('a','b','c','d'))
array('number' => 1, 'values' => array('c'))
array('number' => 1, 'values' => array('d'))
x,x,y,x,b,x,b,a
array('number' => 2, 'values' => array('x'))
array('number' => 1, 'values' => array('y'))
array('number' => 2, 'values' => array('x','b'))
array('number' => 1, 'values' => array('a'))
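Answer 4 (score: 3)
OK, here is my take: the code below splits the whole original array into the longest adjacent non-overlapping chunks.
So in cases like these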
one, two, one, two, one, two, one, two
[one two one two], [one two one two]
'one' 'two' 'one' 'two' 'three' 'four' 'one' 'two' 'three' 'four'
['one'] ['two'] ['one' 'two' 'three' 'four'] ['one' 'two' 'three' 'four']
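it prefers 2 long groups over 4 short ones.
Update: I also tested it with the examples from the other answers, and it works for those cases too: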
<?php
/*
* Splits an $array into chunks of $chunk_size.
* Returns number of repeats, start index and chunk which has
* max number of adjacent repeats.
*/
function getRepeatCount($array, $chunk_size) {
$parts = array_chunk($array, $chunk_size);
$maxRepeats = 1;
$maxIdx = 0;
$repeats = 1;
$len = count($parts);
for ($i = 0; $i < $len-1; $i++) {
if ($parts[$i] === $parts[$i+1]) {
$repeats += 1;
if ($repeats > $maxRepeats) {
$maxRepeats = $repeats;
$maxIdx = $i - ($repeats-2);
}
} else {
$repeats = 1;
}
}
return array($maxRepeats, $maxIdx*$chunk_size, $parts[$maxIdx]);
}
/*
* Finds longest pattern in the $array.
* Returns number of repeats, start index and pattern itself.
*/
function findLongestPattern($array) {
$len = count($array);
for ($window = floor($len/2); $window >= 1; $window--) {
$num_chunks = ceil($len/$window);
for ($i = 0; $i < $num_chunks; $i++) {
list($repeats, $idx, $pattern) = getRepeatCount(
array_slice($array, $i), $window
);
if ($repeats > 1) {
return array($repeats, $idx+$i, $pattern);
}
}
}
return array(1, 0, [$array[0]]);
}
/*
* Splits $array into longest adjacent non-overlapping parts.
*/
function splitToPatterns($array) {
if (count($array) < 1) {
return $array;
}
list($repeats, $start, $pattern) = findLongestPattern($array);
$end = $start + count($pattern) * $repeats;
return array_merge(
splitToPatterns(array_slice($array, 0, $start)),
array(
array('number'=>$repeats, 'values' => $pattern)
),
splitToPatterns(array_slice($array, $end))
);
}
Here are the tests for the code:
function isEquals($expected, $actual) {
$exp_str = json_encode($expected);
$act_str = json_encode($actual);
$equals = $exp_str === $act_str;
if (!$equals) {
echo 'Equals check failed'.PHP_EOL;
echo 'expected: '.$exp_str.PHP_EOL;
echo 'actual : '.$act_str.PHP_EOL;
}
return $equals;
}
assert(isEquals(
array(1, 0, ['a']), getRepeatCount(['a','b','c'], 1)
));
assert(isEquals(
array(1, 0, ['a']), getRepeatCount(['a','b','a','b','c'], 1)
));
assert(isEquals(
array(2, 0, ['a','b']), getRepeatCount(['a','b','a','b','c'], 2)
));
assert(isEquals(
array(1, 0, ['a','b','a']), getRepeatCount(['a','b','a','b','c'], 3)
));
assert(isEquals(
array(3, 0, ['a','b']), getRepeatCount(['a','b','a','b','a','b','a'], 2)
));
assert(isEquals(
array(2, 2, ['a','c']), getRepeatCount(['x','c','a','c','a','c'], 2)
));
assert(isEquals(
array(1, 0, ['x','c','a']), getRepeatCount(['x','c','a','c','a','c'], 3)
));
assert(isEquals(
array(2, 0, ['a','b','c','d']),
getRepeatCount(['a','b','c','d','a','b','c','d','c','d'],4)
));
assert(isEquals(
array(2, 2, ['a','c']), findLongestPattern(['x','c','a','c','a','c'])
));
assert(isEquals(
array(1, 0, ['a']), findLongestPattern(['a','b','c'])
));
assert(isEquals(
array(2, 2, ['c','a']),
findLongestPattern(['a','b','c','a','c','a'])
));
assert(isEquals(
array(2, 0, ['a','b','c','d']),
findLongestPattern(['a','b','c','d','a','b','c','d','c','d'])
));
// Find longest adjacent non-overlapping patterns
assert(isEquals(
array(
array('number'=>1, 'values'=>array('a')),
array('number'=>1, 'values'=>array('b')),
array('number'=>1, 'values'=>array('c')),
),
splitToPatterns(['a','b','c'])
));
assert(isEquals(
array(
array('number'=>1, 'values'=>array('a')),
array('number'=>1, 'values'=>array('b')),
array('number'=>2, 'values'=>array('c','a')),
),
splitToPatterns(['a','b','c','a','c','a'])
));
assert(isEquals(
array(
array('number'=>2, 'values'=>array('a','b','c','d')),
array('number'=>1, 'values'=>array('c')),
array('number'=>1, 'values'=>array('d')),
),
splitToPatterns(['a','b','c','d','a','b','c','d','c','d'])
));
/* 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', 'd', */
/* [ ] [ ] [ ] [ ] */
/* NOT [ ] [ ] [ ] [ ] [ ] [ ] */
assert(isEquals(
array(
array('number'=>2, 'values'=>array('a','b','a','b')),
array('number'=>1, 'values'=>array('c')),
array('number'=>1, 'values'=>array('d')),
),
splitToPatterns(['a','b','a','b','a','b','a','b','c','d'])
));
/* 'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a' */
/* // [ ] [ ] [ ] [ ] [ ] [ ] */
assert(isEquals(
array(
array('number'=>2, 'values'=>array('x')),
array('number'=>1, 'values'=>array('y')),
array('number'=>2, 'values'=>array('x','b')),
array('number'=>1, 'values'=>array('a')),
),
splitToPatterns(['x','x','y','x','b','x','b','a'])
));
// one, two, one, two, one, two, one, two
// [ ] [ ]
assert(isEquals(
array(
array('number'=>2, 'values'=>array('one', 'two', 'one', 'two')),
),
splitToPatterns(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])
));
// 'one', 'two', 'one', 'two', 'three', 'four', 'one', 'two', 'three', 'four'
// [ ] [ ] [ ] [ ]
assert(isEquals(
array(
array('number'=>1, 'values'=>array('one')),
array('number'=>1, 'values'=>array('two')),
array('number'=>2, 'values'=>array('one','two','three','four')),
),
splitToPatterns(['one', 'two', 'one', 'two', 'three', 'four', 'one', 'two', 'three','four'])
));
/* 'a', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', */
/* [ ] [ ] [ ] [ ] */
assert(isEquals(
array(
array('number'=>1, 'values'=>array('a')),
array('number'=>2, 'values'=>array('a','b','a','b')),
array('number'=>1, 'values'=>array('c')),
),
splitToPatterns(['a','a','b','a','b','a','b','a','b','c'])
));
/* 'a', 'b', 'a', 'b', 'c', 'd', 'a', 'b', 'a', 'b', 'a', 'b' */
// [ ] [ ] [ ] [ ] [ ] [ ] [ ]
assert(isEquals(
array(
array('number'=>2, 'values'=>array('a', 'b')),
array('number'=>1, 'values'=>array('c')),
array('number'=>1, 'values'=>array('d')),
array('number'=>3, 'values'=>array('a','b')),
),
splitToPatterns(['a', 'b', 'a', 'b', 'c', 'd', 'a', 'b', 'a', 'b', 'a', 'b'])
));
/* 'a', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', */
/* [ ] [ ] [ ] [ ] */
assert(isEquals(
array(
array('number'=>1, 'values'=>array('a')),
array('number'=>2, 'values'=>array('a','b','a','b')),
array('number'=>1, 'values'=>array('c')),
),
splitToPatterns(['a','a','b','a','b','a','b','a','b','c'])
));
Answer 5 (score: 2)
I started with this, but in the end my brain burned out and I don't know where to start comparing the arrays... Enjoy!
$array = array(
'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a'
//[.......] [.] [[......] [......]] [.]
);
$arrayCount = count($array);
$res = array();
for($i = 0; $i < $arrayCount; $i++) {
for($j = 1; $j < $arrayCount; $j++) {
$res[$i][] = array_slice($array, $i, $j);
}
}
//echo '<pre>';
//var_dump($res);
//echo '</pre>';
//
//die;
$resCount = count($res);
$oneResCount = count($res[0]);
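Answer 6 (score: 2)
First, create a function that, for a given group array, looks for matches of that group in the array starting from a specific index, and returns the number of matches found.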
function findGroupMatch($group, $array, $startFrom) {
$match = 0;
while($group == array_slice($array, $startFrom, count($group))) {
$match++;
$startFrom += count($group);
}
return $match;
}
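Now we need to iterate over each item to find possible groups, and send each one to the findGroupMatch() function to check whether matches of that group exist in the following items. The trick to finding a possible group is to look for an item that matches any of the previous items. If there is one, we have a possible group consisting of all the previous items starting from the matched one. Otherwise, we just extend the list of unmatched items, and at the end we add all unmatched items as single-item groups. (In the given example we have a, b, c, d, a, ... When we reach the second a in the array, it matches the earlier a, so we treat a, b, c, d as a possible group and send it to findGroupMatch() to check how many more occurrences of the group we can find in the following items.)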
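The candidate-group step on its own can be sketched as follows. To keep the snippet self-contained it repeats the logic of findGroupMatch() under the hypothetical name countGroupRepeats, with an added guard against an empty group; the variable values are the state the loop would be in when it reaches the second 'a'.

```php
<?php
// Same idea as findGroupMatch(), renamed so the sketch stands alone.
// The $size > 0 guard is an addition: an empty group would otherwise loop forever.
function countGroupRepeats(array $group, array $array, $startFrom)
{
    $match = 0;
    $size = count($group);
    while ($size > 0 && $group == array_slice($array, $startFrom, $size)) {
        $match++;
        $startFrom += $size;
    }
    return $match;
}

$array = array('a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'c', 'd');
$unMatchedItems = array('a', 'b', 'c', 'd'); // items seen before reaching index 4
$i = 4;                                      // $array[4] is the second 'a'

// Every earlier position of the current item starts a candidate group
$candidates = array();
foreach (array_keys($unMatchedItems, $array[$i]) as $key) {
    $group = array_slice($unMatchedItems, $key);
    $candidates[] = array(
        'group'   => $group,
        'repeats' => countGroupRepeats($group, $array, $i),
    );
}
print_r($candidates);
```

Here the only candidate is a, b, c, d with one further repeat, which is why the full script reports it with 'number' => 2 (the initial occurrence plus one match).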
$array = array(
'a', 'b', 'c', 'd',
'a', 'b', 'c', 'd',
'c', 'd',
);
$unMatchedItems = array();
$totalCount = count($array);
$groupsArray = array();
for($i=0; $i < $totalCount; $i++) {
$item = $array[$i];
if(in_array($item, $unMatchedItems)) {
$matched_keys = array_keys($unMatchedItems, $item);
foreach($matched_keys as $key) {
$possibleGroup = array_slice($unMatchedItems, $key);
$matches = findGroupMatch($possibleGroup, $array, $i);
if ($matches) {
//Insert the items before group as single item group
if ($key > 0) {
for ($j = 0; $j < $key; $j++) {
$groupsArray[] = array('number' => 1, 'values' => array($unMatchedItems[$j]));
}
}
//Insert the group array
$groupsArray[] = array('number' => $matches + 1, 'values' => $possibleGroup); //number includes initial group also so $matches + 1
$i += (count($possibleGroup) * $matches) - 1; //skip the matched items from next iteration
//Empty the unmatched array to start with a new group search
$unMatchedItems = array();
break;
}
}
//If there was no matches, add the item to the unMatched group
if(!$matches) $unMatchedItems[] = $item;
} else {
$unMatchedItems[] = $item;
}
}
//Insert the remaining items as single item group
for($k=0; $k<count($unMatchedItems); $k++) {
$groupsArray[] = array('number' => 1, 'values' => array($unMatchedItems[$k]));
}
print_r($groupsArray);
The result is as follows (check this PHP Fiddle to test it, and https://eval.in/507333 for a test with another input):
Array
(
[0] => Array
(
[number] => 2
[values] => Array
(
[0] => a
[1] => b
[2] => c
[3] => d
)
)
[1] => Array
(
[number] => 1
[values] => Array
(
[0] => c
)
)
[2] => Array
(
[number] => 1
[values] => Array
(
[0] => d
)
)
)
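Answer 7 (score: 2)
With recursion, the first example is quite simple. The second example... not so easy.
The code below only works for the first example, and assumes that no pattern should ever contain two identical elements. It also handles all the single-element patterns at the end of the original array, and keeps the pattern order (by first occurrence of each pattern).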
function find_pattern($input, &$result) {
$values = []; // currently processed elements
$pattern = ''; // the current element pattern
$dupe_found = false; // did we find a duplicate element?
// search the values for the first that matches a previous value
while (($next = array_shift($input)) !== null) {
// check if the element was already found
if (in_array($next, $values)) {
// re-add the value back into the input, since the next call needs it
array_unshift($input, $next);
// add the resulting pattern
add_pattern($pattern, $values, $result);
// find the next pattern with a recursive call
find_pattern($input, $result);
// a duplicate element was found!
$dupe_found = true;
// the rest of the values are handled by recursion, break the while loop
break;
} else {
// not already found, so store the element and keep going
$values[] = $next;
// use the element to produce a key for the result set
$pattern .= $next;
}
}
// if no duplicate was found, then each value should be an individual pattern
if (!$dupe_found) {
foreach ($values as $value) {
add_pattern($value, [$value], $result);
}
}
}
function add_pattern($pattern, $values, &$result) {
// increment the pattern count
$result[$pattern]['number'] = isset($result[$pattern]['number']) ?
$result[$pattern]['number']+1 : 1;
// add the current pattern to the result, if not already done
if (!isset($result[$pattern]['values'])) {
$result[$pattern]['values'] = $values;
}
}
An example usage:
$input = [
'a', 'b', 'c', 'd',
'a', 'b', 'c', 'd',
'c', 'd'
];
$result = [];
find_pattern($input, $result);
echo "<pre>";
print_r($result);
echo "</pre>";
Example output:
Array
(
[abcd] => Array
(
[number] => 2
[values] => Array
(
[0] => a
[1] => b
[2] => c
[3] => d
)
)
[c] => Array
(
[number] => 1
[values] => Array
(
[0] => c
)
)
[d] => Array
(
[number] => 1
[values] => Array
(
[0] => d
)
)
)
Answer 8 (score: 2)
You could do it like this:
<?php
$array = array(
'a', 'b', 'c', 'd',
'a', 'b', 'c', 'd',
'c', 'd'
);
// Call this function to get your patterns
function patternMatching(array $array) {
$patterns = array();
$belongsToPattern = array_fill(0, count($array), false);
// Find biggest patterns first
for ($size = (int) (count($array) / 2); $size > 0; $size--) {
// for each pattern: start at every possible point in the array
for($start=0; $start <= count($array) - $size; $start++) {
$p = findPattern($array, $start, $size);
if($p != null) {
/* Before we can save the pattern we need to check, if we've found a
* pattern that does not collide with patterns we've already found */
$hasConflict = false;
foreach($p["positions"] as $idx => $pos) {
$PatternConflicts = array_slice($belongsToPattern, $pos, $p["size"]);
$hasConflict = $hasConflict || in_array(true, $PatternConflicts);
}
if(!$hasConflict) {
/* Since we have found a pattern, we don't want to find more
* patterns for these positions */
foreach($p["positions"] as $idx => $pos) {
$replace = array_fill($pos, $p["size"], true);
$belongsToPattern = array_replace($belongsToPattern, $replace);
}
$patterns[] = $p;
// or only return number and values:
// $patterns[] = [ "number" => $p["number"], "values" => $p["values"]];
}
}
}
}
return $patterns;
}
function findPattern(array $haystack, $patternStart, $patternSize ) {
$size = count($haystack);
$patternCandidate = array_slice($haystack, $patternStart, $patternSize);
$patternCount = 1;
$patternPositions = [$patternStart];
for($i = $patternStart + $patternSize; $i <= $size - $patternSize; $i++) {
$patternCheck = array_slice($haystack, $i, $patternSize);
// Compare strictly so that element order and duplicates are taken into account
if($patternCandidate === $patternCheck) {
$patternCount++;
$patternPositions[] = $i;
}
}
if($patternCount > 1 || $patternSize <= 1) {
return [
"number" => $patternCount,
"values" => $patternCandidate,
// Additional information needed for filtering, sorting, etc.
"positions" => $patternPositions,
"size" => $patternSize
];
} else {
return null;
}
}
$patterns = patternMatching($array);
print "<pre>";
print_r($patterns);
print "</pre>";
?>
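The code is probably far from optimal in terms of speed, but it should do what you want for any sequence of strings in an array. patternMatching() returns the patterns ordered by descending pattern size and, within the same size, by ascending first occurrence (you can use ['positions'][0] as a sort criterion to get a different order).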
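A note on comparing two slices such as $patternCandidate and $patternCheck: array_diff() only lists the values of the first array that are missing from the second, so it ignores both order and how often a value occurs, whereas a strict === comparison checks both. A small illustration (the variable names below are chosen for this sketch only):

```php
<?php
$candidate = array('a', 'b');
$check     = array('b', 'a');

// array_diff() finds nothing missing, although the order differs
$sameByDiff = empty(array_diff($candidate, $check));
// === requires the same values at the same positions
$sameStrict = ($candidate === $check);

var_dump($sameByDiff); // bool(true)
var_dump($sameStrict); // bool(false)

// Duplicates are ignored by array_diff() as well
$dupByDiff = empty(array_diff(array('a', 'a'), array('a', 'b')));
var_dump($dupByDiff);  // bool(true)
```

For ordered patterns like the ones in the question, the strict comparison is the one that matches the intended meaning.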
Answer 9 (score: 1)
This should do it:
<?php
$array = array(
'x', 'y', 'x', 'y', 'a',
'ab', 'c', 'd',
'a', 'b', 'c', 'd',
'c', 'd', 'x', 'y', 'b',
'x', 'y', 'b', 'c', 'd'
);
// convert the array to a string
$string = '';
foreach ($array as $a) {
$l = strlen($a)-1;
$string .= ($l) ? str_replace('::',':',$a[0] . ':' . substr($a,1,$l-1) . ':' . $a[$l]) . '-' : $a . '-';
}
// find patterns
preg_match_all('/(?=((.+)(?:.*?\2)+))/s', $string, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
$temp = str_replace('--','-',$m[2].'-');
$patterns[] = ($temp[0]==='-') ? substr($temp,1) : $temp;
}
// remove empty values and duplicates
$patterns = array_keys(array_flip(array_filter($patterns)));
// sort patterns
foreach ($patterns as $p) {
$sorted[$p] = strlen($p);
}
arsort($sorted);
// find double or more occurrences
$stringClone = $string;
foreach ($sorted as $s=>$n) {
$nA = substr_count($stringClone,':'.$s);
$nZ = substr_count($stringClone,$s.':');
$number = substr_count($stringClone,$s);
$sub = explode('-',substr($stringClone,strpos($stringClone,$s),$n-1));
$values = $sub;
if($nA>0 || $nZ>0){
$numberAdjusted = $number - $nA - $nZ;
if($numberAdjusted > 1) {
$temp = '';
while($n--){
$temp .= '#';
}
$position = strpos(str_replace(':'.$s,':'.$temp,str_replace($s.':',$temp.':',$string)),$s);
$stringClone = str_replace(':'.$s,':'.$temp,$stringClone);
$stringClone = str_replace($s.':',$temp.':',$stringClone);
$result['p'.sprintf('%09d', $position)] = array('number'=>$numberAdjusted,'values'=>$values);
$stringClone = str_replace($s,'',$stringClone);
$stringClone = str_replace($temp,$s,$stringClone);
}
} else if($number>1){
$position = strpos($string,$s);
$result['p'.sprintf('%09d', $position)] = array('number'=>$number,'values'=>$values);
$stringClone = str_replace($s,'',$stringClone);
}
}
// add the remaining items
$remaining = array_flip(explode('-',substr($stringClone,0,-1)));
foreach ($remaining as $r=>$n) {
$position = strpos($string,$r);
$result['p'.sprintf('%09d', $position)] = array('number'=>1,'values'=>str_replace(':','',$r));
}
// sort results
ksort($result);
$result = array_values($result);
print_r($result);
?>
Working example here.