如何使用PHP查找字符串中的字符序列模式?

时间:2013-03-10 18:12:26

标签: php regex arrays string

假设我有随机的文本块:

EAMoAAQAABwEBAAAAAAAAAAAAAAABAgMFBgcIBAkBAQABBQEBAAAAAAAAAAAAAAAGAgMEBQcBCBAAAQMDAgMEBQcIBQgGCwEAAQACAxEEBSEGMRIHQVFhE3GBIhQIkaGxwTJCI9FScoKSojMV8GLCUxbhstKDo7M0ZHOTJEQlF/HiQ2PDVHSExEUmGBEBAAIBAgMDCAgCCgMBAQEAAAECAxEEITEFQRIGUWFxgZGhIhPwscHRMlIUB0Jy4fGCkqLCI1MVFrLSQ2IzF//aAAwDAQACEQMRAD8A7+QEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEEDwXkzpxHgusxi7NrnXF3G0NBLhzAkAeAqVH934r6bt57uTPSJ8ne1n2Rqycezy35VlRttwYu5DXNlLOcczOdpHM3hUUqtLs/wBxulZonXJ8vjp8caa+eOa5k6flrPLVcIbm3n/gytf4NcCVKtj1XbbqNcOSuT+W0W+pi3x2rzjRWWxUCAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAggV5It2Uy8GNYAWmW6kr5MDftO8T3BRXxR4s2/SccTb48lvw0jnPnn8tfP6o1Ze02ds08OERzlid+/P5Orp5BHEeFuxxa0Dxpx9a+fOu+Iup9Tmfm30p+Ss92vr/N6bat/t67fDyjWfLLG79pt45YpAA8NdUAg9ngolTFNbedtqWi0avVicv5bLKFr2kSRltHaahrXCnylZcd6k208rDy4ItxlkUr5+XnZE1zxq0h3KfUQqv1GWsxeI0tHKY1rPtjRgVivKZU7HebrS491ybX+TWnO7V7PEn7w+f0rpPhb9zdxt7Rj3szkx/n/AI6+n88f4vTyebno8Wr3qTGvun7mawSxzsbNC4Pje0Oa9pqCD2grv+3z0zUi9Ji1bRrEx2wjtqzWdJ5wqq8pEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQU

说明:

patternABC >= 2 characters = groupABC IF groupABC occurs more than once
groupABC + (groupABC)n = sequence WHERE n >= 1 AND sequence > 6 characters

**序列需要> 6个字符,以便进行评估

BREAKDOWN:

如何找到按顺序发生的重复模式?

QEBAQEBAQEBAQEBAQEBAQEBA

我还想计算每组重复的次数:

QEBA QEBA QEBA QEBA QEBA QEBA = 6

此序列必须> 6个字符,以便进行评估:

NO GOOD: AA AA AA
GOOD: AA AA AA AA

如果输出可以存储在关联数组中,并且删除了重复的条目,那将是理想的:

QEBA => 6, AA => 4, QEBA => 3, AA => 8, (QEBA => 6)<- REMOVE

有没有人有时间&amp;解决这个问题的倾向? 如果你这样做,你会摇滚!

2 个答案:

答案 0 :(得分:2)

$str = 'EAMoAAQAABwEBAAAAAAAAAAAAAAABAgMFBgcIBAkBAQABBQEBAAAAAAAAAAAAAAAGAgMEBQcBCBAAAQMDAgMEBQcIBQgGCwEAAQACAxEEBSEGMRIHQVFhE3GBIhQIkaGxwTJCI9FScoKSojMV8GLCUxbhstKDo7M0ZHOTJEQlF/HiQ2PDVHSExEUmGBEBAAIBAgMDCAgCCgMBAQEAAAECAxEEITEFQRIGUWFxgZGhIhPwscHRMlIUB0Jy4fGCkqLCI1MVFrLSQ2IzF//aAAwDAQACEQMRAD8A7+QEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEEDwXkzpxHgusxi7NrnXF3G0NBLhzAkAeAqVH934r6bt57uTPSJ8ne1n2Rqycezy35VlRttwYu5DXNlLOcczOdpHM3hUUqtLs/wBxulZonXJ8vjp8caa+eOa5k6flrPLVcIbm3n/gytf4NcCVKtj1XbbqNcOSuT+W0W+pi3x2rzjRWWxUCAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAggV5It2Uy8GNYAWmW6kr5MDftO8T3BRXxR4s2/SccTb48lvw0jnPnn8tfP6o1Ze02ds08OERzlid+/P5Orp5BHEeFuxxa0Dxpx9a+fOu+Iup9Tmfm30p+Ss92vr/N6bat/t67fDyjWfLLG79pt45YpAA8NdUAg9ngolTFNbedtqWi0avVicv5bLKFr2kSRltHaahrXCnylZcd6k208rDy4ItxlkUr5+XnZE1zxq0h3KfUQqv1GWsxeI0tHKY1rPtjRgVivKZU7HebrS491ybX+TWnO7V7PEn7w+f0rpPhb9zdxt7Rj3szkx/n/AI6+n88f4vTyebno8Wr3qTGvun7mawSxzsbNC4Pje0Oa9pqCD2grv+3z0zUi9Ji1bRrEx2wjtqzWdJ5wqq8pEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQU';

preg_match_all( '/(\S{2,}?)\1+/', $str, $matches );

// Remove duplicates
$matches[0] = array_unique( $matches[0] ); 

foreach ( $matches[0] as $key => $value ) {
    if ( strlen( $value ) > 6 ) {
        $repeated = $matches[1][$key];
        $results[] = array( $repeated => count( explode( $repeated, $value ) ) - 1 );
    }    
}

print_r($results); 

/*
[AA] => 7
[QEBA] => 93
[CAgI] => 18
[EBAQ] => 18
*/

以上假设序列由非空格字符组成。

答案 1 :(得分:1)

使用preg_match_all('/(?:(.{6,})\1)/',$inputText,$sequences)获取序列 (注意:序列将保存在$sequences
解释了RegEx演示:http://regex101.com/r/rW4nE2

使用array_unique()删除重复项。

循环遍历每个序列并且:
获取preg_match_all('/(.+?)(\1)(\1)?/',$sequence,$groups)的群组 解释了RegEx演示:http://regex101.com/r/pC3pB7

如果需要,请使用count()