Finding the boundaries of multiple subsequences in a string

Time: 2015-12-23 08:25:41

Tags: regex string algorithm subsequence

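Given a string of length n made up of the characters A, B and D.

Ex-1: AAAABAAAADADDDDADDDBBBBBBDDDDA

Given a threshold x, a legal substring may contain any other contiguous substring of length at most x.

Ex-2: For the subsequence of A in Ex-1, AAAABAAAADA is a legal substring for x = 2, with boundaries (1,11).

Similarly, I want to extract the substrings of A and D separately, ignoring B in the main string. The main string can contain many substrings of each type.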

Model output:

Type Boundaries
A    1,11
D    12,20
D    26,29 

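I have implemented an inefficient, non-algorithmic approach: I find the distance between consecutive As and break the string wherever that distance is greater than the threshold. I have to run this separately for A and for D, which leads to overlapping boundary regions.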
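For illustration, a per-character pass of this kind might look like the sketch below. This is not the original implementation; the function name `split_by_gap` is hypothetical, and it assumes the "distance" is the number of other characters sitting between two consecutive occurrences of the target.

```python
def split_by_gap(s, target, x):
    """Return 1-based (start, end) boundaries of runs of `target` in `s`.

    Assumption: a run is broken whenever more than `x` other characters
    sit between two consecutive occurrences of `target`.
    """
    segments = []
    start = None  # index of the first `target` in the current run
    last = None   # index of the most recent `target` seen
    for i, ch in enumerate(s):
        if ch != target:
            continue
        if last is not None and i - last - 1 > x:
            # Gap too long: close the current run and open a new one here.
            segments.append((start + 1, last + 1))
            start = i
        elif start is None:
            start = i
        last = i
    if start is not None:
        # Close the final run.
        segments.append((start + 1, last + 1))
    return segments


s = "AAAABAAAADADDDDADDDBBBBBBDDDDA"
print(split_by_gap(s, "A", 2))
print(split_by_gap(s, "D", 2))
```

On Ex-1 with x = 2 the A pass yields a run ending at position 11 while the D pass yields a run starting at position 10, so the two results overlap, which is the problem described above.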

Is there a better approach to this problem?

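EDIT-1

A legal substring can be of any length, but it must not be contaminated by other substrings longer than the threshold x. That is, when searching for a substring of A, it must not contain a contiguous run of the other characters (B, D) longer than the threshold.

When searching for A with x = 2, AAABBAAAA and AABDAAAA are valid, but AADBDAAA and AABBBAAA are not. Likewise, when searching for D, A and B are the contaminants.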
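The rule can be written down as a small check. This is only a sketch under one assumption: any mix of non-target characters counts as a single contaminating run, which is what the four examples above suggest. The helper name `is_legal` is made up for illustration.

```python
from itertools import groupby


def is_legal(sub, target, x):
    """True if no contiguous run of non-`target` characters exceeds `x`."""
    # groupby splits `sub` into alternating runs of target / non-target chars.
    for is_target, run in groupby(sub, key=lambda ch: ch == target):
        if not is_target and len(list(run)) > x:
            return False
    return True


for candidate in ["AAABBAAAA", "AABDAAAA", "AADBDAAA", "AABBBAAA"]:
    print(candidate, is_legal(candidate, "A", 2))
```

With x = 2 this accepts the first two candidates and rejects the last two, matching the examples.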

EDIT-2 Using Pham Trung's answer

Code:

start = 0
lastA = -1
lastD = -1
x = 2

arr = ["A", "A", "A", "A", "B", "A", "A", "A", "A", "D", "A", "D", "D", "D", "D", "A", "D", "D", "D", "B", "B", "B", "B", "B", "B", "D", "D", "D", "D", "A"]

for i in range(0, len(arr)):
    if(arr[i] == 'A'):
        if(lastA != -1 and i - lastA > x):
            print("A", start + 1, lastA + 1)
            start = i
        lastA = i
    elif(arr[i] == 'D'):
        if(lastD != -1 and i - lastD > x):
            print("D", start + 1, lastD + 1)
            start = i
        lastD = i

Output:

A 1 11
D 16 19
A 26 16

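The code fails to extract the substrings after the first one.

1 answer:

Answer 0 (score: 1)

So, here are some suggestions for your problem:

Since there are only three kinds of characters in the string, it is easy to keep track of the last position of each of them.

Starting from the beginning of the string, track the distance between the current character and its last position. If it is greater than the threshold, break there and start a new substring from that point.

Pseudocode: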

int start = 0;
int lastA = -1;
int lastD = -1;
for(int i = 0; i < input.length(); i++){
    if(input.charAt(i) == 'A'){
       if(lastA != -1 && i - lastA > x){
           create a substring from start to i - 1; 
           start = i; //Update the new start for the next substring
           lastD = -1;//Reset count for D
       }
       lastA = i;
    }else if(input.charAt(i) == 'D'){
       //Do similar to what we do for character A
    } 
}
create a substring from start to end of the string; //We need to add the last substring.

Updated Python code:

start = 0
lastA = -1
lastD = -1
x = 2

arr = ["A", "A", "A", "A", "B", "A", "A", "A", "A", "D", "A", "D", "D", "D", "D", "A", "D", "D", "D", "B", "B", "B", "B", "B", "B", "D", "D", "D", "D", "A"]

for i in range(0, len(arr)):
    if(arr[i] == 'A'):
        if(lastA != -1 and i - lastA > x):
            print("A", start + 1, lastA + 1)
            start = lastA + 1
            while(start < len(arr) and arr[start] == 'B'):
                start = start + 1
            lastD = -1 
        lastA = i
    elif(arr[i] == 'D'):
        if(lastD != -1 and i - lastD > x):
            print("D", start + 1, lastD + 1)
            start = lastD + 1
            while(start < len(arr) and arr[start] == 'B'):
                start = start + 1
            lastA = -1
        lastD = i
while(start < len(arr) and arr[start] == 'B'):
    start = start + 1
if(start < len(arr)):
    print("A or D", start + 1, len(arr))
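Since the question is tagged regex, a different technique from the loop above is to let a regular expression find, for one target character at a time, the maximal runs whose gaps are at most x. The sketch below is only an illustration: `runs_by_regex` is a hypothetical name, and the gap is assumed to be the number of characters between two consecutive targets.

```python
import re


def runs_by_regex(s, target, x):
    """1-based (start, end) spans of `target` runs with gaps of at most `x`."""
    t = re.escape(target)
    # One `target`, then any number of "up to x other chars + target".
    # For target "A" and x = 2 this builds: A(?:[^A]{0,2}A)*
    pattern = re.compile(rf"{t}(?:[^{t}]{{0,{x}}}{t})*")
    # m.end() is exclusive and 0-based, so it equals the 1-based inclusive end.
    return [(m.start() + 1, m.end()) for m in pattern.finditer(s)]


s = "AAAABAAAADADDDDADDDBBBBBBDDDDA"
print("A", runs_by_regex(s, "A", 2))
print("D", runs_by_regex(s, "D", 2))
```

Like any per-character pass, this still has to be run once for A and once for D, and the resulting spans can overlap. Deciding which type owns an overlapping region is left to the caller.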