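I have a string CCAATA CCGT, and I am trying to get its contiguous subsequences of a fixed length n. I then want to print something like the following, giving the index range of each subsequence in the string (0-3, 1-4, 2-5, and so on):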
0 thru 3 : CCAA
1 thru 4 : CAAT
2 thru 5 : AATA
3 thru 6 : ATAC
4 thru 7 : TACC
5 thru 8 : ACCG
6 thru 9 : CCGT
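The list has 7 entries. I loop through the list and use indexOf and lastIndexOf to get the indices. After it prints 3 thru 6 : ATAC, I get
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 7, Size: 7
from this loop: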
for (int i = 0; i < list.size(); i++) {
System.out.println(ss.indexOf(list.get(i))
+ " thru " + ss.lastIndexOf(list.get(i + n - 1)) + " : "
+ list.get(i));
}
Demo:
import java.util.ArrayList;
public class Subsequences {
public static void main(String[] args) {
String s = "CCAATA CCGT";
ArrayList<String> list = new ArrayList<String>(); // list of subsequences
int n = 4; // length of each subsequence
String ss = s.replaceAll("\\s+", "");
String substr = null;
for (int i = 0; i <= ss.length() - n; i++) {
substr = ss.substring(i, i + n);
list.add(substr);
}
for (int i = 0; i < list.size(); i++) {
System.out.println(ss.indexOf(list.get(i))
+ " thru " + ss.lastIndexOf(list.get(i + n - 1)) + " : "
+ list.get(i));
}
}
}
Any hints?
Answer 0 (score: 1)
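I believe your problem is in list.get(i + n - 1). You are currently iterating so that the start index i of each subsequence ranges from 0 to list.size() - 1, but the last valid list index is list.size() - 1, so list.get(i + n - 1) goes out of range as soon as i > list.size() - n. In terms of the string itself, the last subsequence that makes sense is the n characters from position ss.length() - n to ss.length() - 1.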
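A minimal sketch of a loop that avoids list.get(i + n - 1) entirely, assuming the goal is only to print each window with its inclusive start and end positions. Because window i starts at index i of the whitespace-free string, both indices can be computed directly instead of being looked up. The class and method names (SubsequenceIndices, describe) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SubsequenceIndices {
    // Hypothetical helper: builds one "start thru end : window" line per
    // window of length n. Window i starts at index i, so no indexOf or
    // lastIndexOf lookup (and no list.get(i + n - 1)) is needed.
    public static List<String> describe(String input, int n) {
        String ss = input.replaceAll("\\s+", "");
        List<String> lines = new ArrayList<>();
        // The last valid start index is ss.length() - n.
        for (int i = 0; i <= ss.length() - n; i++) {
            int end = i + n - 1; // inclusive end index in ss
            lines.add(i + " thru " + end + " : " + ss.substring(i, i + n));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe("CCAATA CCGT", 4)) {
            System.out.println(line);
        }
    }
}
```

For the question's input this prints the seven lines from 0 thru 3 : CCAA to 6 thru 9 : CCGT.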
Answer 1 (score: 1)
Remove all whitespace, then loop:
String data = "CCAATA CCGT";
String replaced = data.replaceAll("\\s", "");
for (int i = 0; i < replaced.length() - 4 + 1; i++) {
System.out.println(replaced.subSequence(i, i + 4));
}
Output:
CCAA
CAAT
AATA
ATAC
TACC
ACCG
CCGT
Answer 2 (score: 1)
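You don't need to add n inside the lastIndexOf call, because you have already split the string into substrings of length 4: each entry in the List contains 4 characters. Change the index calculation to this instead:
(ss.lastIndexOf(list.get(i)) + n - 1)
In the end it looks like this: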
for (int i = 0; i < list.size(); i++) {
System.out.println(ss.indexOf(list.get(i))
+ " thru " + (ss.lastIndexOf(list.get(i)) + n - 1) + " : "
+ list.get(i));
}
Output:
0 thru 3 : CCAA
1 thru 4 : CAAT
2 thru 5 : AATA
3 thru 6 : ATAC
4 thru 7 : TACC
5 thru 8 : ACCG
6 thru 9 : CCGT
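One caveat with this approach: indexOf returns the first occurrence of a window anywhere in ss (and lastIndexOf the last), so the printed positions only match the loop index when every window is unique, as happens to be the case for CCAATACCGT. The sketch below uses a hypothetical repetitive input, "ACACAC", and a hypothetical helper name (reportedStarts) to show where the two diverge; for such inputs, using the loop index i directly is safer.

```java
import java.util.Arrays;

public class IndexOfCaveat {
    // Hypothetical helper: for each window start i, returns ss.indexOf(window),
    // i.e. the position of the FIRST occurrence of that window, which is not
    // necessarily i. Assumes 1 <= n <= ss.length().
    public static int[] reportedStarts(String ss, int n) {
        int[] starts = new int[ss.length() - n + 1];
        for (int i = 0; i < starts.length; i++) {
            starts[i] = ss.indexOf(ss.substring(i, i + n));
        }
        return starts;
    }

    public static void main(String[] args) {
        // "ACA" and "CAC" each occur twice in this hypothetical input.
        System.out.println(Arrays.toString(reportedStarts("ACACAC", 3)));
        // Every window of the question's string is unique, so here indexOf agrees with i.
        System.out.println(Arrays.toString(reportedStarts("CCAATACCGT", 4)));
    }
}
```

The first line prints [0, 1, 0, 1] rather than [0, 1, 2, 3], while the second prints [0, 1, 2, 3, 4, 5, 6].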
Answer 3 (score: 0)
In your loop
for (int i = 0; i < list.size(); i++) {
System.out.println(ss.indexOf(list.get(i))
+ " thru " + ss.lastIndexOf(list.get(i + n - 1))
+ " : " + list.get(i));
}
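when you call list.get(i + n - 1) with i equal to 4, the index is 4 + 4 - 1 = 7. A list only has elements at indices 0 through list.size() - 1, so any index equal to or greater than list.size() (here 7) is out of range, and the JVM throws the exception.
To get the result you expect, you can do the following: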
import java.util.ArrayList;
public class Subsequences {
public static void main(String[] args) {
String s = "CCAATA CCGT";
ArrayList<String> list = new ArrayList<String>(); // list of subsequences
int n = 4; // length of each subsequence
String ss = s.replaceAll("\\s+", "");
String substr = null;
for (int i = 0; i <= ss.length() - n; i++) {
substr = ss.substring(i, i + n);
list.add(substr);
}
// --------Here the edits-------
// entry i of the list starts at index i of ss and ends at i + n - 1
for (int i = 0; i < list.size(); i++) {
System.out.println(i + " thru " + (i + n - 1) + " : " + list.get(i));
}
// -----------------------------
}
}
Answer 4 (score: 0)
You can also do this with a simple regular expression. Remove the whitespace and run this regex:
(?=(.{4}))
Sample:
package com.see;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
private static final String TEST_STR = "CCAATA CCGT";
public ArrayList<String> getMatchedStrings(String input) {
ArrayList<String> matches = new ArrayList<String>();
input = input.replaceAll("\\s", "");
Pattern pattern = Pattern.compile("(?=(.{4}))");
Matcher matcher = pattern.matcher(input);
while (matcher.find())
matches.add(matcher.group(1));
return matches;
}
public static void main(String[] args) {
RegexTest rt = new RegexTest();
for (String string : rt.getMatchedStrings(TEST_STR)) {
System.out.println(string);
}
}
}
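The regex version can also produce the index ranges the question asks for. Because the lookahead (?=(.{n})) matches an empty string, Matcher.start() of each match is exactly the start position of the captured window, and Matcher.find() moves forward by one character after each empty match. A minimal sketch, parameterised by n; the class and method names (RegexWithIndices, windowsWithIndices) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexWithIndices {
    // Hypothetical helper: same lookahead idea as the sample above, with the
    // window length n built into the pattern, e.g. "(?=(.{4}))" for n = 4.
    public static List<String> windowsWithIndices(String input, int n) {
        String ss = input.replaceAll("\\s", "");
        Matcher matcher = Pattern.compile("(?=(.{" + n + "}))").matcher(ss);
        List<String> lines = new ArrayList<>();
        while (matcher.find()) {
            // The match itself is empty, so its start equals the start of group 1.
            int start = matcher.start();
            lines.add(start + " thru " + (start + n - 1) + " : " + matcher.group(1));
        }
        return lines;
    }

    public static void main(String[] args) {
        windowsWithIndices("CCAATA CCGT", 4).forEach(System.out::println);
    }
}
```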