I have a method that takes an ArrayList of strings, where each element of the list looks like this:
>AX018718 Equine influenza virus H3N8 // 4 (HA)
CAAAAGCAGGGTGACAAAAACATGATGGATTCCAACACTGTGTCAAGCTTTCAGGTAGACTGTTTTCTTT
GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
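Each element is broken down into Acc, which in this case is AX018718, and seq, which is the two lines following the Acc. Each seq is then checked against another ArrayList of strings named pal to see whether any substring matches [AAAATTTT, AAACGTTT, AAATATATTT].
I am able to output all matches for the different elements of the first list as: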
AATATATT in organism: AX225014 Was found in position: 15 and at 15
AATATT in organism: AX225014 Was found in position: 1432 and at 1432
AATATT in organism: AX225016 Was found in position: 1404 and at 1404
AATT in organism: AX225016 Was found in position: 169 and at 2205
Is it possible to check, across all of the output, whether every Acc matched a given pal?
In the example above, the desired output would be:
AATATT was found in all of the Acc.
My working code:
public static ArrayList<String> PB2Scan(ArrayList<String> Pal) throws FileNotFoundException, IOException
{
ArrayList<String> PalindromesSpotted = new ArrayList<String>();
File file = new File("IAV_PB2_32640.txt");
Scanner sc = new Scanner(file);
sc.useDelimiter(">");
//initializes the ArrayList
ArrayList<String> Gene1 = new ArrayList<String>();
//initializes the writer
FileWriter fileWriter = new FileWriter("PB2out");
PrintWriter printwriter = new PrintWriter(fileWriter);
//Loads the Array List
while(sc.hasNext()) Gene1.add(sc.next());
for(int i = 0; i < Gene1.size(); i++)
{
//Acc breaks down the title so the element:
//>AX225014 Equine influenza virus H3N8 // 1 (PB2)
//ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
//GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
//comes out as AX225014
String Acc = Accession(Gene1.get(i));
//seq takes the same element as above and returns only
//ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
//GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
String seq = trimHeader(Gene1.get(i));
for(int x = 0; x<Pal.size(); x++)
{
if(seq.contains(Pal.get(x))){
String match = (Pal.get(x) + " in organism: " + Acc + " Was found in position: "+ seq.indexOf(Pal.get(x)) + " and at " +seq.lastIndexOf(Pal.get(x)));
printwriter.println(match);
PalindromesSpotted.add(match);
}
}
}
Collections.sort(PalindromesSpotted);
return PalindromesSpotted;
}
Answer 0 (score: 1)
You should create a Map<String, List<String>> containing the Pals as keys
and the Accs that contain them as values.
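// As written, this maps each gene entry to the list of pals found in its sequence.
Map<String, List<String>> result = new HashMap<>();
for (String gene : Gene1) {
    List<String> list = new ArrayList<>();
    result.put(gene, list);
    String seq = trimHeader(gene);
    for (String pal : Pal) {
        // The sequence must contain the pal, not the other way round.
        if (seq.contains(pal)) {
            list.add(pal);
        }
    }
}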
Now you have a Map that you can query for the Pals each gene contains:
List<String> containedPals = result.get(gene);
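For a function like this, that is a very reasonable result. Whatever you do afterwards (i.e. writing to a file) is better done in another function (one that calls this function).
So this is probably what you want to do: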
List<String> genes = loadGenes(geneFile);
List<String> pals = loadPal(palFile);
Map<String, List<String>> genesToContainedPal = methodAbove(genes, pals);
switch (resultTyp) {
// ...
}
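To get from a map like this to the output the question asks for ("AATATT was found in all of the Acc."), one option is to key the map by pal, collect the accessions per pal, and compare each set's size with the number of entries. This is only a minimal sketch under assumptions: the class name PalCoverage, the method foundInAll, and the accession-to-sequence input map are hypothetical and stand in for whatever Accession() and trimHeader() return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PalCoverage {

    // Hypothetical helper: returns the pals that occur in every sequence.
    // seqByAcc maps an accession (e.g. "AX225014") to its sequence string.
    public static List<String> foundInAll(Map<String, String> seqByAcc, List<String> pals) {
        // pal -> set of accessions whose sequence contains that pal
        Map<String, Set<String>> accsByPal = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : seqByAcc.entrySet()) {
            for (String pal : pals) {
                if (e.getValue().contains(pal)) {
                    accsByPal.computeIfAbsent(pal, k -> new TreeSet<>()).add(e.getKey());
                }
            }
        }
        // A pal is "in all Acc" when its accession set covers every entry.
        List<String> inAll = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : accsByPal.entrySet()) {
            if (e.getValue().size() == seqByAcc.size()) {
                inAll.add(e.getKey());
            }
        }
        return inAll;
    }

    public static void main(String[] args) {
        // Made-up toy sequences for illustration, not real data.
        Map<String, String> seqByAcc = new LinkedHashMap<>();
        seqByAcc.put("ACC1", "GGAATATTCC");
        seqByAcc.put("ACC2", "TTAATATTAATTGG");
        for (String pal : foundInAll(seqByAcc, Arrays.asList("AATATT", "AATT"))) {
            System.out.println(pal + " was found in all of the Acc.");
        }
    }
}
```

With the toy data above, only AATATT occurs in both sequences, so only that pal is reported.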
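Answer 1 (score: 1)
First of all, your code will not write anything to a file to record the results, because you never close your writer or at least flush the PrintWriter. In fact, you don't close your reader either. You really should close your readers and writers to free up resources. Food for thought.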
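One way to make sure both get closed is try-with-resources, which closes (and therefore flushes) them even if an exception is thrown. The following is a minimal sketch rather than a drop-in fix; the class name CloseDemo and the method copyEntries are hypothetical, and the method simply copies the ">"-delimited entries from one file to another to show the pattern.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

public class CloseDemo {

    // Reads ">"-delimited entries from inPath and writes each one, trimmed,
    // to outPath. Returns the number of entries written.
    public static int copyEntries(String inPath, String outPath) throws IOException {
        int count = 0;
        // Both resources are closed automatically when the try block ends,
        // which also flushes the PrintWriter's buffered output to disk.
        try (Scanner sc = new Scanner(new File(inPath));
             PrintWriter out = new PrintWriter(new FileWriter(outPath))) {
            sc.useDelimiter(">");
            while (sc.hasNext()) {
                out.println(sc.next().trim());
                count++;
            }
        }
        return count;
    }
}
```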
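You can make the PB2Scan() method return the simple result list as it does now, or return a result list of the Accs that contain the same Pal, or return both: the simple result list is logged, followed at the end of that list by the list of Accs that contain the same Pal (which is logged as well).
A little extra code and an additional integer parameter for the PB2Scan() method can do this. For the additional parameter, you might add something like this: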
public static ArrayList<String> PB2Scan(ArrayList<String> Pal, int resultType)
throws FileNotFoundException, IOException
{ .... }
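The integer resultType parameter takes one of three integer values from 0 to 2:
0: return (and write to file) the simple result list, as the method does now;
1: return (and write to file) only the list of Accs that contain the same Pal;
2: return (and write to file) both, the simple result list followed by the list of Accs that contain the same Pal.
You should also pass the file to read as a parameter of the PB2Scan() method, since next time the file could easily have a different name. This makes the method more versatile than it would be with a hard-coded file name.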
public static ArrayList<String> PB2Scan(String filePath, ArrayList<String> Pal, int resultType)
throws FileNotFoundException, IOException { .... }
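The method can always write to the same output file, since that name best identifies which method the output came from.
Using the concepts above, rather than writing to the output file (PB2Out.txt) while the PalindromesSpotted ArrayList is being built, I think it is better to write the file after your ArrayList or ArrayLists are complete. Another method (writeListToFile()) is best suited to that task. To find out whether the same Pal matched other Accs, it is again a good idea to use a separate method (getPalMatches()).
Since the index positions of a Pal that occurs more than once in a given Seq were not being reported correctly, I have also provided another method (findSubstringIndexes()) to take care of that quickly.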
It should be noted that the code below assumes the Seq obtained from the trimHeader() method is one single string with no newline characters in it.
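If trimHeader() leaves line breaks in the sequence, a Pal that spans two lines would be missed and the reported positions would be shifted. The asker's trimHeader() is not shown, so the following is only a hypothetical stand-in (the names SeqUtil and trimHeaderFlat are made up) that assumes each entry is one header line followed by sequence lines:

```java
public class SeqUtil {

    // Hypothetical stand-in for trimHeader(): drops the first (header) line
    // and removes all whitespace, so the sequence becomes one continuous string.
    public static String trimHeaderFlat(String entry) {
        int firstBreak = entry.indexOf('\n');
        if (firstBreak < 0) {
            return ""; // header only, no sequence lines
        }
        return entry.substring(firstBreak + 1).replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        // Shortened, made-up sequence lines for illustration.
        String entry = "AX018718 Equine influenza virus H3N8 // 4 (HA)\nCAAAAG\nGGCATG\n";
        System.out.println(trimHeaderFlat(entry)); // prints CAAAAGGGCATG
    }
}
```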
The reworked PB2Scan() method and the other methods mentioned above are as follows:
The PB2Scan() method:
public static ArrayList<String> PB2Scan(String filePath, ArrayList<String> Pal, int resultType)
throws FileNotFoundException, IOException {
// Make sure the supplied result type is either
// 0, 1, or 2. If not then default to 0.
if (resultType < 0 || resultType > 2) {
resultType = 0;
}
ArrayList<String> PalindromesSpotted = new ArrayList<>();
File file = new File(filePath);
Scanner sc = new Scanner(file);
sc.useDelimiter(">");
//initializes the ArrayList
ArrayList<String> Gene1 = new ArrayList<>();
//Loads the Array List
while (sc.hasNext()) {
Gene1.add(sc.next());
}
sc.close(); // Close the read in text file.
for (int i = 0; i < Gene1.size(); i++) {
//Acc breaks down the title so the element:
//>AX225014 Equine influenza virus H3N8 // 1 (PB2)
//ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
//GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
//comes out as AX225014
String Acc = Accession(Gene1.get(i));
//seq takes the same element as above and returns only
//ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
//GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
String seq = trimHeader(Gene1.get(i));
for (int x = 0; x < Pal.size(); x++) {
if (seq.contains(Pal.get(x))) {
String match = Pal.get(x) + " in organism: " + Acc +
" Was found in position(s): " +
findSubstringIndexes(seq, Pal.get(x));
PalindromesSpotted.add(match);
}
}
}
// If there is nothing to work with get outta here.
if (PalindromesSpotted.isEmpty()) {
return PalindromesSpotted;
}
// Sort the ArrayList
Collections.sort(PalindromesSpotted);
// Another ArrayList for matching Pal's to Acc's
ArrayList<String> accMatchingPal = new ArrayList<>();
switch (resultType) {
case 0: // resultType 0 was supplied: simple result list
writeListToFile("PB2Out.txt", PalindromesSpotted);
return PalindromesSpotted;
case 1: // resultType 1 was supplied: Accs sharing the same Pal
accMatchingPal = getPalMatches(PalindromesSpotted);
writeListToFile("PB2Out.txt", accMatchingPal);
return accMatchingPal;
default: // resultType 2 was supplied: both lists
accMatchingPal = getPalMatches(PalindromesSpotted);
ArrayList<String> fullList = new ArrayList<>();
fullList.addAll(PalindromesSpotted);
// Create a Underline made of = signs in the list.
fullList.add(String.join("", Collections.nCopies(70, "=")));
fullList.addAll(accMatchingPal);
writeListToFile("PB2Out.txt", fullList);
return fullList;
}
}
The findSubstringIndexes() method:
private static String findSubstringIndexes(String inputString, String stringToFind){
String indexes = "";
int index = inputString.indexOf(stringToFind);
while (index >= 0){
indexes+= (indexes.equals("")) ? String.valueOf(index) : ", " + String.valueOf(index);
index = inputString.indexOf(stringToFind, index + stringToFind.length()) ;
}
return indexes;
}
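Note that advancing by stringToFind.length() skips any occurrence that overlaps a previous match. Whether overlaps should count is a design decision; if they should, a variant that advances by one character and returns the positions as a list might look like this sketch (the class name IndexFinder and the method findAllIndexes are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class IndexFinder {

    // Returns every start index of 'target' in 'input', including overlapping matches.
    public static List<Integer> findAllIndexes(String input, String target) {
        List<Integer> indexes = new ArrayList<>();
        if (target.isEmpty()) {
            return indexes; // avoid reporting every position for an empty target
        }
        int index = input.indexOf(target);
        while (index >= 0) {
            indexes.add(index);
            // Step by one character so overlapping occurrences are not skipped.
            index = input.indexOf(target, index + 1);
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(findAllIndexes("AATTAATT", "AATT")); // [0, 4]
        System.out.println(findAllIndexes("ATATAT", "ATAT"));   // [0, 2]
    }
}
```

Returning a List<Integer> instead of a pre-formatted string also leaves the formatting to the caller.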
The getPalMatches() method:
private static ArrayList<String> getPalMatches(ArrayList<String> Palindromes) {
ArrayList<String> accMatching = new ArrayList<>();
for (int i = 0; i < Palindromes.size(); i++) {
String matches = "";
String[] split1 = Palindromes.get(i).split("\\s+");
String pal1 = split1[0];
// Make sure the current Pal hasn't already been listed.
boolean alreadyListed = false;
for (int there = 0; there < accMatching.size(); there++) {
String[] th = accMatching.get(there).split("\\s+");
if (th[0].equals(pal1)) {
alreadyListed = true;
break;
}
}
if (alreadyListed) { continue; }
for (int j = 0; j < Palindromes.size(); j++) {
String[] split2 = Palindromes.get(j).split("\\s+");
String pal2 = split2[0];
if (pal1.equals(pal2)) {
// Using Ternary Operator to build the matches string
matches+= (matches.equals("")) ? pal1 + " was found in the following Accessions: "
+ split2[3] : ", " + split2[3];
}
}
if (!matches.equals("")) {
accMatching.add(matches);
}
}
return accMatching;
}
The writeListToFile() method:
private static void writeListToFile(String filePath, ArrayList<String> list, boolean... appendToFile) {
boolean appendFile = false;
if (appendToFile.length > 0) { appendFile = appendToFile[0]; }
// try-with-resources closes (and flushes) the writer automatically.
try (BufferedWriter bw = new BufferedWriter(new FileWriter(filePath, appendFile))) {
for (int i = 0; i < list.size(); i++) {
bw.append(list.get(i) + System.lineSeparator());
}
} catch (IOException ex) {
ex.printStackTrace();
}
}