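Java: check whether an element of a list appears in all instances

Time: 2017-12-16 20:46:43

Tags: java arraylist substring

I have a method that takes an ArrayList of strings, where each element of the list is a variant of the following: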

>AX018718 Equine influenza virus H3N8 // 4 (HA)
CAAAAGCAGGGTGACAAAAACATGATGGATTCCAACACTGTGTCAAGCTTTCAGGTAGACTGTTTTCTTT
GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA

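The method breaks each element down into Acc, which in this case is AX018718, and seq, which is the two lines that follow the Acc.

Each seq is then checked against another ArrayList of strings named pal, to see whether any of its substrings match [AAAATTTT, AAACGTTT, AAATATATTT]

I am able to output all of the matches for the different elements of the first list as: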

AATATATT in organism: AX225014 Was found in position: 15 and at 15
AATATT in organism: AX225014 Was found in position: 1432 and at 1432
AATATT in organism: AX225016 Was found in position: 1404 and at 1404
AATT in organism: AX225016 Was found in position: 169 and at 2205

Is it possible to check all of the output to see whether every Acc matches a given pal?

In the example above, the desired output would be:

AATATT was found in all of the Acc.

My working code:

public static ArrayList<String> PB2Scan(ArrayList<String> Pal) throws FileNotFoundException, IOException
{
    ArrayList<String> PalindromesSpotted  = new ArrayList<String>();

    File file = new File("IAV_PB2_32640.txt");
    Scanner sc = new Scanner(file);
    sc.useDelimiter(">");
    //initializes the ArrayList
    ArrayList<String> Gene1 = new ArrayList<String>();
    //initializes the writer
    FileWriter fileWriter = new FileWriter("PB2out");
    PrintWriter printwriter = new PrintWriter(fileWriter);
    //Loads the Array List
    while(sc.hasNext()) Gene1.add(sc.next());
    for(int i = 0; i < Gene1.size(); i++) 
    {
    //Acc breaks down the title so the element:
        //>AX225014 Equine influenza virus H3N8 // 1 (PB2)
        //ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        //GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        //comes out as AX225014
    String Acc = Accession(Gene1.get(i));
    //seq takes the same element as above and returns only
    //ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
    //GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
    String seq = trimHeader(Gene1.get(i));
        for(int x = 0; x<Pal.size(); x++) 
        {
        if(seq.contains(Pal.get(x))){
        String match = (Pal.get(x) + " in organism: " + Acc + " Was found in position: "+ seq.indexOf(Pal.get(x)) + " and at " +seq.lastIndexOf(Pal.get(x)));
        printwriter.println(match);
        PalindromesSpotted.add(match);
        }
        }
    }
    Collections.sort(PalindromesSpotted);
return PalindromesSpotted;
}

2 answers:

Answer 0 (score: 1)

You should create a Map<String, List<String>> that has the Pals as keys and the Accs that contain them as values.

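// Maps each gene entry to the Pals found in its sequence.
Map<String, List<String>> result = new HashMap<>();
for (String gene : Gene1) {
    List<String> list = new ArrayList<>();
    result.put(gene, list);
    String seq = trimHeader(gene);
    for (String pal : Pal) {
        // The sequence has to contain the Pal, not the other way around.
        if (seq.contains(pal)) {
            list.add(pal);
        }
    }
}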

Now you have a Map that you can query for the Pals each gene contains:

List<String> containedPals = result.get(gene);

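For a function like this, that is a very reasonable result. Whatever you do afterwards (i.e. writing to a file) is better done in another function, one that calls this function.

So this is probably what you want to do: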

List<String> genes = loadGenes(geneFile);
List<String> pals = loadPal(palFile);
Map<String, List<String>> genesToContainedPal = methodAbove(genes, pals);
switch (resultTyp) {
    // ...
}
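For the question as asked, it may be more convenient to key the map by Pal and collect the Accs that contain it; a Pal is then "found in all" when its list is as long as the number of Accs. The following is only a minimal sketch under that assumption. The class name PalCoverage, the methods accsByPal and palsInAll, and the sequencesByAcc input (accession ID mapped to a sequence with no line breaks) are hypothetical names introduced for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PalCoverage {

    // Maps each Pal to the accessions whose sequence contains it.
    // sequencesByAcc is an assumed input: accession ID -> sequence without line breaks.
    public static Map<String, List<String>> accsByPal(Map<String, String> sequencesByAcc,
                                                      List<String> pals) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String pal : pals) {
            List<String> accs = new ArrayList<>();
            for (Map.Entry<String, String> entry : sequencesByAcc.entrySet()) {
                if (entry.getValue().contains(pal)) {
                    accs.add(entry.getKey());
                }
            }
            result.put(pal, accs);
        }
        return result;
    }

    // Returns the Pals that occur in every accession (empty if there are no accessions).
    public static List<String> palsInAll(Map<String, String> sequencesByAcc,
                                         List<String> pals) {
        List<String> inAll = new ArrayList<>();
        if (sequencesByAcc.isEmpty()) {
            return inAll;
        }
        for (Map.Entry<String, List<String>> entry : accsByPal(sequencesByAcc, pals).entrySet()) {
            if (entry.getValue().size() == sequencesByAcc.size()) {
                inAll.add(entry.getKey());
            }
        }
        return inAll;
    }

    public static void main(String[] args) {
        // Made-up sequences, used only to exercise the sketch.
        Map<String, String> seqs = new LinkedHashMap<>();
        seqs.put("AX225014", "GGAATATTCC");
        seqs.put("AX225016", "CCAATATTGGAATTCC");
        for (String pal : palsInAll(seqs, Arrays.asList("AATATT", "AATT"))) {
            System.out.println(pal + " was found in all of the Acc.");
        }
    }
}
```

With the made-up sequences in main, only AATATT occurs in both entries, so only that Pal is reported.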

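Answer 1 (score: 1)

First off, your code will not write anything to a file to log the results, because you never close your writers or, at the very least, flush the PrintWriter. In fact, you do not close your reader either. You really should close your readers and writers to free resources. Food for thought.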
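One common way to make sure readers and writers are always closed is a try-with-resources block, which closes the resource even when an exception is thrown. The sketch below is illustrative only: the class ResourceDemo and the methods readRecords and writeLines are hypothetical names, not part of the original code.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ResourceDemo {

    // Reads ">"-delimited records; the Scanner is closed when the try block exits.
    public static List<String> readRecords(File in) throws IOException {
        List<String> records = new ArrayList<>();
        try (Scanner sc = new Scanner(in)) {
            sc.useDelimiter(">");
            while (sc.hasNext()) {
                records.add(sc.next());
            }
        }
        return records;
    }

    // Writes one line per entry; closing the PrintWriter also flushes it.
    public static void writeLines(File out, List<String> lines) throws IOException {
        try (PrintWriter pw = new PrintWriter(out)) {
            for (String line : lines) {
                pw.println(line);
            }
        }
    }
}
```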

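You can make your PB2Scan() method return the simple result list as it does now, or a result list of the Accs that contain the same Pal, or both. In the last case the simple result list is logged, and at the end of that list comes the list of Accs that contain the same Pal (which is also logged).

Some extra code and an additional integer parameter for the PB2Scan() method can do this. For the additional parameter, you may want to add something like this: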

public static ArrayList<String> PB2Scan(ArrayList<String> Pal, int resultType) 
                                throws FileNotFoundException, IOException
{ .... }

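The integer resultType parameter takes one of three integer values from 0 to 2:

  • 0 - the simple result list, as the code produces now;
  • 1 - the Accs that match each Pal;
  • 2 - the simple result list, followed at the end by the Accs that match each Pal.

You should also pass the file to read as a parameter of the PB2Scan() method, since next time the file could easily have a different name. This makes the method more versatile than it would be with a hard-coded file name.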

public static ArrayList<String> PB2Scan(String filePath, ArrayList<String> Pal, int resultType) 
                                throws FileNotFoundException, IOException { .... }

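The method can always write to the same output file, since that best suits whichever method the output came from.

Using the concept above, rather than writing to the output file (PB2Out.txt) while the PalindromesSpotted ArrayList is being built, I think it is better to write the file after your ArrayList or ArrayLists are complete. Another method (writeListToFile()) is best suited to that task. To find out whether the same Pal matches other Accs, it is again a good idea to use another method (getPalMatches()) for the job.

Since the index positions of multiple occurrences of a Pal within any given Seq were not being reported properly, I have also provided another method (findSubstringIndexes()) that quickly takes care of that task.

It should be noted that the code below assumes that the Seq obtained from the trimHeader() method is a single string with no line breaks in it.

The reworked PB2Scan() method and the other methods mentioned above are as follows: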

  

The PB2Scan() method:

public static ArrayList<String> PB2Scan(String filePath, ArrayList<String> Pal, int resultType) 
                                throws FileNotFoundException, IOException {
    // Make sure the supplied result type is either 
    // 0, 1, or 2. If not then default to 0.
    if (resultType < 0 || resultType > 2) {
        resultType = 0;
    }
    ArrayList<String> PalindromesSpotted = new ArrayList<>();

    File file = new File(filePath);
    Scanner sc = new Scanner(file);
    sc.useDelimiter(">");
    //initializes the ArrayList
    ArrayList<String> Gene1 = new ArrayList<>();
    //Loads the Array List
    while (sc.hasNext()) {
        Gene1.add(sc.next());
    }
    sc.close(); // Close the read in text file.

    for (int i = 0; i < Gene1.size(); i++) {
        //Acc breaks down the title so the element:
        //>AX225014 Equine influenza virus H3N8 // 1 (PB2)
        //ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        //GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        //comes out as AX225014
        String Acc = Accession(Gene1.get(i));

        //seq takes the same element as above and returns only
        //ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        //GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        String seq = trimHeader(Gene1.get(i));
        for (int x = 0; x < Pal.size(); x++) {
            if (seq.contains(Pal.get(x))) {
                String match = Pal.get(x) + " in organism: " + Acc + 
                                " Was found in position(s): " + 
                                findSubstringIndexes(seq, Pal.get(x));
                PalindromesSpotted.add(match);
            }
        }
    }

    // If there is nothing to work with get outta here.
    if (PalindromesSpotted.isEmpty()) {
        return PalindromesSpotted;
    }

    // Sort the ArrayList
    Collections.sort(PalindromesSpotted);
    // Another ArrayList for matching Pal's to Acc's
    ArrayList<String> accMatchingPal = new ArrayList<>();
    switch (resultType) {
        case 0: // resultType 0: the simple result list
            writeListToFile("PB2Out.txt", PalindromesSpotted);
            return PalindromesSpotted;

        case 1: // resultType 1: the Accs that match each Pal
            accMatchingPal = getPalMatches(PalindromesSpotted);
            writeListToFile("PB2Out.txt", accMatchingPal);
            return accMatchingPal;

        default: // resultType 2: the simple result list followed by the matching Accs
            accMatchingPal = getPalMatches(PalindromesSpotted);
            ArrayList<String> fullList = new ArrayList<>();
            fullList.addAll(PalindromesSpotted);
            // Create a Underline made of = signs in the list.
            fullList.add(String.join("", Collections.nCopies(70, "=")));
            fullList.addAll(accMatchingPal);
            writeListToFile("PB2Out.txt", fullList);
            return fullList;
    }
}   
  

The findSubstringIndexes() method:

private static String findSubstringIndexes(String inputString, String stringToFind){
    String indexes = "";
    int index = inputString.indexOf(stringToFind);
    while (index >= 0){
        indexes+= (indexes.equals("")) ? String.valueOf(index) : ", " + String.valueOf(index);
        index = inputString.indexOf(stringToFind, index + stringToFind.length())   ;
    }
    return indexes;
}
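Note that findSubstringIndexes() resumes the search after the end of each match, so overlapping occurrences are skipped (for example, "ATAT" in "ATATAT" is reported only at index 0). If overlapping matches matter for your palindromes, a variant along the following lines could be used instead. This is only a sketch: the class SubstringIndexes and the method findAllIndexes are hypothetical names, and it returns a list of integers rather than a formatted string.

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringIndexes {

    // Returns every start index of target in input, including overlapping occurrences.
    public static List<Integer> findAllIndexes(String input, String target) {
        List<Integer> indexes = new ArrayList<>();
        if (target.isEmpty()) {
            return indexes; // an empty target would otherwise match at every position
        }
        int index = input.indexOf(target);
        while (index >= 0) {
            indexes.add(index);
            // Step forward by one character so that overlapping matches are found too.
            index = input.indexOf(target, index + 1);
        }
        return indexes;
    }
}
```

For instance, findAllIndexes("ATATAT", "ATAT") yields the indexes 0 and 2.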
  

The getPalMatches() method:

private static ArrayList<String> getPalMatches(ArrayList<String> Palindromes) {
    ArrayList<String> accMatching = new ArrayList<>();
    for (int i = 0; i < Palindromes.size(); i++) {
        String matches = "";
        String[] split1 = Palindromes.get(i).split("\\s+");
        String pal1 = split1[0];
        // Make sure the current Pal hasn't already been listed.
        boolean alreadyListed = false;
        for (int there = 0; there < accMatching.size(); there++) {
            String[] th = accMatching.get(there).split("\\s+");
            if (th[0].equals(pal1)) {
                alreadyListed = true;
                break;
            }
        }
        if (alreadyListed) { continue; }
        for (int j = 0; j < Palindromes.size(); j++) {
            String[] split2 = Palindromes.get(j).split("\\s+");
            String pal2 = split2[0];
            if (pal1.equals(pal2)) {
                // Using Ternary Operator to build the matches string
                matches+= (matches.equals("")) ? pal1 + " was found in the following Accessions: "
                        + split2[3] : ", " + split2[3];
            }
        }
        if (!matches.equals("")) {
            accMatching.add(matches);
        }
    }
    return accMatching;
}
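getPalMatches() lists the accessions in which each Pal was found. To get the exact output asked for in the question, the same match lines can be compared against the number of accessions that were scanned. The sketch below assumes that each match line starts with the Pal and has the accession as its fourth whitespace-separated token (as in the lines PB2Scan() builds), and that the caller knows the total number of distinct accessions. The class PalInAllAccs, the method foundInAll, and the totalAccs parameter are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PalInAllAccs {

    // matchLines: lines shaped like "<Pal> in organism: <Acc> Was found in position(s): ..."
    // totalAccs: number of distinct accessions that were scanned (assumed known to the caller).
    public static List<String> foundInAll(List<String> matchLines, int totalAccs) {
        // Collect the distinct accessions seen for each Pal.
        Map<String, Set<String>> accsByPal = new LinkedHashMap<>();
        for (String line : matchLines) {
            String[] parts = line.split("\\s+");
            if (parts.length < 4) {
                continue; // skip lines that do not have the expected shape
            }
            accsByPal.computeIfAbsent(parts[0], k -> new LinkedHashSet<>()).add(parts[3]);
        }
        // Keep only the Pals whose accession set covers every scanned accession.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : accsByPal.entrySet()) {
            if (totalAccs > 0 && entry.getValue().size() == totalAccs) {
                out.add(entry.getKey() + " was found in all of the Acc.");
            }
        }
        return out;
    }
}
```

Applied to the four sample lines from the question with totalAccs set to 2, this produces the single line "AATATT was found in all of the Acc.".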
  

The writeListToFile() method:

private static void writeListToFile(String filePath, ArrayList<String> list, boolean... appendToFile) {
    boolean appendFile = false;
    if (appendToFile.length > 0) { appendFile = appendToFile[0]; }

    try {
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(filePath, appendFile))) {
            for (int i = 0; i < list.size(); i++) {
                bw.append(list.get(i) + System.lineSeparator());
            }
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}