Read a file and identify the string between parentheses

Date: 2015-04-13 15:44:39

Tags: java string file

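I am trying to capture the string that occurs between the last set of parentheses in a variable named "genesym". So far, I am swapping the last occurrence of each parenthesis with a tab character. Between those two replacements, I would like to assign the enclosed text to the existing String genesym.

I realize the call I wrote is a Scanner-style function, but it is the only way I know how to express what I want.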

import java.lang.*;
import java.io.*;

public class TESTING
{
    public static void main(String[] args)
    {
        try {
            BufferedReader br = new BufferedReader(new FileReader("human.rna.fna"));
            BufferedWriter bw = new BufferedWriter(new FileWriter("FormattedHumanRNA"));

            String line;
            String genesym;

            while ((line = br.readLine()) != null) {
                if (line.startsWith(">")) {
                    // Replaces the last open parenthesis with a tab character
                    int openbracket =  line.lastIndexOf("(");
                    line = new StringBuilder(line)
                        .replace(openbracket, openbracket + 1, "\t")
                        .toString();


                    // Pseudocode: this is the step where I want to capture the text between the brackets
                    genesym = br.nextString();


                    // Replaces the last close parenthesis with a tab character
                    int closebracket = line.lastIndexOf(")");
                    line = new StringBuilder(line)
                        .replace(closebracket, closebracket + 1, "\t")
                        .toString();
                } else {
                    line = line.replaceAll ("\n", "");
                }

                bw.write(genesym + " : " + line);
            }
            br.close();
            bw.close();
        } catch(IOException e) {
            e.printStackTrace(System.err);
        }
    }
}

Example (my real data is much larger than this, about 1 million lines):

Input file:

>365 (LOC1), long non-coding RNA AGCGTCT

>22 (1*split3**) (FLJ), long RNA AAAATC

>13 (RTV), RNA ATGCG

Desired output:

LOC1 : >365      LOC1     , long non-coding RNA AGCGTCT

FLJ : >22 (1*split3**)      FLJ     , long RNA  AAAATC

RTV : >13     RTV     ,RNA ATGCG

1 answer:

Answer 0 (score: 0)

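String genesym = line.substring(openbracket + 1, closebracket);

Compute closebracket before this line. The + 1 skips the character at openbracket itself (the bracket, or the tab that replaced it), since substring includes its start index and excludes its end index. Then replace whatever you want to replace.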
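
To make the substring idea concrete, here is a minimal sketch of how the extraction and the tab replacement could fit together. It is not the asker's final code: the class name GeneSymbolFormatter and the methods extractGeneSymbol and formatHeader are hypothetical names chosen for illustration, the file paths are taken from the command line, and it assumes that only lines starting with ">" are headers and that every line should end with a newline in the output.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; the class and method names are not from the question.
class GeneSymbolFormatter {

    // Returns the text inside the last "(...)" pair, or "" when no such pair exists.
    static String extractGeneSymbol(String header) {
        int close = header.lastIndexOf(')');
        int open = header.lastIndexOf('(', close); // -1 when close is -1
        return (open < 0 || close < 0) ? "" : header.substring(open + 1, close);
    }

    // Prefixes the header with "<symbol> : " and swaps the last bracket pair for tabs.
    static String formatHeader(String header) {
        int close = header.lastIndexOf(')');
        int open = header.lastIndexOf('(', close);
        if (open < 0 || close < 0) {
            return header; // assumption: headers without brackets pass through unchanged
        }
        String symbol = header.substring(open + 1, close);
        return symbol + " : " + header.substring(0, open)
                + "\t" + symbol + "\t" + header.substring(close + 1);
    }

    // Usage (paths are placeholders): java GeneSymbolFormatter human.rna.fna FormattedHumanRNA
    public static void main(String[] args) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             BufferedWriter bw = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Assumption: only lines starting with ">" are headers; others are copied as-is.
                bw.write(line.startsWith(">") ? formatHeader(line) : line);
                bw.newLine();
            }
        }
    }
}
```

Finding both bracket positions before changing the line avoids having to account for indices shifting, and reading line by line keeps memory use flat for a file of about a million lines. The try-with-resources block closes both files even if an IOException is thrown.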