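I want to produce multiple files by parsing one input file. The input file contains many thousands of protein sequences in FASTA format, and for each protein sequence I want to generate a file in raw format (i.e., without any commas or semicolons, and without any extra symbols such as ">", "[", "]", etc.).
A FASTA record starts with the ">" symbol, followed by a description of the protein and then the protein's sequence.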
For example:
>lcl|NC_000001.10_cdsid_XP_003403591.1 [gene=LOC100652771] [protein=hypothetical protein LOC100652771] [protein_id=XP_003403591.1] [location=join(12190..12227,12595..12721,13403..13639)]
MSESINFSHNLGQLLSPPRCVVMPGMPFPSIRSPELQKTTADLDHTLVSVPSVAESLHHPEITFLTAFCL
PSFTRSRPLPDRQLHHCLALCPSFALPAGDGVCHGPGLQGSCYKGETQESVESRVLPGPRHRH
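The input file contains 1000 protein sequences in the format shown above. I have to generate thousands of raw files, each containing only a single protein sequence, without any special symbols or gaps.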
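A minimal sketch of the two per-record steps described above. It rests on assumptions that the question does not state: each output file is named after the first token of the header line, with characters that are unsafe in file names (such as "|") replaced by "_", and "raw" means keeping only the letters A-Z and a-z. FastaRecord and its method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical helpers for one FASTA record (names are illustrative only).
 * Assumptions: the record ID is the first whitespace-delimited token after '>',
 * and a "raw" sequence keeps only the letters A-Z and a-z.
 */
final class FastaRecord {

    /** Returns the text after '>' up to the first whitespace. */
    static String recordId(String headerLine) {
        if (!headerLine.startsWith(">")) {
            throw new IllegalArgumentException("not a FASTA header: " + headerLine);
        }
        return headerLine.substring(1).trim().split("\\s+", 2)[0];
    }

    /** Builds a file name from the record ID, replacing characters such as '|' with '_'. */
    static String fileNameFor(String headerLine) {
        return recordId(headerLine).replaceAll("[^A-Za-z0-9._-]", "_") + ".txt";
    }

    /** Joins the sequence lines and drops everything that is not a letter (spaces, gaps, '*'). */
    static String toRaw(Iterable<String> sequenceLines) {
        return String.join("", sequenceLines).replaceAll("[^A-Za-z]", "");
    }

    public static void main(String[] args) {
        String header = ">lcl|NC_000001.10_cdsid_XP_003403591.1 [gene=LOC100652771]";
        System.out.println(fileNameFor(header));
        System.out.println(toRaw(Arrays.asList("MSESINF SH", "NLGQ-LLS*")));
    }
}
```

Running main prints lcl_NC_000001.10_cdsid_XP_003403591.1.txt and then MSESINFSHNLGQLLS.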
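I have written Java code for this, but the output is "cannot open a file" followed by "cannot find file".
Please help me solve the problem.
Regards,
Vijay Kumar Garg, Varanasi, Bharat (India)
The code is: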
/*Java code to convert FASTA format to a raw format*/
import java.io.*;
import java.util.*;
import java.util.regex.*;
import java.io.FileInputStream;
// java package for using regular expression
public class Arrayren
{
public static void main(String args[]) throws IOException
{
String a[]=new String[1000];
String b[][] =new String[1000][1000];
/*open the id file*/
try
{
File f = new File ("input.txt");
//opening the text document containing genbank ids
FileInputStream fis = new FileInputStream("input.txt");
//Reading the file contents through inputstream
BufferedInputStream bis = new BufferedInputStream(fis);
// Writing the contents to a buffered stream
DataInputStream dis = new DataInputStream(bis);
//Method for reading Java Standard data types
String inputline;
String line;
String separator = System.getProperty("line.separator");
// reads a line till next line operator is found
int i=0;
while ((inputline=dis.readLine()) != null)
{
i++;
a[i]=inputline;
a[i]=a[i].replaceAll(separator,"");
//replaces unwanted patterns like /n with space
a[i]=a[i].trim();
// trims out if any space is available
a[i]=a[i]+".txt";
//takes the file name into an array
try
// to handle run time error
/*take the sequence in to an array*/
{
BufferedReader in = new BufferedReader (new FileReader(a[i]));
String inline = null;
int j=0;
while((inline=in.readLine()) != null)
{
j++;
b[i][j]=inline;
Pattern q=Pattern.compile(">");
//Compiling the regular expression
Matcher n=q.matcher(inline);
//creates the matcher for the above pattern
if(n.find())
{
/*appending the comment line*/
b[i][j]=b[i][j].replaceAll(">gi","");
//identify the pattern and replace it with a space
b[i][j]=b[i][j].replaceAll("[a-zA-Z]","");
b[i][j]=b[i][j].replaceAll("|","");
b[i][j]=b[i][j].replaceAll("\\d{1,15}","");
b[i][j]=b[i][j].replaceAll(".","");
b[i][j]=b[i][j].replaceAll("_","");
b[i][j]=b[i][j].replaceAll("\\(","");
b[i][j]=b[i][j].replaceAll("\\)","");
}
/*printing the sequence in to a text file*/
b[i][j]=b[i][j].replaceAll(separator,"");
b[i][j]=b[i][j].trim();
// trims out if any space is available
File create = new File(inputline+"R.txt");
try
{
if(!create.exists())
{
create.createNewFile();
// creates a new file
}
else
{
System.out.println("file already exists");
}
}
catch(IOException e)
// to catch the exception and print the error if cannot open a file
{
System.err.println("cannot create a file");
}
BufferedWriter outt = new BufferedWriter(new FileWriter(inputline+"R.txt", true));
outt.write(b[i][j]);
// printing the contents to a text file
outt.close();
// closing the text file
System.out.println(b[i][j]);
}
}
catch(Exception e)
{
System.out.println("cannot open a file");
}
}
}
catch(Exception ex)
// catch the exception and prints the error if cannot find file
{
System.out.println("cannot find file ");
}
}
}
This would be easier to follow if you provided more precise information.
Answer 0 (score: 0)
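Because of a lack of Java expertise, this code won't win any prizes. For example, even if it were correct, I would expect an OutOfMemoryError. A rewrite would be best. Still, we all start small.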
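Since the answer recommends a rewrite, here is one possible shape for it. This is a minimal streaming sketch, not the answerer's code: it reads the input line by line and keeps only the current record's sequence in memory, writing each finished record to its own file. It assumes UTF-8 input, names each file after the first token of its header line (unsafe characters replaced by "_"), and keeps only letters in the sequence; FastaSplitter and its methods are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical streaming splitter: one output file per FASTA record. */
final class FastaSplitter {

    /** Splits input into one raw-sequence file per record in outDir; returns the record count. */
    static int split(Path input, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        int records = 0;
        String header = null;                       // header of the record being collected
        StringBuilder seq = new StringBuilder();    // only the current sequence is held in memory
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.startsWith(">")) {
                    if (header != null) {           // a new header ends the previous record
                        write(outDir, header, seq);
                        records++;
                    }
                    header = line;
                    seq.setLength(0);
                } else if (header != null) {        // lines before the first header are ignored
                    seq.append(line.replaceAll("[^A-Za-z]", ""));   // keep letters only
                }
            }
        }
        if (header != null) {                       // flush the last record
            write(outDir, header, seq);
            records++;
        }
        return records;
    }

    /** Writes one record; Files.newBufferedWriter creates (or truncates) the file. */
    private static void write(Path outDir, String header, CharSequence seq) throws IOException {
        String id = header.substring(1).trim().split("\\s+", 2)[0];
        Path out = outDir.resolve(id.replaceAll("[^A-Za-z0-9._-]", "_") + ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            w.write(seq.toString());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FastaSplitter <input.fasta> <output-dir>");
            return;
        }
        int n = split(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(n + " records written");
    }
}
```

In this sketch, records that share an ID overwrite each other's file, and text before the first header is skipped. Both are choices of the sketch, not requirements from the question.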
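- Array indices start at 0: for (int i = 0; i < a.length; ++i)
- A plain string test is enough to detect a header line: if (s.contains(">"))
- You do not need to create the new file yourself; FileWriter creates it.
Code: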
final String encoding = "Windows-1252"; // Or "UTF-8", or leave it out to use the platform default.
File f = new File("C:/input.txt");
BufferedReader dis = new BufferedReader(new InputStreamReader(
        new FileInputStream(f), encoding));
...
int i = -1; // So i++ starts with 0.
while ((inputline = dis.readLine()) != null)
{
    i++;
    a[i] = inputline.trim();
    // readLine() already strips the line terminator, so this is not needed:
    // a[i] = a[i].replaceAll(separator, "");
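A minimal sketch of the reading part with an explicit charset, assuming UTF-8 rather than Windows-1252. A growable List replaces the fixed-size String[1000] array, so there is no index to manage, and try-with-resources closes the reader. The writeRaw method illustrates that no file needs to be created beforehand, because FileWriter creates a missing file (its parent directory must already exist). LineIo, readTrimmed and writeRaw are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers; the UTF-8 charset is an assumption. */
final class LineIo {

    /** Reads all lines of a file, trimmed; readLine() already drops the line terminator. */
    static List<String> readTrimmed(File file) throws IOException {
        List<String> lines = new ArrayList<>();   // grows as needed, unlike new String[1000]
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line.trim());
            }
        }
        return lines;
    }

    /** Writes one raw sequence (platform default charset); FileWriter creates the file if missing. */
    static void writeRaw(File target, String rawSequence) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(target, false))) {
            out.write(rawSequence);
        }
    }
}
```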
Answer 1 (score: 0)
Your code contains the following two catch blocks:
catch(Exception e)
{
System.out.println("cannot open a file");
}
catch(Exception ex)
// catch the exception and prints the error if cannot find file
{
System.out.println("cannot find file ");
}
Both of these swallow the exception and print a generic "it didn't work" message. That tells you a catch block was entered, but nothing more.
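Exceptions usually contain useful information that can help you find out where the real problem is. By ignoring them, you make your problem much harder to diagnose. Worse still, you are catching Exception, which is the superclass of many exception types, so these catch blocks catch many different kinds of exceptions and ignore them all.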
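To illustrate the point about catching Exception, here is a minimal sketch (firstLine is a hypothetical method) that catches the specific FileNotFoundException before the general IOException, so each message says which kind of failure happened and includes the exception's own message.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

/** Hypothetical example: catch specific exception types instead of Exception. */
final class SpecificCatch {

    /** Returns the first line of the file, or a message describing what went wrong. */
    static String firstLine(String fileName) {
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            String line = in.readLine();
            return line == null ? "" : line;
        } catch (FileNotFoundException e) {
            // Subclass of IOException, so it must be caught before IOException
            return "not found: " + e.getMessage();
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        }
    }
}
```

Because only I/O exceptions are caught here, unchecked exceptions such as ArrayIndexOutOfBoundsException are no longer swallowed; they propagate with their own stack trace.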
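The easiest way to get information out of an exception is to call its printStackTrace() method, which prints the exception type, the exception message and the stack trace. Add a call to it in both of these catch blocks; that should give you a much clearer idea of which exception is being thrown and where it is thrown from.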
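A minimal sketch of a catch block with printStackTrace() added. tryRead is a hypothetical method, and including the file name in the message is an extra convenience. For a missing file, the printed trace starts with java.io.FileNotFoundException, followed by the path and a platform-specific reason.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

/** Hypothetical example: report the real cause instead of a generic message. */
final class ReportErrors {

    /** Reads a file line by line; returns false (after printing the stack trace) if that fails. */
    static boolean tryRead(String fileName) {
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = in.readLine()) != null) {
                // process each line here
            }
            return true;
        } catch (IOException e) {
            System.err.println("cannot open a file: " + fileName);
            e.printStackTrace();   // exception type, message and stack trace, to System.err
            return false;
        }
    }
}
```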