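I'm coding in Java and need to split the text I read from a .txt file into separate elements of an array. The file is made up of several distinct "texts", like a collection of documents. Each text is preceded by a line of the form "*TEXT" followed by some numbers, but I think "*TEXT" alone is enough to separate the texts. An example of what the .txt looks like:
*TEXT 017 01/04/63 PAGE 020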
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S ........
*TEXT 020 01/04/63 PAGE 021
THE ROAD TO JAIL IS PAVED WITH NONOBJECTIVE ART SINCE THE KREMLIN'S SHARPEST BARBS THESE DAYS ARE AIMED AT MODERN ART AND WESTERN ESPIONAGE...
*TEXT 025 01/04/63 PAGE 024
RED CHINA FIXING FRONTIERS RED CHINA PRODUCED A SECOND SURPRISE LAST WEEK...
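So I need text 017 in one position of the array, and text 020 in the next position. How can I do this?
This is the code I use to read the text from the .txt with FileReader: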
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import javax.swing.JFileChooser;

public class Reader {
    public static void main(String[] args) {
        JFileChooser chooser = new JFileChooser();
        int reply = chooser.showOpenDialog(null);
        // Stop if the user did not approve a file in the dialog
        if (reply != JFileChooser.APPROVE_OPTION) {
            return;
        }
        File inFile = chooser.getSelectedFile();
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader bufReader = new BufferedReader(new FileReader(inFile))) {
            String line;
            while ((line = bufReader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (Exception e) {
            System.out.println("error: " + e.getMessage());
        }
    } // main
} // end class Reader
Answer 0 (score: 2)
You can read the whole file into a String and then use String.split(String regex).
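A minimal sketch of that idea, under a few assumptions: Java 11 or later (for `Files.readString`), every text starts with a line beginning with `*TEXT`, and the names `TextSplitter`, `splitTexts`, and `file.txt` are made up for illustration. The lookahead `(?=...)` splits in front of each header without consuming it, so the header line stays with its text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class TextSplitter {
    // Splits the content in front of every line that starts with "*TEXT".
    // (?m) makes ^ match at the start of each line; the lookahead keeps
    // the "*TEXT ..." header attached to the text that follows it.
    static String[] splitTexts(String content) {
        return Arrays.stream(content.split("(?m)^(?=\\*TEXT)"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) throws IOException {
        // "file.txt" is a placeholder path (assumption)
        String content = Files.readString(Path.of("file.txt"));
        for (String text : splitTexts(content)) {
            System.out.println(text);
            System.out.println("-----");
        }
    }
}
```

With the sample file from the question, element 0 would hold text 017 and element 1 text 020, each including its header line.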
Answer 1 (score: 0)
You can use FileUtils (from Apache Commons IO) to read the file and then split it, like this:
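import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

public static void main(String[] args) throws IOException {
    // Read the whole file, then walk through it line by line
    for (String s : FileUtils.readFileToString(new File("/home/leoks/file.txt")).split("\n")) {
        if (s.startsWith("*TEXT")) {
            // Print the number that follows "*TEXT", e.g. "017"
            System.out.println(s.split(" ")[1]);
        }
    }
}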
Or you can write a parser with something like this:
http://txt2re.com/index-java.php3?s=*TEXT%20017&-14&-1
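A hand-written parser for this format can also be quite small. The sketch below is one possible shape, not the output of the generator linked above: it walks the lines, treats any line starting with `*TEXT` as a header, and collects the lines after it as the body of that text. `TextParser` and `parse` are hypothetical names, and it assumes the text number is the second whitespace-separated token of the header line.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TextParser {
    // Returns a map from text number (e.g. "017") to the body of that text,
    // in the order the texts appear in the input.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> texts = new LinkedHashMap<>();
        String currentId = null;
        StringBuilder body = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith("*TEXT")) {
                // A new header closes the text collected so far
                if (currentId != null) {
                    texts.put(currentId, body.toString().trim());
                }
                String[] tokens = line.trim().split("\\s+");
                currentId = tokens.length > 1 ? tokens[1] : "";
                body.setLength(0);
            } else if (currentId != null) {
                // Lines before the first header are ignored
                body.append(line).append('\n');
            }
        }
        // Store the last text, which has no header after it
        if (currentId != null) {
            texts.put(currentId, body.toString().trim());
        }
        return texts;
    }
}
```

If a plain array is needed, `parse(lines).values().toArray(new String[0])` gives the bodies in file order.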
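Answer 2 (score: 0)
Sorry guys, disregard my answer. I had already typed it, so I'm leaving it up, but I think he only wants the text number that follows the "*TEXT" identifier.
Try a regular expression with capture groups.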
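String text = "this will be your document text";
Pattern p = Pattern.compile("(.*TEXT ([0-9]{3}))+.*");
Matcher m = p.matcher(text);
int numCounts = m.groupCount();
String texts[] = new String[numCounts];
// group() is only valid after a successful match
if (m.matches()) {
    for (int i = 1; i <= numCounts; i++) {
        // group(0) is the whole match; the capture groups start at 1
        texts[i-1] = m.group(i);
    }
}
// Note: a repeated group keeps only its last capture, so this yields
// at most one number; a find() loop is needed to collect them all.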
Or you can do this:
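String text = "this will be your document text";
Pattern p = Pattern.compile("TEXT ([0-9]{3})");
Matcher m = p.matcher(text);
ArrayList<String> list = new ArrayList<String>();
// find() advances to each successive match in the text
while (m.find()) {
    list.add(m.group(1));
}
// toArray() without an argument returns Object[], so pass a typed array
String texts[] = list.toArray(new String[0]);
// now the numbers should be in texts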
Answer 3 (score: 0)
Try it this way:
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(
        new FileInputStream("yourTextFile")));
StringBuilder br = new StringBuilder();
String newLine = "";
while (true) {
    String line = bufferedReader.readLine();
    if (line == null)
        break;
    // readLine() strips the line terminator, so add it back
    br.append(line).append("\n");
}
bufferedReader.close();
newLine = br.toString();
// If the file starts with "*TEXT", the first element of arr is an empty string
String arr[] = newLine.split("\\*TEXT");
System.out.println(java.util.Arrays.toString(arr));